Genome-Wide Co-Localization of Somatic Copy Number Alterations and Germline... - Marcin Imielinski

So cancer is a disease of the genome, and although arguably you can make that statement about a lot of diseases, part of what makes cancer fascinating is that there are two distinct ways in which that statement is true. Like many other diseases, there are germline risk variants that confer a lifetime risk of developing cancer, but there are also somatic alterations that develop in the cancer tissue and directly contribute to its tumorigenesis, metastasis, drug resistance and other phenotypes. The heritability of cancer has been known for a while from family and twin studies, and it varies quite a bit from tumor type to tumor type. We know that the heritability of cancer is not simple and is very likely mediated by complex polygenic inheritance. Part of the genetic revolution of the '80s and '90s was the use of positional cloning and family studies to discover rare, highly penetrant variants that mediate familial cancer syndromes, and the highlights of these were the discovery of p53 in Li-Fraumeni syndrome, APC in familial colon cancer, the MLH/MSH genes in Lynch syndrome, BRCA1/BRCA2 in familial breast cancer, and RB1 in retinoblastoma. Together, all these high-risk, high-penetrance variants explain a small percentage of the heritable risk for various tumor types. More recently, the emergence of genome-wide association studies (GWAS) examining common SNP variation in case-control cohorts has yielded a host of loci in both cancer and non-cancer disease analyses. Over the last five years, a number of cancer GWASs have yielded over 300 loci across more than 20 cancer types, and together these loci can explain as much as 10, 23, and 6 percent of the heritable risk for breast, prostate and colorectal cancer, respectively.
Here’s a large-scale picture of cancer GWAS loci from a 2010 review, and we see that there are a number of these loci around the genome. The majority of the GWAS loci discovered so far are for prostate, pancreatic, and lung cancer, hematopoietic tumors, and breast cancer. A lot of these loci lie in intergenic regions, and their role in disease has not yet been functionally elucidated. Among familial germline loci, there is a lot of precedent for genes that are mutated in the germline of affected families and are also somatically altered in cancer; among these, of course, are p53, APC, RB1, CDKN2A, VHL, and NF1. A lot of these are tumor suppressor genes that undergo a two-hit alteration, whereby they are inactivated in the germline and then undergo a second hit in the tumor tissue that results in loss of function. So we decided to examine the interplay of common variant loci discovered in GWASs and somatic copy number alterations (SCNAs) to test this hypothesis on a genome-wide scale. We took 297 cancer loci documented in the NHGRI GWAS database from over 80 GWASs, and then took the list of SCNA peak regions from a recently published study by Beroukhim and colleagues at the Broad in 2010, which examined somatic copy number alterations across over 3,000 tumors in over 20 tumor types and yielded 258 copy number peak regions. Our approach was quite simple: examine the overlap between these two sets of loci and determine whether it was significant against a null model built using permutations. Briefly, we assembled the data by obtaining these loci from the GWAS database, where each is usually reported as a single SNP. We of course had to handpick the cancer-related GWASs, and then for each SNP we defined a locus using the linkage disequilibrium neighborhood of that SNP. After collapsing overlapping loci into unique regions, we found 219 unique GWAS loci covering 1.1 percent of the mappable genome.
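A minimal sketch of the locus-collapsing step, assuming each SNP's linkage-disequilibrium window has already been computed upstream as a half-open (chromosome, start, end) interval; the sketch does not compute LD itself. Overlapping windows on the same chromosome are merged into unique loci, and the merged loci's total length is divided by the mappable genome size to give a coverage fraction like the 1.1 percent quoted above. The function names and toy coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def merge_loci(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or abutting intervals on the same chromosome.

    Chromosomes are returned in lexicographic order of their names.
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged: List[Interval] = []
    for chrom in sorted(by_chrom):
        spans = sorted(by_chrom[chrom])
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlaps or touches the current run
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def genome_fraction(intervals: List[Interval], mappable_bp: int) -> float:
    """Fraction of the mappable genome covered by non-overlapping intervals."""
    return sum(end - start for _, start, end in intervals) / mappable_bp


if __name__ == "__main__":
    # Hypothetical per-SNP LD windows; the first two overlap on chromosome 8.
    snp_windows = [("8", 100, 300), ("8", 250, 600), ("11", 1000, 1100)]
    loci = merge_loci(snp_windows)
    print(loci, genome_fraction(loci, 10_000))
```

Merging must happen before computing coverage; otherwise overlapping windows from SNPs in the same LD block would be double-counted.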
On the SCNA side, we took all the peak regions reported in the 2010 study by Beroukhim and colleagues, combining the pan-tumor analysis with all of the tumor-type subanalyses, and found 258 total SCNA peak regions comprising 8.4 percent of the mappable genome; these included 198 amplification hotspots and 67 deletion peak regions. Crossing these two sets of loci gives the genome-wide plot shown here: the GWAS loci appear as small dots on each chromosome, the SCNA peak regions hover over each chromosome, and the intersecting loci are shown in red. Among these intersecting loci we have a lot of known cancer genes as well as novel loci. To determine the significance of this overlap, we performed a number of genome-wide permutations, and we found a strikingly significant overlap between GWAS loci and frequently amplified regions of the cancer genome. In contrast, when we compared cancer GWAS loci against commonly deleted peak regions, we saw the distribution of overlap shown here in the left plot, and following permutation we found that this intersection was not significant. This is quite interesting, so we pursued a second, orthogonal line of investigation by comparing cancer-related GWAS loci with non-cancer-related GWAS regions. We found that cancer-related GWAS regions were significantly enriched for overlap with amplification peak regions, but again did not frequently overlap with deletion SCNAs. One feature of the GWAS findings across all the cancer germline analyses is a hotspot of association at 8q24, near the MYC locus, where there are multiple loci for prostate cancer, GBM, and other cancer types.
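The permutation test can be sketched as follows, assuming GWAS loci and SCNA peaks are given as half-open (chromosome, start, end) intervals. The null model here, which keeps each locus's length and chromosome but places it at a uniformly random position, is a hypothetical choice; the talk does not specify how the permutations were built. The optional `exclude` argument drops loci overlapping a region (such as the 8q24 hotspot) before testing, mirroring the kind of sensitivity analysis described in the talk; for simplicity, permuted loci are not prevented from landing in the excluded region. All names are illustrative.

```python
import random
from typing import Dict, List, Optional, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(loci: List[Interval], peaks: List[Interval]) -> int:
    """Number of loci that intersect at least one peak region."""
    return sum(any(overlaps(locus, peak) for peak in peaks) for locus in loci)


def permutation_test(
    loci: List[Interval],
    peaks: List[Interval],
    chrom_sizes: Dict[str, int],
    n_perm: int = 1000,
    exclude: Optional[List[Interval]] = None,
    seed: int = 0,
) -> Tuple[int, float]:
    """Return the observed overlap count and an empirical one-sided p-value."""
    if exclude:
        # Sensitivity analysis: drop loci that touch any excluded region.
        loci = [locus for locus in loci
                if not any(overlaps(locus, x) for x in exclude)]
    rng = random.Random(seed)
    observed = count_overlapping(loci, peaks)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for chrom, start, end in loci:
            length = end - start
            # Uniform start position that keeps the locus on its chromosome.
            new_start = rng.randrange(chrom_sizes[chrom] - length + 1)
            shuffled.append((chrom, new_start, new_start + length))
        if count_overlapping(shuffled, peaks) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    sizes = {"1": 1_000_000}
    peaks = [("1", 0, 10_000), ("1", 500_000, 510_000)]
    loci = [("1", 1_000, 2_000), ("1", 501_000, 502_000)]
    print(permutation_test(loci, peaks, sizes))
    print(permutation_test(loci, peaks, sizes, exclude=[("1", 400_000, 600_000)]))
```

A null that ignores genome-wide structure (gene density, mappability, GC content) can overstate significance, so a real analysis would likely need a more constrained permutation scheme than this uniform one.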
To examine whether our results were robust to the removal of this region, which is also known to be frequently amplified, we did a separate analysis excluding these loci, and we still found a significant colocalization of amplification peak regions with cancer GWAS loci outside of the MYC locus. We also applied this analysis to the individual tumor-subtype analyses and saw significant overlap between cancer GWAS loci and the lung and hematopoietic peak regions. These results show an interesting correlation or colocalization, but it could be explained very simply by genes that tend to undergo frequent somatic alteration in some patients and, in other patients, are mutated in the germline and confer lifetime risk; that alone does not suggest any kind of germline-somatic interplay. The classic two-hit model of tumor suppressor loss, by contrast, involves a germline inactivation followed by a somatic alteration that results in loss of function. So we wanted to ask whether germline SNP status actually confers risk for specific somatic alterations, in this case amplifications. A really interesting precedent for this was published a few years ago, where a common variant in JAK2 conferring risk of myeloproliferative disease was shown to predispose to the development of somatic point mutations in JAK2 in these same neoplasms, and what was really fascinating was that the risk-conferring germline common variant occurred on the same haplotype as the somatic mutation. So we wanted to pursue this question in the context of copy number alterations. So how would we detect this?
Well, suppose we have a germline heterozygous SNP in a patient’s tumor tissue, and the tumor undergoes a copy number alteration, in this case an amplification, at that locus. Sometimes it might choose the allele containing the C SNP to amplify, but other times it may choose the allele containing the A SNP. If the tumor is indifferent to the status of that germline allele, it will amplify the two alleles in equal proportion. However, if there’s something special on that C allele that is positively selected for, then we’ll see the tumor amplifying that C allele time and time again. The way you can test this is with something called the allelic distortion test (ADT), first proposed by Dewal and colleagues in Bioinformatics in 2009. For each heterozygous SNP that undergoes a copy number alteration, we measure how frequently one allele versus the other is amplified or deleted, and then test for significant deviation of this frequency from one half using the chi-square distribution or Fisher’s exact test. So if we see tumors amplify allele A in 30 out of 35 hets, that may be evidence for some kind of allelic bias and a germline-somatic interplay. We examined this in several data sets, including the original Global Cancer Map (GCM) data and TCGA data from SNP 6.0 arrays, and we were somewhat disappointed to find that none of the 36 cancer GWAS loci that intersect somatic amplification peaks showed significant allelic distortion. However, when we applied this scan genome-wide, we found a significant allelic distortion at 11q13.
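A minimal sketch of the allelic distortion test for a single SNP, assuming we already have the number of tumors that amplified each allele of a heterozygous SNP. It uses an exact two-sided binomial test against a probability of 1/2, which is a swapped-in choice (the talk mentions the chi-square distribution or Fisher's exact test), plus a chi-square goodness-of-fit version for comparison. The function names are hypothetical, and this is not the implementation from Dewal and colleagues.

```python
import math
from typing import Tuple


def adt_binomial(k: int, n: int) -> float:
    """Two-sided exact binomial p-value that k of n allele-specific amplification
    events chose allele A, under the null that each allele is chosen with
    probability 1/2."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    # Under p = 1/2 the null distribution is symmetric, so the two-sided
    # p-value is twice the upper tail from the more extreme of k and n - k.
    extreme = max(k, n - k)
    tail = sum(math.comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def adt_chisq(k: int, n: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit statistic (1 df) and p-value for k of n
    events choosing allele A versus an expected 50:50 split."""
    expected = n / 2
    stat = ((k - expected) ** 2 + ((n - k) - expected) ** 2) / expected
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # The 30-of-35 illustration from the talk, and the 44-of-50 count at 11q13.
    print(adt_binomial(30, 35))
    print(adt_chisq(30, 35))
    print(adt_binomial(44, 50))
```

With these counts the exact test gives roughly 2.2 × 10^-5 for 30 of 35 and roughly 3.2 × 10^-8 for 44 of 50. These are arithmetic on the counts quoted in the talk, not the p-values reported for the study, which may have used a different test and a genome-wide multiple-testing correction.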
Zooming in on this locus, we found that this SNP lies 100 kb upstream of the cyclin D1 (CCND1) locus and near the MYEOV gene. Examining the individual events that contribute to this signal, we found that 44 out of the 50 tumors that amplify this SNP chose the C allele over the T allele. Here we can see the individual tumors that support the signal: many tumor types amplify this locus, which is of course a frequently amplified region, and they tend to choose the C allele over the T allele. We applied the same analysis to TCGA SNP 6.0 data spanning over 2,000 tumors, and again we saw many tumor types (ovarian, breast, lung) amplifying this locus and choosing the C allele over the T allele. This is strikingly significant. We saw the same, quite significant effect in the Broad-Novartis Cancer Cell Line Encyclopedia, a set of over 900 cell lines that have been profiled for expression, copy number, and sequence. This is an interesting result partly because the SNP lies in an obviously frequently amplified region; it lies not too far from a GWAS peak but does not lie in an actual significant GWAS region. So the question is: what biology may be driving this? We clearly see that these tumors preferentially amplify the C allele over the T allele at this locus. Does the germline C allele carry some kind of selective advantage that the tumor is really going after? For example, is the nearby associated cyclin D1 allele more expressible, or does it somehow have a selectively advantageous genotype? Or perhaps there is some kind of interaction with the amplification machinery that alters the local somatic amplification rate, and that phenotype is carried by, or tagged by, the C allele?
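One way to probe the first possibility, that the C allele makes cyclin D1 more expressible, is an eQTL-style linear model of expression on C-allele dosage with total copy number as a covariate. The sketch below fits that two-covariate model in plain Python via the Frisch-Waugh-Lovell theorem: the dosage coefficient equals the slope from regressing expression residuals on dosage residuals after both are adjusted for an intercept and copy number. The model form, function names, and toy data are assumptions for illustration, not the analysis performed on the Cell Line Encyclopedia.

```python
from typing import List, Tuple


def simple_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares (intercept, slope) for y ~ a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residualize(covariate: List[float], y: List[float]) -> List[float]:
    """Residuals of y after regressing it on an intercept plus one covariate."""
    a, b = simple_ols(covariate, y)
    return [yi - (a + b * ci) for ci, yi in zip(covariate, y)]


def dosage_effect(dosage: List[float], copy_number: List[float],
                  expression: List[float]) -> float:
    """Coefficient on C-allele dosage in expression ~ 1 + dosage + copy_number."""
    # Frisch-Waugh-Lovell: remove intercept and copy number from both sides,
    # then the residual-on-residual slope is the multiple-regression coefficient.
    r_dosage = residualize(copy_number, dosage)
    r_expr = residualize(copy_number, expression)
    _, beta = simple_ols(r_dosage, r_expr)
    return beta


if __name__ == "__main__":
    # Hypothetical toy data: total copies, and how many of those copies carry C.
    copy_number = [2, 2, 3, 3, 4, 4, 5, 5]
    dosage = [0, 1, 1, 2, 1, 3, 2, 4]
    # Expression constructed with a known effect of 0.3 per C copy.
    expression = [1.0 + 0.5 * c + 0.3 * d for c, d in zip(copy_number, dosage)]
    print(dosage_effect(dosage, copy_number, expression))  # approximately 0.3
```

This recovers only the point estimate; judging a "mild but significant" effect would also require a standard error or permutation-based p-value for the dosage coefficient.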
We took some steps to examine this hypothesis in the Cell Line Encyclopedia, examining the interaction of the allelic status of this SNP with total copy number and expression to test whether this SNP is an expression quantitative trait locus (eQTL), and we found a mild but significant effect whereby, controlling for total copy number, increasing doses of allele C are associated with increased expression of cyclin D1. We do not see this eQTL with any of the other genes in this locus. So in summary, we see a significant overlap of germline GWAS loci and SCNA peak regions across cancer types, predominantly with amplifications (and, though I didn’t show it, with amplifications plus deletions), but not with deletions only. As far as we know, this is the first evidence for genome-wide colocalization of germline susceptibility variants and somatically altered loci, and cyclin D1 is an interesting candidate cis-somatic trait locus, or cis-STL, which we’re investigating further. I’ll take any questions next. Oh, and I’d like to thank my collaborators on this effort: Scott Carter, Rameen Beroukhim, Craig Mermel, Gad Getz, and my mentor Matthew Meyerson. Thanks. [applause]
Male Speaker: Thank you Marcin, that was a great, great talk. Questions? [inaudible]
Female Speaker: Hi. Why did you choose to just use the heterozygotes and not use the two homozygotes as well?
Marcin Imielinski: In the setting of a heterozygous SNP, we can come up with a simple statistical test for some sort of selective advantage or effect, and I think that test is a good test. The other alternative is to use a trend test, a sort of case-control test where we compare patients that are amplifying versus not amplifying that locus, comparing, for example, minor allele frequencies.
That test tends to be more prone to population stratification and yields messier results, although it is potentially powerful. So both are good ways of approaching this problem. The ADT is more similar to the TDT in germline genomics, which is also less prone to population stratification but is perhaps less powerful and requires more samples. We’ve actually tried both; we just achieve cleaner results with this test.
Male Speaker: Yeah, just a question. Have you taken copy number variation into account, or considered it as another possible way that you could get amplifications or deletions?
Marcin Imielinski: Oh, germline copy number.
Male Speaker: Yeah, germline polymorphism.
Marcin Imielinski: Right. In the context of this data we are only looking at somatic copy number events, but that certainly could be another driver of this kind of effect, and we’re also only looking at germline single nucleotide polymorphisms. A more general application of this strategy would be to look at other kinds of somatic variants and other kinds of germline variants, and that’s certainly a great direction for additional exploration.
Male Speaker: I have a quick question. For the GWAS peaks, have you attempted to further stratify them by looking at the heterozygotes and trying to discern whether the mode of inheritance is dominant versus recessive among the GWAS peaks, and then repeating your analysis to see if there’s a difference?
Marcin Imielinski: That would be a great analysis. Unfortunately, for a lot of the GWAS data that we’re using we don’t have the underlying genotypes; those are either not accessible or we haven’t been able to access them yet. So that would definitely be an interesting analysis. Also, analyzing the summary statistics of some of these GWASs to examine loci that fell short of the genome-wide significance threshold but happen to colocalize with a somatic copy number region would be an interesting way of finding additional germline effects.
Male Speaker: Okay, thank you.
Marcin Imielinski: Thanks.