So cancer is a disease of the genome and although arguably you can make that statement about
a lot of diseases, part of what makes cancer fascinating is that there are two distinct
ways in which that statement is true. Like many other diseases there are germline risk
variants that confer a lifetime risk of developing cancer but there are also somatic alterations
that develop in the cancer tissue that directly contribute to tumorigenesis, metastasis, drug
resistance and other phenotypes.
The heritability of cancer has been known for a while from family and twin studies
and it varies quite a bit from tumor type to tumor type. We know that the inheritance
and the heritability of cancer is not simple and it’s mediated very likely with a complex
polygenic inheritance. A part of the genetic revolutions of the '80s and '90s was the use
of positional cloning and family studies to discover rare, highly penetrant variants that
mediate familial cancer syndromes and the sort of highlights of these were the discovery
of p53 in Li-Fraumeni syndrome, APC in familial colon cancer, MLH/MSH genes in Lynch syndrome,
BRCA1/BRCA2 in familial breast cancer and RB1 in retinoblastoma. Together all these
high-risk, high-penetrance variants explain a small percentage of the heritable risk for
various tumor types.
More recently the emergence of genome-wide association studies examining common SNP variation
in case-control cohorts has yielded a host of loci in both cancer and non-cancer
disease analysis and -- sorry, can I use this for a laser pointer or is this? Oh here, oh
here’s the pointer, okay, great, thanks, okay. And so over the last five years there’ve
been a number of cancer GWASs that have yielded over 300 loci in over 20 cancer types,
and together these loci can explain as much as, sorry, 10, 23, and six percent
of the heritable risk for breast, prostate and colorectal cancer respectively.
So here’s a sort of large-scale picture from a review in 2010 of cancer GWAS loci
and we see that there are a number of these loci around the genome. The majority of the
GWAS loci that have been discovered are for prostate, pancreatic, lung cancer, hematopoietic tumors
and breast cancer. A lot of these loci lie in intergenic regions and their role in disease
has not been functionally elucidated yet.
In familial germline loci there’s a lot of precedent for genes that are mutated in
the germline in families but are also somatically altered in cancer and of course among these
are p53, APC, RB1, CDKN2A, VHL, NF1. A lot of these are tumor suppressor genes which
undergo a two-hit alteration whereby one copy is inactivated in the germline and then they undergo
a second hit in the tumor tissue, resulting in loss of function.
So we decided to examine the interplay of common variant loci discovered in GWAS studies
and somatic copy number alterations to examine this hypothesis on the genome scale. So we
took 297 cancer loci documented in the NHGRI GWAS database from over 80 GWASs and then
took a list from a recently published study by Beroukhim and colleagues at the Broad in
2010 examining somatic copy number alterations across over 3,000 tumors in over 20 tumor
types. This study yielded 258 copy number peak regions, and our approach was quite simple:
to basically examine the overlap between these two sets of loci and determine whether
it was significant against a null model built using permutations.
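
As a rough illustration of the kind of permutation test described here, the sketch below counts how many loci intersect a peak region and compares that count against randomly repositioned loci. It assumes a single linear genome and uniform random placement, which is a simplification; the talk does not specify the exact null model. The function names and toy coordinates are hypothetical.

```python
import random

def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]

def count_overlaps(loci, peaks):
    """Number of loci that intersect at least one peak region."""
    return sum(any(overlaps(locus, peak) for peak in peaks) for locus in loci)

def permutation_pvalue(loci, peaks, genome_length, n_perm=1000, seed=0):
    """Empirical p-value for the observed overlap count.

    Null model (an assumption): each locus keeps its width and is dropped
    at a uniformly random position on one linear genome.
    """
    rng = random.Random(seed)
    observed = count_overlaps(loci, peaks)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in loci:
            width = end - start
            new_start = rng.randrange(0, genome_length - width)
            shuffled.append((new_start, new_start + width))
        if count_overlaps(shuffled, peaks) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy coordinates, invented for illustration only.
toy_peaks = [(100_000, 150_000), (600_000, 650_000)]
toy_loci = [(110_000, 111_000), (120_000, 121_000), (610_000, 611_000),
            (620_000, 621_000), (900_000, 901_000)]
observed, p_value = permutation_pvalue(toy_loci, toy_peaks, 1_000_000)
```

A real analysis would permute within chromosomes and restrict placement to the mappable genome; the structure of the test stays the same.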
So briefly the way we assembled the data was to obtain these loci from the GWAS database;
these are usually reported as a single SNP. We of course had to handpick the cancer related
GWASs and then for each SNP define a locus using the linkage neighborhood
of that SNP, and after collapsing loci into unique regions we found 219 unique
GWAS loci covering 1.1 percent of the mappable genome. On the SCNA side we took all the loci
reported in the Beroukhim and colleagues study in 2010, and basically we took the pan-tumor analysis combined with
all the tumor type subanalyses and found 258 total SCNA peak regions comprising 8.4 percent
of the mappable genome and that included, I’m sorry, 198 amplification hotspots and
67 deletion peak regions.
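
The collapsing step can be sketched as a standard interval merge followed by a coverage calculation. This is a minimal sketch that assumes the per-SNP neighborhoods are already available as (start, end) pairs on a single chromosome; the coordinates and the mappable length below are invented for illustration.

```python
def merge_intervals(intervals):
    """Collapse overlapping or touching (start, end) intervals into unique regions."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous region instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def fraction_covered(regions, mappable_length):
    """Fraction of the mappable length covered by non-overlapping regions."""
    return sum(end - start for start, end in regions) / mappable_length

# Hypothetical SNP neighborhoods on one chromosome.
snp_neighborhoods = [(100, 200), (150, 300), (500, 600), (600, 650), (900, 950)]
unique_loci = merge_intervals(snp_neighborhoods)
coverage = fraction_covered(unique_loci, 10_000)
```

Applied per chromosome and summed, the same two functions would give the number of unique loci and the percentage of the mappable genome they span.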
So crossing these two loci we see the plot shown here and this is a genome-wide plot
where the GWAS loci are represented, they lie on the chromosome as these little dots
and then we see the SCNA peak regions hovering over each chromosome and then the intersecting
overlapping loci are shown in red. And so among these intersecting loci, we have a lot
of known cancer genes as well as novel loci. So to determine the significance of this overlap
we performed a genome-wide permutation, actually a number of permutations and what we found
was a strikingly significant overlap between GWAS loci and frequently altered, or frequently
amplified, regions of the genome in cancer. In contrast when we analyze the -- when we
compared cancer GWAS loci against commonly deleted peak regions, we saw the distribution
of overlap that’s shown here in the left plot and following permutation we found actually
that this intersection was not significant.
And so this is quite interesting. We pursued a second, sort of orthogonal line of
investigating this overlap by comparing cancer related loci versus non-cancer related GWAS
regions and we found that cancer related GWAS regions were significantly enriched
in overlap with amplification peak regions, but again did not frequently overlap
with deletion SCNAs.
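
The talk does not say which test was used for this cancer versus non-cancer comparison. One common choice for a 2x2 comparison like this is a one-sided Fisher's exact test, sketched below. The counts are illustrative assumptions: the cancer row uses 36 overlapping loci out of an assumed 219, and the non-cancer row is entirely invented.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(top-left cell >= a) under the hypergeometric null with
    fixed row and column totals.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)

# Rows: cancer GWAS loci, non-cancer GWAS loci.
# Columns: overlaps an amplification peak, does not overlap.
cancer_overlap, cancer_other = 36, 183          # denominator of 219 is an assumption
noncancer_overlap, noncancer_other = 40, 960    # invented for illustration
p_enrichment = fisher_one_sided(cancer_overlap, cancer_other,
                                noncancer_overlap, noncancer_other)
```

Running the same function with deletion-peak counts in place of the amplification counts would give the corresponding deletion comparison.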
So one feature of the GWAS findings across all the cancer germline analyses is that there’s
a hotspot of association where there are multiple loci in prostate cancer and GBM and other
cancer types in the region of 8q24 near the MYC locus. So to examine whether our results were
robust to the removal of this region, which is also known to be frequently amplified, we
did a separate analysis where we excluded these loci and we still found a significant
colocalization of these amplification peak regions with cancer GWAS loci outside of the
MYC locus. We applied this analysis to different tumor subtype analyses and we also saw
significant overlap between lung and hematopoietic peak regions
and cancer GWAS loci.
So these results show an interesting correlation or colocalization and they could
be explained very simply by genes that tend to undergo somatic alteration
frequently in some patients and then in other patients are mutated in the
germline genome and confer lifetime risk, but that doesn’t really suggest any kind of germline
somatic interplay. An example of such interplay is the classic two-hit model of tumor
suppressor loss, which involves a germline inactivation followed by a somatic alteration that results
in loss of function.
So we wanted to ask this question of whether germline SNP status actually confers risk
for specific somatic alterations, in this case amplifications, and a really interesting
precedent for this was published a few years ago where a common variant SNP conferring
risk to myeloproliferative disease located in JAK2 actually was shown to predispose
to development of a somatic point mutation in JAK2 in these same neoplasms and what was
really fascinating was that the actual risk-conferring germline common variant SNP would
occur on the same haplotype as the somatic mutation. So we wanted to pursue
this question in the context of copy number alterations. So how would we detect this?
Well if we have a heterozygous SNP in a patient’s tumor tissue,
this is a germline heterozygous SNP, and a tumor decides to undergo a copy number alteration,
in this case amplification, at that locus, well it might sometimes choose the locus
containing the C SNP to amplify but other times it may choose the locus containing the
A SNP to amplify. And if the tumor is sort of indifferent, it does not care about the status
of that germline allele, well then it’ll amplify these two different
alleles in equal proportion. However, if there’s something special that it sees on
that C allele that maybe it likes, it’s positively selected for, then we’ll see
it amplifying that C allele time and time again.
And so the way that you can test this is using something called the allelic distortion
test, and this was first proposed by Dewal and colleagues in Bioinformatics in 2009.
For each heterozygous SNP that undergoes a copy number alteration we can measure how
frequently one allele versus the other allele is amplified or deleted and then we can test
the significance of the deviation of this frequency from one half just using a chi-square distribution
or Fisher’s exact test. So basically if we see that the tumor amplifies
allele A in 30 out of 35 hets then that may be evidence for some kind of allelic bias
and a germline somatic interplay.
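
To make the 30-out-of-35 example concrete, the sketch below tests the deviation from one half with an exact two-sided binomial test. This is a stand-in chosen for simplicity, not the chi-square or Fisher's exact formulation mentioned above, and the function name is hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k under the null success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))

# Allele A amplified in 30 of 35 heterozygous tumors, as in the example above.
p_30_of_35 = binomial_two_sided_p(30, 35)
```

In a genome-wide scan this p-value would be computed at every heterozygous SNP with enough amplified samples and then corrected for multiple testing.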
So we examined this in several data sets including the original GCM data, the Global
Cancer Map data, and TCGA data using SNP 6.0 arrays, and we were sort
of disappointed to find that zero of these 36 cancer GWAS loci that intersect somatic
amplification peaks showed significant allelic distortion. However, we applied this scan
genome-wide and we found a significant allelic distortion at 11q13. Zooming in on this
locus we found that this is a SNP that lies 100 kb upstream
of the cyclin D1 locus and near the MYEOV gene and examining the individual events that
contribute to this signal, we found that 44 out of the 50 tumors that amplify this SNP
chose the C allele over the T allele and we can see here the individual tumors that support
the signal and we see that across many tumor types -- we’re seeing many tumor
types that amplify this locus, which is of course a frequently amplified region, and
they tend to choose this C allele over the T allele. We applied the same analysis
in TCGA SNP 6.0 data spanning over 2,000 tumors and again we saw many tumor types,
ovarian, breast, lung, amplifying this locus and choosing this C allele over this T allele.
This is strikingly significant.
We saw the same effect in the Broad-Novartis Cancer Cell Line Encyclopedia and this is
a set of over 900 cell lines that have been profiled with expression, copy number and
sequencing, and we saw the same effect, which is quite significant. So this is an interesting
result partly because it does actually lie in obviously a frequently amplified region.
It lies not too far from the GWAS peak but does not lie in an actual significant
GWAS region. So the question is what is the biology that may be driving this? We clearly
see that these tumors tend to preferentially amplify the C allele over the T allele at
this locus. Does this germline C allele carry some kind of selective advantage
that the tumor’s really going after? For example, is the cyclin D1 allele that’s
nearby more expressible or does it somehow have a selectively advantageous genotype? Or perhaps maybe there’s
some kind of interaction with the amplification machinery that maybe alters that local
somatic amplification rate and that phenotype is carried somewhere on, you know,
that C allele or it’s tagged by that C allele?
So we took some steps to examine this hypothesis looking at the Cell Line Encyclopedia and
examined the interaction of the allelic status of this SNP with total copy number and expression,
to examine whether it was an expression quantitative trait locus at this SNP and we found a mild
but significant effect whereby with increasing doses of allele C when we control for total
copy number we actually see increased expression of cyclin D1. We do not see this eQTL with
any of the other genes in this locus.
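
One way to picture an eQTL test that controls for total copy number is a linear regression of expression on both total copy number and the number of C-allele copies. The sketch below fits that model by ordinary least squares on synthetic data; the model form, the effect sizes and every variable name are assumptions made for illustration, not the analysis behind the result above.

```python
import random

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear(XtX, Xty)

# Synthetic cell lines: expression depends on total copy number plus a
# smaller extra contribution per copy of allele C (both effects invented).
rng = random.Random(1)
X, y = [], []
for _ in range(500):
    total_cn = rng.randint(2, 8)
    c_copies = rng.randint(0, total_cn)
    expression = 1.0 + 0.5 * total_cn + 0.2 * c_copies + rng.gauss(0, 0.3)
    X.append([1.0, total_cn, c_copies])
    y.append(expression)

intercept, beta_total_cn, beta_c_allele = ols(X, y)
```

A positive coefficient on the C-allele term, with total copy number already in the model, is what an allele-specific expression effect of this kind would look like.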
So in summary we see a significant overlap of germline GWAS peaks or loci and SCNA peak
regions across cancer types and this is predominantly with amplifications, and also, though I didn’t show
it, with amplifications plus deletions, but not with deletions only. As far as we know this
is the first evidence for genome-wide colocalization of germline susceptibility
variants and somatically altered loci, and cyclin D1 is an interesting candidate cis-somatic
trait locus or cis-STL and we’re investigating that further.
And I’ll take any questions next. Oh, and sorry, I’d like to thank my collaborators
on this effort: Scott Carter, Rameen Beroukhim, Craig Mermel, Gaddy Getz, and my mentor Matthew
Meyerson. Thanks.
[applause]
Male Speaker: Thank you Marcin, that was a great, great
talk. Questions? [inaudible]
Female Speaker: Hi. Why did you choose to just use the heterozygotes
and not use the two homozygotes as well?
Marcin Imielinski: So in the setting of a heterozygous
SNP we can come up with a simple statistical test that would determine some sort of selective
advantage or some sort of an effect. So that test I think
is a good test because -- so the other alternative is to use a trend test,
sort of a case-control test where we look at patients
that are amplifying versus not amplifying that locus and compare, for example, minor
allele frequencies. That test tends to be more prone to population stratification
and yields messier results, although it is potentially powerful. So both
are good ways of approaching this problem. The ADT is more similar to the TDT in germline
genomics which again is also less prone to population stratification but is perhaps less
powerful, requires more samples, but we’ve actually tried both. We just achieve cleaner
results with this test.
Male Speaker: Yeah, just a question. Have you taken the
copy number variation into account or have you considered that as another possible way
that you could get amplifications or deletions?
Marcin Imielinski: Oh, germline copy number.
Male Speaker: Yeah, germline polymorphism.
Marcin Imielinski: Right. So in the context of this data we are
only looking at somatic copy number events. But that certainly could be another driver
of this kind of effect and I mean we’re also only looking at germline single nucleotide
polymorphisms. A more general application of this strategy would be to look at other
kinds of somatic variants and other kinds of germline variants and that’s certainly
a great direction for additional exploration.
Male Speaker: I have a quick question. So for the GWAS peaks
have you attempted to further stratify them by looking at the heterozygotes and trying
to discern whether or not the mode of inheritance is dominant versus recessive among the GWAS
peaks and then repeating your analysis to see if there’s a difference?
Marcin Imielinski: That would be a great analysis. Unfortunately
for a lot of the GWAS data that we’re using we don’t have the underlying genotypes.
Those are either not accessible or we haven’t been able to access them
yet. So definitely that would be an interesting analysis also. Analyzing
the summary stats of some of these GWASs to examine loci that perhaps fell under the
genome-wide significance threshold but happened to colocalize with a somatic
copy number region would be an interesting way of finding additional germline effects,
so.
Male Speaker: Okay, thank you.
Marcin Imielinski: Thanks.