Nicholas Knoblauch: So the title of my talk is "Genome-Wide Analysis
of eQTL in Breast Cancer." But really what I'm talking about today is the interaction
between genotype and phenotype, more specifically the interaction between germline genotype
and breast cancer phenotype.
So the genome-wide association study is the widely used method for investigating this
inter-relationship between genotype and phenotype on a genomic scale. Breast cancer has been
widely studied with the genome-wide association study, and if we look at the genome-wide association
study catalog, we see that there are about 50 risk alleles which can predict risk of breast cancer.
A question you might ask is "How do we infer the mechanism of these risk alleles?" Or,
"How do these alleles lead to an increased risk of breast cancer, and what's the means
by which we can understand how variation at these loci has a functional consequence?"
Using the eQTL framework, we treat gene expression as a phenotype. Using gene expression profiling
methods such as RNA-seq or microarray, we can easily measure tens of thousands of features
simultaneously, and this facilitates the investigation of the functional consequences of genetic
variants at these loci.
So our eQTL analysis consisted of three parts: our germline genotype data, our tumor
gene expression data, and our ER status data. This was from 382 TCGA invasive breast cancer
cases from Caucasian individuals. Our germline SNP data came from an Affy 6.0 SNP array and
our expression came from an Agilent 244K custom array. We took the about one million loci from the
Affy 6.0 array and we imputed it to about 8 million loci for the analysis.
So getting from one million SNPs to 8 million SNPs, like I said, is done using imputation
wherein we estimate genotype for ungenotyped markers using a genotype reference panel,
in this case the 1000 Genomes was the reference panel.
So we used BEAGLE to infer haplotypes for unrelated individuals and minimac to implement
the actual imputation. That got us to about 16 million SNPs. We then took the 8 million
most variant.
So here's a plot of the first two principal components of our genotype data, and our 382
cases came from the red cluster you can see here.
So we represented the interaction between gene expression and genotype with a linear
model with parameters for genotype and ER status, which is our covariate. We use the
R package MatrixEQTL to implement the eQTL analysis. MatrixEQTL uses large matrix operations
to optimize the testing for every SNP-transcript pair, of which we had about 1.2 trillion,
which is a lot. And we did -- along with using ER as a covariate, we also did eQTL detection
in ER positive alone and ER negative alone.
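[Illustrative sketch, not part of the talk: the linear model described above, fitted for a single SNP-transcript pair with ordinary least squares. The function names and the toy data are invented for illustration. MatrixEQTL performs the equivalent test for all pairs at once with large matrix operations; this sketch only shows the shape of the model, expression = b0 + b_snp * genotype + b_er * ER status.]

```python
def ols_fit(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(p):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [x - factor * z for x, z in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def eqtl_effect(genotype, er_status, expression):
    """Fit expression = b0 + b_snp * genotype + b_er * er_status; return b_snp."""
    rows = [[1.0, float(g), float(e)] for g, e in zip(genotype, er_status)]
    return ols_fit(rows, expression)[1]


# Toy data (invented): minor-allele counts, ER status (1 = positive), expression.
genotype = [0, 1, 2, 0, 1, 2, 0, 2]
er_status = [0, 0, 0, 1, 1, 1, 1, 0]
expression = [1.0 + 0.5 * g + 2.0 * e for g, e in zip(genotype, er_status)]
b_snp = eqtl_effect(genotype, er_status, expression)
```

[In a real analysis the genotype coefficient would be tested for significance and corrected for the number of pairs tested; that step is omitted here.]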
So of the about 8 million SNPs, we found that about 140,000 of these were significant eQTL.
We also found that none of the 51 breast cancer risk alleles from the GWAS catalog were detected
as eQTL. So we see here that there does not seem to be an association between risk allele
status and eQTL status.
So another way we can think about our results is if we think about this as a bipartite graph
wherein each eQTL can be represented as a locus pointing to a quantitative trait. And
if we think about it this way we can compute the in-degree of our quantitative traits,
so how many loci per quantitative trait. The other way of thinking about it is out-degree,
so how many quantitative traits per locus. We can also look at connected regions of the
graph. So which quantitative traits are connected to one or two or three SNPs, et cetera.
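[Illustrative sketch, not part of the talk: the bipartite-graph view of eQTL results, with in-degree, out-degree, and connected regions computed from a list of (SNP, transcript) pairs. All identifiers below are invented, and the depth-first search is just one straightforward way to find connected regions.]

```python
from collections import Counter, defaultdict

# Hypothetical eQTL hits as (SNP, transcript) pairs; all identifiers are invented.
eqtls = [
    ("snp1", "geneA"), ("snp2", "geneA"), ("snp3", "geneA"),
    ("snp3", "geneB"), ("snp4", "geneC"),
]

# In-degree: number of loci pointing at each transcript.
in_degree = Counter(transcript for _, transcript in eqtls)
# Out-degree: number of transcripts each locus points at.
out_degree = Counter(snp for snp, _ in eqtls)


def connected_components(edges):
    """Group nodes of the bipartite graph into connected regions (edge direction ignored)."""
    neighbors = defaultdict(set)
    for snp, transcript in edges:
        # Tag each node with its side so a SNP and a transcript never collide by name.
        neighbors[("snp", snp)].add(("transcript", transcript))
        neighbors[("transcript", transcript)].add(("snp", snp))
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


components = connected_components(eqtls)
```

[With this toy input, geneA has the highest in-degree and snp3 is the only locus pointing at more than one transcript.]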
So here we have the in-degree distribution of our quantitative traits. We see most of
the transcripts have one or two loci which they interact with. And a small number of
transcripts are interacting with a large number of loci. Here's the other side of that, these
are the out-degree distributions of loci, and we see the same sort of thing where a
small number of loci are interacting with a large number of transcripts.
Here are the quantitative traits with the highest in-degree. We see some interesting stuff.
Prolactin is known to play a role in breast biology. MEN1 has been implicated in a variety
of cancers.
So another way we can sort of visualize this is by taking a rolling mean of eQTLs across
the genome, starting with genome one and going all the way to -- excuse me, chromosome one
-- and going all the way to the X chromosome.
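[Illustrative sketch, not part of the talk: a rolling mean over per-SNP eQTL counts laid out in genome order. The window size and the counts are invented; the talk does not say what window was used.]

```python
def rolling_mean(values, window):
    """Mean of each run of `window` consecutive values (no padding at the ends)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide the window one step: add the new value, drop the oldest one.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


# Hypothetical eQTL counts per SNP, already sorted by chromosome (1..22, X)
# and then by position; the numbers are invented.
eqtl_counts = [0, 2, 1, 0, 0, 5, 3, 0]
smoothed = rolling_mean(eqtl_counts, window=4)
```

[Plotting the smoothed values against genome position makes regions with many eQTLs stand out as peaks.]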
So the last thing I wanted to talk about were these ER-dimorphic eQTLs. So like I said earlier
we ran the eQTL analysis in ER positive alone and ER negative alone, as well as with ER
as a covariate. So we found 32 eQTLs with an opposite sign of the interaction in the
positive and the negative. And these are the six genes which were -- the transcripts from
these eQTLs, and several of these seem to have roles with apoptosis, which I think warrants
further investigation.
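[Illustrative sketch, not part of the talk: picking out eQTLs whose effect has opposite sign in the ER-positive and ER-negative analyses. The identifiers and effect estimates are invented, and restricting to pairs found in both analyses is an assumption about how the comparison was made.]

```python
# Hypothetical per-stratum effect estimates for eQTLs that were significant in
# both analyses, keyed by (SNP, transcript); identifiers and values are invented.
beta_er_positive = {("snp1", "geneA"): 0.8, ("snp2", "geneB"): -0.4, ("snp3", "geneC"): 0.3}
beta_er_negative = {("snp1", "geneA"): -0.6, ("snp2", "geneB"): -0.5, ("snp4", "geneD"): 0.9}


def opposite_sign_pairs(pos, neg):
    """Return pairs present in both strata whose effect estimates differ in sign."""
    shared = pos.keys() & neg.keys()
    # A negative product means one estimate is positive and the other negative.
    return sorted(pair for pair in shared if pos[pair] * neg[pair] < 0)


dimorphic = opposite_sign_pairs(beta_er_positive, beta_er_negative)
```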
So, finally, of the 1.2 trillion SNP-transcript interactions, about 375,000 eQTL were found.
We found that risk allele status really does not predict eQTL status, but the ER status
can interact with the direction of eQTL. Finally, it does seem that germline genotype can lend
insight into breast cancer phenotype.
I'd like to thank my boss, Andy Beck. And from the Harvard School of Public Health,
Aditi Hazra, Pete Kraft, John Quackenbush, and Connie Chen.
Since there's plenty of time, I'll take questions.
[applause]
Raju Kucherlapati: Thank you. Questions for Nicholas?
Is there any correlation, you know, between these SNPs that were identified and when you
do genomic DNA and SNPs?
Nicholas Knoblauch: I'm sorry?
Raju Kucherlapati: I mean, these are all obtained from expression
profiling, right?
Nicholas Knoblauch: The SNPs are from -- yeah, SNP genes, I think
[spelled phonetically].
Raju Kucherlapati: Yeah. Okay.
Male Speaker: So the eQTLs that are dimorphic between the
ER positive and ER negative, were they generally going -- like, for example, IGF1 receptor,
was that more highly expressed in the ER negative, or was there like a negative correlation or
positive correlation? Does it --
Nicholas Knoblauch: Right, between like the minor allele --
Male Speaker: Yeah, which direction was it -- versus ER
positive versus ER negative on the sets of them. Were they consistent across or --
Nicholas Knoblauch: Right, so, it seemed that most of the apoptosis-related
transcripts seem to be lower in the minor allele, in the ER negative, I believe. And
then the converse in the ER positive. Does that make sense?
Male Speaker: Okay. All right. I think I got it. Okay, thanks.
I'll talk to you later.
Raju Kucherlapati: Matthew?
Matthew Meyerson: Sure, I was very curious about your result
which is, at least naively thinking, surprising that the germline risk alleles are not associated
with eQTLs.
Nicholas Knoblauch: Right.
Matthew Meyerson: And you -- do you think -- does this suggest
alternative hypotheses for the role of these germline alleles in promoting cancer, other
than being modulators of expression?
Nicholas Knoblauch: Right. So I think that it's entirely possible
that these SNPs may lead to cancer, but then in cancer they do not predict any change of
expression; that seems to be probably the most likely result, but...
Female Speaker: I have a question - I want to understand,
did you use adjacent normal tissue or the tumor tissue to look at the gene expression?
Nicholas Knoblauch: Gene expression is from tumor tissue.
Female Speaker: So, these could be affected by the stage of
the tumor. Did you do analysis by stage?
Nicholas Knoblauch: We didn't do analysis by stage. We really
only broke it down by ER status, really, to keep sample size large. But, yeah, that certainly
can play a role.
Raju Kucherlapati: One last question.
Male Speaker: Yes, in the ER -- in the case of ER-associated
eQTL, have you checked whether separating premenopausal or post-menopausal cases could
change things, because actually estrogen levels vary by menopausal state, and that could
affect gene expression.
Nicholas Knoblauch: Yeah, no, that -- absolutely. We haven't looked
at anything really besides ER status, but looking forward to gathering a number of different
covariates we could use.
Raju Kucherlapati: Thank you, thank you very much.