TCGA - Genome-Wide Analysis of Expression Quantitative Trait Loci in Breast Cancer - Nicholas Knoblauch

Nicholas Knoblauch: So the title of my talk is "Genome-Wide Analysis of eQTL in Breast Cancer." But really what I'm talking about today is the interaction between genotype and phenotype, more specifically the interaction between germline genotype and breast cancer phenotype. So the genome-wide association study is the widely used method for investigating this relationship between genotype and phenotype on a genomic scale. Breast cancer has been widely studied with genome-wide association studies, and if we look at the genome-wide association study catalog, we see that there are about 50 risk alleles which can predict risk of breast cancer. A question you might ask is "How do we infer the mechanism of these risk alleles?" Or, "How do these alleles lead to an increased risk of breast cancer, and what's the means by which we can understand how variation at these loci has a functional consequence?"
Using the eQTL framework, we treat gene expression as a phenotype, using gene expression profiling methods such as RNA-seq or microarray. We can easily measure tens of thousands of features simultaneously, and this facilitates the investigation of the functional consequences of genetic variants at these loci. So our eQTL analysis consisted of three parts: our germline genotype data, our tumor gene expression data, and our ER status data. This was from 382 TCGA invasive breast cancer cases from Caucasian individuals. Our germline SNP data came from an Affy 6.0 SNP array and our expression data came from an Agilent 244K custom array. We took the about one million loci from the Affy 6.0 array and we imputed to about 8 million loci for the analysis. So getting from one million SNPs to 8 million SNPs, like I said, is done using imputation, wherein we estimate genotypes for ungenotyped markers using a genotype reference panel; in this case 1000 Genomes was the reference panel.
So we used BEAGLE to infer haplotypes for unrelated individuals and minimac to implement the actual imputation. That got us to about 16 million SNPs. We then took the 8 million most variable. So here's a plot of the first two principal components of our genotype data, and our 382 cases came from the red cluster you can see here.
So we represented the interaction between gene expression and genotype with a linear model with parameters for genotype and ER status, which is our covariate. We used the R package MatrixEQTL to implement the eQTL analysis. MatrixEQTL uses large matrix operations to optimize the testing for every SNP-transcript pair, of which we had about 1.2 trillion, which is a lot. And we did -- along with using ER as a covariate, we also did eQTL detection in ER positive alone and ER negative alone. So of the about 8 million SNPs, we found that about 140,000 of these were significant eQTL. We also found that none of the 51 breast cancer risk alleles from the GWAS catalog were detected as eQTL. So we see here that there does not seem to be an association between risk allele status and eQTL status.
So another way we can think about our results is if we think about this as a bipartite graph wherein each eQTL can be represented as a locus pointing to a quantitative trait. And if we think about it this way we can compute the in-degree of our quantitative traits, so how many loci per quantitative trait. The other way of thinking about it is out-degree, so how many quantitative traits per locus. We can also look at connected regions of the graph. So which quantitative traits are connected to one or two or three SNPs, et cetera. So here we have the in-degree distribution of our quantitative traits. We see most of the transcripts have one or two loci which they interact with. And a small number of transcripts are interacting with a large number of loci.
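The bipartite-graph view described above can be illustrated with a minimal sketch. It assumes the significant eQTLs are available as (SNP, transcript) pairs; the function names, SNP IDs, and gene labels below are hypothetical and are not taken from the talk's data or from MatrixEQTL.

```python
from collections import Counter


def degree_distributions(eqtl_pairs):
    """Count edges per node in a bipartite eQTL graph.

    eqtl_pairs: iterable of (snp_id, transcript_id) tuples, one per
    significant eQTL. Returns (in_degree, out_degree), where in_degree
    maps each transcript to its number of associated SNPs and
    out_degree maps each SNP to its number of associated transcripts.
    """
    unique_edges = set(eqtl_pairs)  # ignore duplicated pairs
    in_degree = Counter(transcript for _, transcript in unique_edges)
    out_degree = Counter(snp for snp, _ in unique_edges)
    return in_degree, out_degree


def degree_histogram(degree):
    """Map each degree value to the number of nodes that have it."""
    return dict(sorted(Counter(degree.values()).items()))


# Hypothetical toy edges, for illustration only.
toy_pairs = [
    ("rs1", "GENE_A"),
    ("rs2", "GENE_A"),
    ("rs2", "GENE_B"),
    ("rs2", "GENE_C"),
    ("rs3", "GENE_C"),
]

in_deg, out_deg = degree_distributions(toy_pairs)
print("in-degree histogram:", degree_histogram(in_deg))
print("out-degree histogram:", degree_histogram(out_deg))
```

In this toy graph, rs2 plays the role of a locus associated with many transcripts, the kind of node that forms the long tail of a degree distribution.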
Here's the other side of that, these are the out-degree distributions of loci, and we see the same sort of thing where a small number of loci are interacting with a large number of transcripts. Here are the quantitative traits with the highest in-degree. We see some interesting stuff. Prolactin is known to play a role in breast biology. MEN1 has been implicated in a variety of cancers. So another way we can sort of visualize this is by taking a rolling mean of eQTLs across the genome, starting with genome one and going all the way to -- excuse me, chromosome one -- and going all the way to the X chromosome.
So the last thing I wanted to talk about were these ER-dimorphic eQTLs. So like I said earlier we ran the eQTL analysis in ER positive alone and ER negative alone, as well as with ER as a covariate. So we found 32 eQTLs with an opposite sign of the interaction in the positive and the negative. And these are the six genes which were -- the transcripts from these eQTLs, and several of these seem to have roles in apoptosis, which I think warrants further investigation.
So, finally, of the 1.2 trillion SNP-transcript interactions, about 375,000 eQTL were found. We found that risk allele status really does not predict eQTL status, but that ER status can interact with the direction of eQTL. Finally, it does seem that germline genotype can lend insight into breast cancer phenotype. I'd like to thank my boss, Andy Beck. And from the Harvard School of Public Health, Aditi Hazra, Pete Kraft, John Quackenbush, and Connie Chen. Since there's plenty of time, I'll take questions. [applause]
Raju Kucherlapati: Thank you. Questions for Nicholas? Is there any correlation, you know, between these SNPs that were identified and when you do genomic DNA and SNPs?
Nicholas Knoblauch: I'm sorry?
Raju Kucherlapati: I mean, these are all obtained from expression profiling, right?
Nicholas Knoblauch: The SNPs are from -- yeah, SNP genes, I think [spelled phonetically].
Raju Kucherlapati: Yeah. Okay.
Male Speaker: So the eQTLs that are dimorphic between the ER positive and ER negative, were they generally going -- like, for example, IGF1 receptor, was that more highly expressed in the ER negative, or was there like a negative correlation or positive correlation? Does it --
Nicholas Knoblauch: Right, between like the minor allele --
Male Speaker: Yeah, which direction was it -- versus ER positive versus ER negative on the sets of them. Were they consistent across or --
Nicholas Knoblauch: Right, so, it seemed that most of the apoptosis-related transcripts seem to be lower in the minor allele, in the ER negative, I believe. And then the converse in the ER positive. Does that make sense?
Male Speaker: Okay. All right. I think I got it. Okay, thanks. I'll talk to you later.
Raju Kucherlapati: Matthew?
Matthew Meyerson: Sure, I was very curious about your result which is, at least naively thinking, surprising that the germline risk alleles are not associated with eQTLs.
Nicholas Knoblauch: Right.
Matthew Meyerson: And you -- do you think -- does this suggest alternative hypotheses for the role of these germline alleles in promoting cancer, other than being modulators of expression?
Nicholas Knoblauch: Right. So I think that it's entirely possible that these SNPs may lead to cancer, but then in cancer they do not predict any change of expression; that seems to be probably the most likely result, but...
Female Speaker: I have a question -- I want to understand, did you use adjacent normal tissue or the tumor tissue to look at the gene expression?
Nicholas Knoblauch: Gene expression is from tumor tissue.
Female Speaker: So, these could be affected by the stage of the tumor. Did you do analysis by stage?
Nicholas Knoblauch: We didn't do analysis by stage. We really only broke it down by ER status, really, to keep sample size large. But, yeah, that certainly can play a role.
Raju Kucherlapati: One last question.
Male Speaker: Yes, in the ER -- in the case of ER-associated eQTL, have you checked whether separating premenopausal or postmenopausal cases could change things, because actually estrogen levels vary before menopausal state, and that could affect gene expression.
Nicholas Knoblauch: Yeah, no, that -- absolutely. We haven't looked at anything really besides ER status, but we're looking forward to gathering a number of different covariates we could use.
Raju Kucherlapati: Thank you, thank you very much.
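The linear model described in the talk, with genotype and ER status as predictors of expression, can be sketched for a single SNP-transcript pair as follows. This is a minimal illustration using ordinary least squares on simulated data, not MatrixEQTL's implementation; the function name `fit_eqtl`, the effect sizes, and the simulated arrays are all assumptions made for the example. The second half mimics the ER-positive-alone and ER-negative-alone analyses on a simulated effect whose sign differs by ER status.

```python
import numpy as np


def fit_eqtl(expression, genotype, covariates=None):
    """OLS fit of expression ~ intercept + genotype (+ covariates).

    covariates: optional array of shape (n,) or (k, n).
    Returns (genotype_slope, t_statistic) for the genotype term.
    """
    y = np.asarray(expression, dtype=float)
    g = np.asarray(genotype, dtype=float)
    columns = [np.ones_like(g), g]
    if covariates is not None:
        columns.extend(np.atleast_2d(np.asarray(covariates, dtype=float)))
    X = np.column_stack(columns)
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = len(y) - X.shape[1]               # residual degrees of freedom
    sigma2 = residuals @ residuals / dof    # residual variance estimate
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], beta[1] / np.sqrt(cov_beta[1, 1])


# Simulated data (hypothetical): 382 samples, as in the talk's cohort size.
rng = np.random.default_rng(0)
n = 382
genotype = rng.integers(0, 3, size=n)       # minor-allele count 0, 1, or 2
er_status = rng.integers(0, 2, size=n)      # 1 = ER positive, 0 = ER negative

# Case 1: same genotype effect in everyone, ER status as a covariate.
expr = 0.5 * genotype + 1.0 * er_status + rng.normal(0, 0.5, size=n)
slope, t_stat = fit_eqtl(expr, genotype, covariates=er_status)
print(f"covariate model: slope={slope:.2f}, t={t_stat:.1f}")

# Case 2: effect with opposite sign by ER status, fitted within each stratum.
signed = np.where(er_status == 1, 0.5, -0.5)
expr_dimorphic = signed * genotype + rng.normal(0, 0.5, size=n)
pos, neg = er_status == 1, er_status == 0
slope_pos, _ = fit_eqtl(expr_dimorphic[pos], genotype[pos])
slope_neg, _ = fit_eqtl(expr_dimorphic[neg], genotype[neg])
print(f"ER+ slope={slope_pos:.2f}, ER- slope={slope_neg:.2f}")
```

Additional covariates such as menopausal status or stage, raised in the questions, would enter this sketch as extra rows of `covariates`.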