Evert van den Broek: Thanks for the opportunity to give this presentation
on part of our work. I'm Evert van den Broek, and I work at the Pathology Department
of the VU University Medical Center in Amsterdam, in the Netherlands. My supervisors are
Gerrit Meijer and Remond Fijneman. This presentation is about structural variant detection
in colorectal cancer. This project is supported by CTMM Projects -- that stands
for Center for Translational and Molecular Medicine -- and by the Cancer Center Amsterdam,
VUmc CCA.
Colorectal cancer is a major health care problem with an incidence worldwide of 1.2 million
patients each year, and the incidence in the U.S. is almost 150,000 patients. And there
is an inverse relation between the stage of the disease and the survival rate. And in
total, approximately 40 percent of all colorectal cancer patients will die due to metastatic
colorectal cancer.
In our group we are focusing on biomarker discovery, and this is a picture of an adenoma
to carcinoma progression, and we need diagnostic biomarkers for early colorectal cancer detection.
And we need prognostic biomarkers and predictive biomarkers, and I will focus on the prognostic
biomarkers.
Eighty-five percent of all colorectal cancers exhibit chromosomal instability resulting
in gains and losses of chromosomal segments. And this is SKY -- spectral karyotyping
-- of a higher abundant colorectal cancer cell line. And you can clearly see both the numerical
and the structural variants in this plot. And I will show you, for example, chromosome two
in which there are more pieces of chromosomes -- pieces of DNA -- present in this tumor
than what you're expecting in a normal situation, and there is a translocation between chromosome
10 -- this is chromosome 10 -- and chromosome 23. The red part belongs to chromosome 10.
The CAIRO and CAIRO2 studies were performed in the Netherlands -- Phase III clinical studies
-- and, in total, 1,575 patients were included. These studies
focused on chemotherapy in metastatic colorectal cancer. The CAIRO study is published in
The Lancet, and the CAIRO2 study is published in the New England Journal of Medicine. And
from this patient cohort, we have DNA from 356 patients. This is a representative group.
And the DNA is derived from the primary tumor and the matched normal tissue, and the DNA
is isolated from FFPE material.
This whole set of 356 DNA samples was run on Agilent 180k array CGH -- array comparative genomic
hybridization. After segmentation and calling, we are able to find the copy number
changes, the numerical aberrations. I will explain this in the next few slides.
This is a profile of a tumor, and on the y-axis you can see how the log2 ratio of the tumor
DNA compares to the normal DNA, and on the x-axis are all the chromosomes. And each black
dot represents a probe, and the colored lines are the segment values.
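The y-axis quantity described above can be written out in a few lines. This is only a minimal sketch: the function name and the example intensities are made up for illustration and are not taken from the talk.

```python
import math

def log2_ratio(tumor_signal: float, normal_signal: float) -> float:
    """Log2 ratio of tumor to normal probe intensity (hypothetical helper)."""
    return math.log2(tumor_signal / normal_signal)

# Illustrative intensities only: equal signal, doubled tumor signal, halved tumor signal.
for tumor, normal in [(100.0, 100.0), (200.0, 100.0), (50.0, 100.0)]:
    print(tumor, normal, log2_ratio(tumor, normal))
```

A value near zero means the tumor and normal DNA are present in similar amounts at that probe, a positive value points toward a gain, and a negative value toward a loss.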
And we use the CGHcall package to define the gains and the losses, and the green parts
mean that there is a gain in the tumor compared to normal, and the red, that there is a loss.
So this is the aim of my project: to identify the recurrent somatic structural genomic variants
that cause colorectal cancer.
And we used the CGH profiles from this large cohort of 356 CAIRO samples, and after segmentation,
we identified the breakpoints and merged the breakpoints per gene. And we ended up with
a list of candidate genes potentially involved in structural variants.
And this is the CGH plot again, and the definition of a breakpoint is here. Breakpoints are defined
by the start position of the first probe of each segment,
and that suggests an underlying chromosomal break that could disrupt the normal architecture
and the normal function of a gene.
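The breakpoint definition and the per-gene merging step can be sketched as follows. This is a rough illustration, not the group's pipeline: the `Segment` and `Gene` types, the coordinates, and the name `GENE_A` are hypothetical, and skipping the first segment on each chromosome (so that the chromosome start is not counted as a break) is an assumption the talk does not state.

```python
from collections import defaultdict
from typing import NamedTuple

class Segment(NamedTuple):
    sample: str
    chrom: str
    first_probe_start: int  # start position of the first probe in the segment
    log2_ratio: float

class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int

def breakpoints(segments):
    """Yield (sample, chrom, position) for every segment start.

    Assumption: the first segment on each chromosome is skipped.
    """
    by_key = defaultdict(list)
    for seg in segments:
        by_key[(seg.sample, seg.chrom)].append(seg)
    for (sample, chrom), segs in by_key.items():
        segs.sort(key=lambda s: s.first_probe_start)
        for seg in segs[1:]:
            yield sample, chrom, seg.first_probe_start

def samples_with_breakpoint_per_gene(segments, genes):
    """Map each gene name to the set of samples with a breakpoint inside it."""
    hit = defaultdict(set)
    for sample, chrom, pos in breakpoints(segments):
        for gene in genes:
            if gene.chrom == chrom and gene.start <= pos <= gene.end:
                hit[gene.name].add(sample)
    return dict(hit)

# Made-up data: sample s1 has a segment change inside GENE_A, sample s2 does not.
segments = [
    Segment("s1", "20", 100, 0.0),
    Segment("s1", "20", 5000, -0.5),
    Segment("s2", "20", 100, 0.0),
]
genes = [Gene("GENE_A", "20", 4000, 6000)]
print(samples_with_breakpoint_per_gene(segments, genes))
```

Counting the samples per gene in the returned mapping gives the recurrence numbers that a candidate list would be built from.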
And we found, in total, 5,737 genes with one or more breakpoints, and 482 of these were identified as candidate
genes with a recurrent breakpoint at a false discovery rate of less than 0.1. And
here, in this bar graph, are the 50 most affected candidate genes, and
on the y-axis is the number of affected samples in array CGH. MACROD2 is the gene
that is most prominent. It has a breakpoint in about 40 percent of all samples.
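The talk does not say how the false discovery rate was computed. One common choice is the Benjamini-Hochberg procedure, sketched here purely as an assumption, with made-up p-values standing in for whatever per-gene recurrence test was used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene p-values; genes with an adjusted value below 0.1 are kept.
p_values = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(p_values)
selected = [q < 0.1 for q in adjusted]
print(adjusted, selected)
```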
And we have clinical data available, so we asked the question whether this
had an effect on survival. This is a Kaplan-Meier plot: the red line represents
the patients lacking a breakpoint in MACROD2, and the blue line represents the patients with a
breakpoint in MACROD2.
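For readers unfamiliar with how such a curve is built, here is a minimal sketch of the Kaplan-Meier estimator for one patient group. The follow-up times and event flags are invented for illustration and are not data from the study.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an observed death.

    times:  follow-up time per patient
    events: True if death was observed, False if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Handle all patients that share the same follow-up time together.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Invented example: four patients, the second one censored at time 2.
print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
```

Running the function once for the patients with a breakpoint and once for those without gives the two curves that are compared in a plot like this one.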
So we have our candidate list, but there are some important limitations to using array CGH
data. The first is about the probe density. The average distance between the probes
is 17 kb, so the breakpoint location is only an estimate. And we don't have insight into
the structure of the DNA, so we don't know which parts are stitched
to what. And all copy number neutral events will be missed -- the balanced events.
So we need candidate validation, and next-generation sequencing can help with these
problems. We used The Cancer Genome Atlas colorectal cancer samples,
which are whole-genome sequenced with paired-end sequencing, and we only used the tumor-normal
sets.
And we developed our own algorithm -- an in-house algorithm for structural variant detection
which is candidate-driven. And we selected genes lacking a breakpoint as a negative control.
Our algorithm is mainly based on a read-pair approach, and the criteria for discordance
are listed here: based on the location of the reads, the bridge length, and the orientation
of the reads. And these are the discordant pair types. For a translocation, the reads
-- the mate reads -- are aligned on different chromosomes. An insertion
and a deletion are based on the bridge length, and an inversion and an eversion are based on the
orientation of the reads. And a single mapped read could indicate that there is a breakpoint.
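The discordance criteria listed above can be sketched as a small classifier. This is an illustration under stated assumptions, not the speaker's algorithm: the `ReadPair` type, the `min_insert` and `max_insert` thresholds, and the use of leftmost read positions as a stand-in for the bridge length are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str            # '+' or '-'
    chrom2: Optional[str]   # None if the mate is unmapped
    pos2: Optional[int]
    strand2: Optional[str]

def classify(pair: ReadPair, min_insert: int = 200, max_insert: int = 600) -> str:
    """Classify a read pair by location, orientation, and distance.

    The thresholds are made-up defaults; in practice they would come from
    the library's observed insert-size distribution.
    """
    if pair.chrom2 is None:
        return "single_mapped"
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    # Order the two reads by position so orientation can be read left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"          # both reads on the same strand
    if left_strand == "-" and right_strand == "+":
        return "eversion"           # reads point away from each other
    span = right_pos - left_pos     # simplified proxy for the bridge length
    if span > max_insert:
        return "deletion"           # reads map further apart than expected
    if span < min_insert:
        return "insertion"          # reads map closer together than expected
    return "concordant"

print(classify(ReadPair("20", 1000, "+", "5", 500, "-")))
print(classify(ReadPair("20", 1000, "+", "20", 6000, "-")))
```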
And we combined this read-pair approach with a read-depth analysis, and we defined the
breakpoint location using the soft-clipped part and the mixed [spelled phonetically]
part of the reads, and at last we determined the tumor-specific events.
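One way to pin down a breakpoint from soft-clipped reads is to read the clip boundaries off the CIGAR string of each alignment. The sketch below assumes SAM-style input (1-based leftmost position plus CIGAR); the function name is hypothetical, and the talk does not describe how the group actually does this step.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def softclip_breakpoints(pos: int, cigar: str):
    """Return reference positions of the aligned bases adjacent to soft clips.

    pos is the 1-based leftmost aligned position, as in a SAM record.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips may flank soft clips; they carry no sequence, so drop them.
    ops = [(n, op) for n, op in ops if op != "H"]
    points = []
    if ops and ops[0][1] == "S":
        points.append(pos)  # first aligned base, clip lies to its left
    if len(ops) > 1 and ops[-1][1] == "S":
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        points.append(pos + ref_len - 1)  # last aligned base, clip lies to its right
    return points

print(softclip_breakpoints(1000, "30S70M"))
print(softclip_breakpoints(1000, "70M30S"))
```

Positions where many reads are clipped at the same base are the natural candidates for a base-pair resolution breakpoint.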
And this is an example of a translocation in MACROD2. The colored reads indicate translocation
DPs, discordant pairs, and here you can see the fusion partner. Additional evidence is that there is
a clear breakpoint in both parts of the event, in the gene itself and in the fusion gene.
And overlapping reads with the same DP type were grouped together, and the distribution
is in this pie chart. And the biggest part of all DP groups are the deletions. And these
are the eversions, the inversions, the insertions, and 8 percent of all DPs are translocations.
That is data over all samples and all candidates, and this is preliminary data. I will focus
in this presentation on translocations. And we found in our candidate genes a five-fold
higher number of translocation-DP groups compared to our control genes.
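The grouping of overlapping reads with the same DP type can be sketched as an interval merge. The tuple layout and the rule that any positional overlap joins a group are assumptions made for illustration; the talk does not specify the exact grouping criteria.

```python
from collections import defaultdict

def group_discordant_pairs(pairs):
    """Group overlapping discordant pairs of the same type on the same chromosome.

    pairs: iterable of (dp_type, chrom, start, end)
    Returns a list of (dp_type, chrom, group_start, group_end, n_pairs).
    """
    by_key = defaultdict(list)
    for dp_type, chrom, start, end in pairs:
        by_key[(dp_type, chrom)].append((start, end))
    groups = []
    for (dp_type, chrom), intervals in sorted(by_key.items()):
        intervals.sort()
        cur_start, cur_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start <= cur_end:            # overlaps the current group
                cur_end = max(cur_end, end)
                count += 1
            else:                           # gap: close the group, start a new one
                groups.append((dp_type, chrom, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        groups.append((dp_type, chrom, cur_start, cur_end, count))
    return groups

# Made-up coordinates for illustration.
pairs = [
    ("deletion", "20", 100, 200),
    ("deletion", "20", 150, 250),
    ("deletion", "20", 400, 500),
    ("translocation", "20", 120, 220),
]
for group in group_discordant_pairs(pairs):
    print(group)
```

Counting the resulting groups per type gives the kind of distribution shown in a pie chart, and counting translocation groups per gene allows the comparison between candidate and control genes.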
And this is the distribution of the translocation-DP groups, on the y-axis, over all the candidate
genes, on the x-axis. Zooming in on the first 20, we see that MACROD2 is,
again, the most prominent one.
And we plot these data together in this plot. On the y-axis, the frequency of the translocation-DP
groups, and on the x-axis, the frequency of affected samples in array CGH breakpoint analysis.
And MACROD2 is on the upper right part of this graph, and there are some genes that
correlate very nicely, and there are other clouds, and that is work in progress. So we
want to know what these candidates are. These are all the candidate genes.
To conclude, we identified 482 candidate genes with recurrent breakpoints in a cohort of
356 colorectal cancer samples based on array CGH breakpoint analysis. TCGA provides
an essential colorectal cancer reference data set to validate our candidate genes, that is, to validate
the structural variants at the candidate breakpoints. Identification of breakpoints
based on array CGH is correlated with structural variant detection in TCGA data. And further
studies will be performed to investigate clinical and functional significance of validated candidate
genes.
Thanks, that's my presentation.
[applause]
John Weinstein: And on time besides. [laughs] Questions, yes?
Female Speaker: Yes, this is Angela from Harvard. My first
question is, did you use the low-pass whole-genome samples for your validation? And second, can
you comment on the presence of how many TCF fusions you found by your method? Because
we found them in the marker paper, and also the Meyerson group previously published TCF7
fusions.
Evert van den Broek: We are digging into data at this moment, and
we did indeed use the low-coverage samples, and the most challenging
part of using the low-coverage data is the statistical analysis --
Female Speaker: Right.
Evert van den Broek: -- and that is one of the problems. So we
tried to find the recurrent event over the whole sample set. So that's what we tried
to do. And we have our candidate list based on breakpoint analysis. So we can -- we can
be very sensitive in our computational methods. And we didn't find the fusion gene you mentioned.
We only have the three high-coverage samples, so I don't know if
one of the samples should harbor this fusion gene.
John Weinstein: Last quick question rule, Bruhl [spelled phonetically],
and quick answer.
Male Speaker: All right, I thought it was a great talk.
I wondered if you had looked at the RNA-seq data from these TCGA colorectal samples to see
the effect on transcripts?
Evert van den Broek: That is the next step, so we haven't done
that at this moment, but we will do that, yeah.
John Weinstein: Thanks. That's an excellent example of how
we hope increasingly the TCGA data will be used in the context of other sorts of studies,
including ones which are designed for clinical answers to clinical questions.