Janet Jansson: First I'd like to thank Lita Proctor for the
invitation, and also the organizing committee. This is great to have a chance to present
our research, and also to have the opportunity to give my own opinion about some future directions
in the area. So I'll be discussing multi-omics technologies, and giving a few examples of
how we use that in our own research.
But first I'd like to give a little bit of background. We already saw this slide earlier
this morning. I think we've seen it a couple of times already. And I think it's important
to think that the information we get from these DNA-based studies is only part of the
way towards where we want to go. So if we look at these particular graphs where we can
see the 16S data that shows different body sites and the variation in the microbial community
composition between different individuals; so, just a reminder that these are different
persons, different body sites. You can see there's a considerable variability in the
composition of the communities, whereas the functional genes are relatively constant at
this very gross level of examination of function. So this is a DNA-based, COG-level examination
of function, and things look very similar.
Sometimes, however, composition does matter. I did take out -- away a few slides where
I was going to show this, but I know Alex Khoruts is going to give a talk about this
later, so I won't talk about it. But we do have examples where we know, for example,
through fecal transplantation that the community composition is very important for health,
and so I'll let Alex talk about that.
So this is actually one of my own photographs. I was just in Greenland a couple of weeks
ago, and this is -- it's great to have your own iceberg to show. But what we know is just
really the tip of the iceberg. We know information, and quite a bit of information, about the
community composition, if we specifically focus on the gut environment. We know a lot
about which microorganisms are there, which are the most dominant species, and how variable
it can be across human populations. So this knowledge is starting to really consolidate.
What we don't know is what lies underneath the water. We don't really know what these
organisms are doing, just at a very basic level. Most of the microbes still, you know,
they're not very well characterized. There are many that have not been cultivated.
So in order to understand function, we do have access to omics tools nowadays. And for
those of you that aren't familiar with these technologies, I kind of refer to this as the
omics pipeline. So, depending on the information, you can get different types of information
about expression. So, at the very beginning of the pipeline, this is where we really focused
a lot of our work so far, and it's on the composition. So using 16S sequencing, for
example, to understand the microbial community composition, so I would call that the microbiome.
The next level is the metagenome, so sequencing not only the phylogenetic genes, but also
the functional genes, so we know the gene composition. The next step would be to look
at the RNA, so which of those genes are expressed. Then the proteins of those expressed genes,
which ones are translated and form proteins. And then, finally, the metabolites or the
metabolome. And the metabolites are important for carrying out a lot of the reactions in
our bodies.
So I'd like to think a bit about these different kinds of omics tools. So if we think genomics
or metagenomics, this is information about gene content that has the potential for being
expressed. So just because you see a functional gene, that doesn't mean it's being expressed.
It has the potential to be expressed. And what is particularly problematic is that the
DNA that you're looking at could be extracted from dead cells or from dormant cells. So
the cells don't even need to be active or even alive in order to get DNA. I'm not saying
that this information isn't important, but it's also important to keep in mind that this
is a limitation. If we go to the next level and look at RNA, that provides you with a
snapshot of activity at that moment in time when the RNA is extracted. It's also very
important to understand that that's the expression profile at that -- with those given set of
circumstances and conditions. And, as we do know, cells are experiencing a lot of different
kinds of regulation, and so not all genes are going to be transcribed. But at least
you get information about activity at that moment in time.
If you do look at the proteins, or the metaproteome, which would be the community proteins or complement of
proteins, that provides evidence, then, that this protein has passed all of the regulatory
steps at the RNA level, and also has been translated to produce this protein. A caveat
with that would be that you don't know if the protein is actually active in all cases if
you do detect it. However, the genes must have been transcribed and translated to produce
a protein. And I would say that that's better for assessment of microbial function, is to
look at the proteins. This does require an annotated gene database, so then you do need
the metagenome information anyhow because, otherwise, it's not possible to identify what
the proteins are.
So what do we know from model microorganisms? So I'm just going to kind of go back a little
bit here. This is an E. coli cell. And a lot of work has been done on systems biology of
single organisms. And so this is a reference from Corbin from PNAS in 2003, where they
detected a positive relationship between protein abundance and transcript abundance during
exponential growth of E. coli. So I think this is a very, kind of, it's a nice confirmation
that the kind of information you get from RNA and protein is very consistent, and you
can use both kinds of data sets, just exchange them. However, if you look at the single cell
level, and this is a reference from 2010 from Taniguchi, they found no correlation between
messenger RNA and protein levels in single E. coli cells at a particular point in time.
So I think that even with a single organism, we're still really at the beginning of understanding,
at a systems level, how we can use these different kinds of information to understand function.
Now, there hasn't been that much done in the microbiome yet using these kinds of tools,
but there has been quite a bit done in the marine environment. And what can we learn
from other ecosystems? Well, this is a relatively -- or a very recent paper from Mary Ann Moran,
and what she did was she looked at the amount of macromolecules in a single milliliter of
seawater. And here you can see the amount of genes, transcripts, and proteins from the
same milliliter of seawater. So you can see that the abundances vary dramatically, and
this is a log scale. So this is a quote from Mary Ann Moran. She said that "the most important
factor responsible for the poor messenger RNA-to-protein correlations
is the long half-life of proteins relative to messenger RNAs." And I like it that she
actually did these calculations that a typical bacterial protein half-life is about 20 hours,
which is about two orders of magnitude longer than a messenger RNA half-life. That means
that most proteins persist in a bacterial cell long after the messenger RNAs that encoded
them have been degraded.
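To make the half-life argument concrete, here is a minimal sketch, assuming simple first-order (exponential) decay with no new synthesis. The 20-hour protein half-life is the figure quoted above; the 0.2-hour messenger RNA half-life is only an assumption derived from "two orders of magnitude" shorter, and the function name is illustrative.

```python
def fraction_remaining(hours_elapsed, half_life_hours):
    """Fraction of a molecule pool left after first-order decay."""
    return 0.5 ** (hours_elapsed / half_life_hours)


PROTEIN_HALF_LIFE_H = 20.0  # figure quoted in the talk
MRNA_HALF_LIFE_H = 0.2      # assumption: two orders of magnitude shorter

# Compare how much of each pool survives at a few time points.
for hours in (1, 5, 20):
    protein = fraction_remaining(hours, PROTEIN_HALF_LIFE_H)
    mrna = fraction_remaining(hours, MRNA_HALF_LIFE_H)
    print(f"after {hours:>2} h: protein {protein:.3f}, mRNA {mrna:.3g}")
```

Under these assumptions, roughly 97 percent of the protein pool is still present one hour after synthesis stops, while only about 3 percent of the transcript pool is, which is why a protein can be detected long after its transcript is gone.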
So, this is important also to keep in mind if you're using a different omics method.
And again, I think this is a reason that I like proteins because at least they're more
-- they're going to give you more information about the history of activity of those cells.
So this was the first publication using this relatively new technology to look at fecal
samples, so to understand what the protein complement was in human fecal samples. And
I have to say that this technology has really been a major revolution for the use of proteomics
in microbiome studies, because, up until that point, everything was based on 2D gel
separations and extraction of spots and sequencing those. However, this is a shotgun approach,
so from the human, the sample is taken. In this case, we use differential centrifugation
to extract the bacterial cells. The cells are lysed, the protein is extracted, and then
directly digested with trypsin into fragments. Those fragments are separated by 2D-LC-MS/MS,
so it's completely gel-free. They are collected on this column and sent, using electrospray, into a
very high mass accuracy mass spec. There are even better mass specs for this purpose
now, but this was in 2009. So then you get these spectra, and those need to be searched
against your databases. And this is where the metagenome data and also reference genome
data is extremely important because you need to have those annotated genes, and you rely
on exact matches to understand what your proteins are.
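The database step can be sketched roughly as follows. This is a simplified illustration, not the search software used in the study: it assumes the common rule that trypsin cleaves after K or R unless the next residue is P, and it matches peptide sequences directly, whereas real search engines score measured spectra against predicted fragment masses. The gene IDs, sequences, and observed peptides are all made up.

```python
from collections import defaultdict


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        at_end = i + 1 == len(sequence)
        if residue in "KR" and (at_end or sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def build_peptide_index(proteins):
    """Map each predicted peptide to the set of proteins that contain it."""
    index = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            index[peptide].add(protein_id)
    return index


# Hypothetical annotated database (made-up IDs and sequences).
database = {
    "geneA": "MKTAYIAKQR",
    "geneB": "MSLRPAGKTAYIAK",
}
index = build_peptide_index(database)

# Hypothetical peptides inferred from spectra; only exact matches count.
observed = ["TAYIAK", "QR", "ELVISK"]
for peptide in observed:
    hits = sorted(index.get(peptide, ()))
    print(peptide, "->", hits or "no match")
```

A peptide that is absent from the database simply goes unidentified, which is why the matched metagenome and reference genomes matter so much. A peptide shared by several genes, like the first one here, cannot be assigned to a single protein on its own.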
So those that can be identified, you can then predict their functions. But in the case of
hypothetical proteins, it's also possible to look at the sequence and to be able to
do a hypothetical protein identification. So this is an average of the metagenomes that
we had available at the time. So just looking at average COG categories of function
from the metagenome level and comparing that to the average COG categories for metaproteomes.
And what we can see is, if you look at the metagenomes, it's a relatively even distribution
of COG categories. But if you look at the proteomes, it's really enriched in certain
functions. For example, translation, energy production, and carbohydrate metabolism, and
these are functions you would expect to be dominating in the gut environment because
they have to, for example, metabolize carbohydrates. So this is a good sign that the information
we're getting from the proteome is more indicative of the function that is actually being carried
out in that system.
Another nice thing about doing the proteome is that, at first, I was -- I always say this,
but as a microbiologist, I thought that human proteins were contaminants, but they actually
turned out very useful because you can get a study of the microbiome interaction with
the host by looking at the human proteins. So we get the human proteins for free, at
least the proteins that are attached to the bacterial cells because we enrich the bacteria.
And when we look at the human proteins, the largest groups of proteins are usually digestive
enzymes, and those involved in cell adhesion. However, we can see these very interesting
proteins, including antimicrobial peptides. And this is just an example of one protein
that was identified early on. It's DMBT1, which is thought to play a role in cellular
immune response. These little blue bars just show the peptides that were lined up along
this protein, or the gene for the protein.
So that's proteomics. What about metabolites? Well, metabolites are the ultimate proof of
processes and pathways that have occurred. And it's really the final signature of metabolic
processes. The thing that's different about looking at the metabolites compared to the
other kinds of omics is that it's not so easy to key it to a particular organism. You don't
have that way to track back to a gene. Instead, you're dependent on massive data correlations.
So metabolomics is, of course, very important. When you consume food, the food is digested.
You can have rather insoluble carbohydrates, or more soluble polysaccharides, oligosaccharides,
and depending on the organisms that encounter those in the intestine, you're going to have
different kinds of metabolites that are produced. And you can have primary degraders and also
hydrogen utilizers that are consuming the hydrogen that's produced, and eventually the
metabolites, some of them are used by the community, but some of them are actually taken
into the systemic system and can have impacts on the body. So that's a background about
the omics technologies, and I will give you a couple of examples of projects that we've
carried out where we used multi-omics approaches.
The first is for IBD cohorts. We have a twin cohort. And also a longitudinal study where
we looked at microbiomes, metagenomes, metaproteomes, and metabolomes. And the second is a dietary
study, which is looking at microbiomes, metagenomes, metaproteomes, and metabolomes. This was an
earlier study, so it was using the 454-sequencing platform, whereas we migrated over to Illumina
for the second.
So first I'll show you the example for inflammatory bowel disease, and this is a disease that
has many different consequences for the body, and it has a very complex etiology. But one
thing that is of interest for this meeting is that there's often been reported a dysbiosis,
or an altered microbiome, in individuals that have inflammatory bowel disease compared to
healthy persons. So this is just an example of publications that show -- that have reported
dysbiosis. There are many more papers than this, so these are just a few examples. I
know you can't read it, but that's fine. It just lists a lot of different bacterial species
that are either reported to be more prevalent, higher in individuals that have inflammatory
bowel disease, or lower in individuals with inflammatory bowel disease.
And so just -- I summarized some of the key points here from the publications, and one
is that dysbiosis in IBD is characterized by an overall decreased diversity of bacteria
in the gut compared to healthy people, and typically, a greater relative abundance of
Proteobacteria, such as Enterobacteriaceae. And another important point is there's often
a loss of beneficial microbes, such as butyrate producers and other producers of short-chain
fatty acids.
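One common way to put a number on "decreased diversity" is the Shannon index. The sketch below is only an illustration: the talk does not say which diversity measure the cited publications used, and the counts are invented.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


# Made-up read counts per taxon for two hypothetical samples.
even_community = [25, 25, 25, 25]     # four taxa, evenly represented
dominated_community = [85, 5, 5, 5]   # same taxa, one dominating

print("even:", round(shannon_index(even_community), 3))
print("dominated:", round(shannon_index(dominated_community), 3))
```

A community dominated by one group scores lower than an even one with the same number of taxa, so an expansion of a single group at the expense of others shows up as reduced diversity.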
So the study we did was to study twins. The reason for doing that, and I think we heard
a beautiful example from Ruth Ley this morning, is that you have these genetically matched
individuals, and therefore you can discount a lot of the confounding impact of the genetics
and early childhood exposures when you're looking at dysbiosis. So it's a Swedish
twin cohort, 46 twin pairs. They included healthy twins and those that had ulcerative colitis,
both those that were discordant for the disease, so the healthy one is a smiley face and the
sick one is not smiling, and those that were concordant, so both were sick, and the same
for Crohn's disease. We had discordant twin pairs and concordant pairs.
So these were all the tools that were used on the same samples. So it was the same fecal
sample, and we used everything on the same samples. It included, also, biopsies taken
from five locations. So we got a lot of information from these individuals, including all of the
different parts of the pipeline. For the microbiome, at that time, we first started with a fingerprinting
method called T-RFLP. We used qPCR, but then we moved to pyrotag sequencing and, for the metagenome,
pyrosequencing. And then did the shotgun proteomics and metabolomics.
So this is a T-RFLP profile survey of 90 different children. The reason I like to
show this is that this was one of our first indications that every single one of these
children had an individual fingerprint. That's the only reason I'm showing this here. But
when we looked at the identical twins, this was amazing to me, that they were so similar,
and these were adults that lived apart for decades. And the T-RFLP profiles of
their fecal samples were very, very similar. And this also supports what Ruth Ley was
talking about earlier today. By contrast, if we look at these discordant twin pairs,
their fecal microbiomes were very different. So this, again, is an indication that there
is a dysbiosis. There is something different in these individuals.
So if we look at this pipeline again, and this is just looking at data from one pair of
healthy identical twins, and the correlations between the two individuals in the twin pair,
we can see at the microbiome level, we have a very high correlation between the OTUs present,
0.9 r-squared. If we look at the proteome, we start to get more of a separation, some
more individuality, r-squared of 0.396. And at the metabolome, even more individualized,
r-squared of 0.301. So this means that at -- as you go through this pipeline, you start
to get more and more individual characteristics. It gets to be more discriminating.
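For readers who want to reproduce this kind of twin-to-twin comparison, here is a minimal sketch. It assumes the r-squared values are squared Pearson correlations between the two twins' feature abundances, which the talk does not state explicitly. The abundance values are invented.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Made-up relative abundances for the same five features in two twins.
twin_a = [0.40, 0.25, 0.20, 0.10, 0.05]
twin_b = [0.35, 0.30, 0.15, 0.15, 0.05]

print("r-squared between twins:", round(r_squared(twin_a, twin_b), 3))
```

The same function can be applied at each level of the pipeline (OTUs, proteins, metabolites) to see whether the twins look less alike as the measurements move closer to function.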
So when we looked at the 16S data, we could see very distinct clusters. So this is all
of the patients that have inflammation in the ileum, which I'll call ileal CD, and sometimes
I abbreviate ICD. They clustered separately from those that had inflammation in the colon,
so colonic CD, which I sometimes abbreviate CCD, and from those that were healthy, which
are in green, and the blue all had ulcerative colitis. Now, this grouping was much more
significant than the twin pair similarities. So even the healthy twin pair's similarities,
okay, the healthy twins did cluster together, but disease was the major clustering factor
over zygosity. So once we saw this data, we focused more on the disease comparisons.
Now, the reasons we looked at the biopsies was to study the mucosa-associated microbiota.
And here, these are just different locations, ileal and distal colon, but what
we basically saw, again, when we included the biopsies and fecal samples,
was this distinction between ileal Crohn's disease and those that were healthy or had
colonic Crohn's disease. And also the individual biopsies and fecal samples clustered together.
So when we look at the composition at a phylum level, from -- these were individuals that
were healthy or had colonic Crohn's disease, just averages, ileal biopsies and fecal samples,
we can see that there are differences in the biopsies compared to the fecal samples. If
you look at the blue, which is Lachnospiraceae, you can see that it's much greater in the
biopsies of the healthy, which is H here, compared to the fecal samples. So there are
differences. But still, when you look at one person, their biopsies cluster with their
fecal samples.
And so we were interested in seeing, well, what -- which particular organisms were higher
or lower in abundance. Now, I already told you that some of these butyrate producers
are known to be more abundant in healthy compared to those with some of the IBD phenotypes.
And definitely with ileal Crohn's disease, this organism is basically absent in the biopsies,
in either the ileum, or the colon, or in the fecal samples, compared to healthy and those
with colonic Crohn's disease. So here again we see that separation between these different
Crohn's disease phenotypes. Whereas other organisms were more abundant, and this is
an example of E. coli that -- these are different biopsy locations, the five different locations --
was much more prevalent in Crohn's disease compared to healthy. And we found one ruminococcus
albus that was higher in the biopsies in those with ileal Crohn's disease, compared to the
other locations and to healthy.
So a gap -- that was looking at single time point studies. A gap is really to look at
this in a longitudinal scale. So previous studies have focused only on these single
time points. This really provides limited insight, especially for something like IBD,
where you can have a flare up, remission, and different things going on over time, drug
therapy. IBD has active and quiescent disease states. Therefore, it's really important to
have a temporal study to properly assess IBD.
So, more recently, we did a longitudinal study with 139 subjects, and up to 10 time points
were collected for these individuals every three months. And during that time we have,
from our clinical collaborators, information about remission, drug therapy, et cetera.
So what we do find when we look at all of this data, we still get this major clustering
based on disease. It might be a little bit hard to see. But this is ileal Crohn's disease
here in purple. Ulcerative colitis is the light blue, colonic Crohn's disease in darker
blue, and healthy in green. So even when all of these time points are taken into account,
we still get this clustering, but what is -- this is the super interesting thing here,
if I can get it to work.
So Rob Knight's group did this for me. This is looking, then, at the trajectory of these
individuals over time. And so if you follow the orange and the yellow, so the healthy
and the ulcerative colitis, they are starting to form a cluster here on one side, whereas
the different IBD phenotypes are varying dramatically. They're jumping back and forth in this space
over time. And so these -- each of these segments represents a three-month sampling period.
And here you can see another healthy person is still continuing in this plane. So when
it is finished rotating, here you can see the healthy and the ulcerative colitis are
almost as flat as a pancake on that plane. This is where they rotate. But the IBDs are
-- they inhabit a different space. So I think this is really important to understand what's
going on there.
If you look at individual temporal dynamics, you can see these -- this is a healthy person.
There is some variability. For example, there is a bloom in this Bacteroides, I think,
can't really see the color. But there's some difference over time, but not nearly what
you see when you look at the IBD phenotypes. So here's an example where there's a real
enrichment of Enterobacter, and then the Bacteroidaceae come in, and then Lachnospiraceae, so it's
a lot more dynamic. And this is -- it's individual, though. You have a different pattern for each
person.
So what we're currently doing is the metagenomes and metaproteomes for five of these patients
at five time points. And so I don't have that data yet, but that is ongoing. So for -- I
have to go faster. For our HMP Demonstration Project, we examined a subset of these pairs
that had matched metagenomes and metaproteomes. And these are just showing the proteome similarities
in the twin pairs, so we see a lot of similarity with the healthy twin pairs, but -- and with
the colonic, but much less with the discordant twin pairs. And the metaproteomes, they cluster
according to disease phenotype, and here, you can see that here as well. This is ileal
Crohn's disease, healthy, and colonic Crohn's disease. And when you look at individual pathways,
so this is the lowest phylogenetic level where we can identify the proteins and what they're
assigned to, we can see that all of these pathways are less abundant in ileal Crohn's
disease at the protein level. But there are some proteins, especially for outer membrane
proteins, that are more abundant in Crohn's disease. That's just saying what I just said.
If we look at the human proteins, we find in healthy more proteins that function in
mucosal integrity, and also, in the ileal Crohn's disease, a higher abundance of proteins
involved in inflammatory response, such as human alpha-defensin, and pancreatic enzymes, so we
think that's demonstrating a defective epithelium or a leaky gut syndrome. Looking at the metabolites,
so, again, just to emphasize, these are from the same samples, so the pellet was sent for
proteome analysis and the fecal water was sent to Germany for mass spec analysis of
the metabolites. Again, the same pattern, we see the clustering. Here, red is colonic
Crohn's disease, blue is ileal Crohn's disease, and green is healthy. And so we get this very
distinct clustering. And this is just showing some of the differentiating metabolites.
We had so many differentiating metabolites. Almost 8,000 metabolites significantly
differed between these, and we had over 18,000 metabolites and most of them are unidentified.
One example is bile acid biosynthesis. That was higher in Crohn's disease. And we think
this may also be due to inhibition of bile acid absorption by inflammation.
So I just want to mention this study. I won't have time to really go through it, but this
is an ongoing dietary study funded by General Mills and NIH NIDDK. And we're looking at
different high carbohydrate/low carbohydrate diets in a crossover study. And this is just
showing the study and the different kinds of analysis. One thing we find with a resistant
starch diet is that we get -- we do have more -- let's see -- with a high resistant starch
diet, we do get the lower insulin resistance. But these are different patients. Now, we're
interested in differences in the microbiome, and so with these different arms of the diet,
when you do the crossover, there is definitely a significant difference between the high
carb and low carb, and also with the high resistant starch and low resistant starch
in both branches. And we find our favorite, Faecalibacterium, is enriched in the high resistant
starch diet. And these are metabolites that were detected, and we do find the metabolites
separate according to high resistant starch diet.
Okay, I'm going to have to finish here. So I need to mention where we should go from
here. So I think the current grand challenge is how to analyze all of this multi-omics
big data. I'm so thankful there's the call coming out for big data analysis because this
is really an enormous amount of data, and we generate it and we want to correlate it.
And what we want to avoid is this, interpreting the hairball, because often I get the data
back and it -- this is what it looks like. That's an example, an anonymous hairball.
So I think that what we need is more multi-disciplinary collaborations with microbial ecologists,
clinicians, bioinformaticians, biostatisticians, to be able to really dig down in this data.
We have a huge resource of data, but we need to be able to analyze it.
And I'd like to conclude with acknowledgements. Thank you very much.
[applause]
Female Speaker: We have time for one question. And -- no?
If not, then we can move on to our next speaker. We thank you, Dr. Janet Jansson, for an excellent
presentation, and we're moving on to our next speaker, Dr. Dan Rudolf Littman, from NYU
and the Skirball Institute, and he will talk about Approaches for Host Immune and Microbiome
Studies.