This presentation will describe a new web-based tool to group genes by their expression profile.
A gene expression profile shows the mRNA concentration levels that exist in a cell or tissue.
These levels are measured with microarray chips. One microarray can measure the concentration
levels of thousands of previously known mRNA chains.
In the figure on the right each column is the output of a microarray and each row shows
the expression level of a probeset in the different microarrays.
A probeset is a collection of probes designed to interrogate a given mRNA sequence.
Some probesets can match different sequences of the same gene, others can match a sequence
that appears in more than one gene.
We are interested in finding genes that show a similar variation of their expression under different
external conditions.
A gene will be underexpressed under a certain condition if it presents a lower expression
level for that condition. In a similar way, a gene will be overexpressed
if it presents a higher expression level for a given condition.
There are many algorithms that can be used to group genes that show similar levels of
expression. In this presentation we will focus on a new
biclustering algorithm called K-Formal Concept Analysis.
Formal Concept Analysis groups objects that share some common attributes.
It is possible to represent these formal concepts in a lattice.
The lattice on the right is a Hasse diagram of the table presented on the left.
The colored boxes on each side represent the same concepts.
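As an illustration of how objects are grouped by shared attributes, here is a minimal Python sketch that lists the formal concepts of a small table. The table `toy_context` and the function names are hypothetical and are not taken from the tool or from the figure; the brute-force search over object subsets is only practical for tiny examples.

```python
from itertools import combinations

def derive_attributes(context, objects):
    """Return the attributes shared by every object in `objects`."""
    shared = set().union(*context.values())  # start from all attributes
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)

def derive_objects(context, attributes):
    """Return the objects that have every attribute in `attributes`."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset."""
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = derive_attributes(context, subset)
            extent = derive_objects(context, intent)
            concepts.add((extent, intent))
    return concepts

# Hypothetical table: three objects (g1..g3) and three attributes (a, b, c).
toy_context = {
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"b"},
}

concepts = formal_concepts(toy_context)
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each printed pair is one node of the lattice: a set of objects together with exactly the attributes they all share.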
K-Formal Concept Analysis is a generalization of Formal Concept Analysis.
It starts with an exploration that sweeps over all possible thresholds in the max-plus and
min-plus semirings. Each threshold gives rise to one lattice.
The max-plus semiring groups the underexpressed genes and the min-plus semiring the overexpressed ones.
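The threshold sweep can be pictured with a simplified Python sketch. This is not the K-Formal Concept Analysis algorithm itself: it assumes, for illustration only, that each threshold turns the expression matrix into a yes/no table (a value at or below the threshold counts as underexpressed, a value at or above it as overexpressed) and that ordinary Formal Concept Analysis is then applied to that table. The matrix values, condition names, and function names are hypothetical.

```python
def binarize(matrix, conditions, threshold, mode):
    """Turn each row of values into the set of conditions passing the threshold.

    mode "under": keep conditions with value <= threshold.
    mode "over":  keep conditions with value >= threshold.
    """
    rows = []
    for values in matrix:
        if mode == "under":
            kept = (c for c, v in zip(conditions, values) if v <= threshold)
        else:
            kept = (c for c, v in zip(conditions, values) if v >= threshold)
        rows.append(frozenset(kept))
    return rows

def count_concepts(rows, conditions):
    """Count formal concepts as the number of distinct intents.

    The intents are all intersections of row sets, plus the full condition set.
    """
    intents = {frozenset(conditions)}
    for row in rows:
        intents |= {intent & row for intent in intents}
    return len(intents)

def sweep(matrix, conditions, thresholds, mode):
    """Map each threshold to the number of concepts of its yes/no table."""
    return {
        t: count_concepts(binarize(matrix, conditions, t, mode), conditions)
        for t in thresholds
    }

# Hypothetical expression values for three probesets under four conditions.
conditions = ["root_ctrl", "root_se", "shoot_ctrl", "shoot_se"]
matrix = [
    [-1.0, -2.0, 0.5, 0.4],
    [0.2, -1.5, 0.1, 0.0],
    [1.0, 1.2, -0.5, -0.6],
]

under_counts = sweep(matrix, conditions, [-2.5, -1.0, 0.0], "under")
print(under_counts)
```

In this toy example, raising the threshold lets more values count as underexpressed, so more concepts appear.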
We have created a web page that shows how this biclustering algorithm works.
This is the workflow of an analysis.
First, the data for each experiment is uploaded to the web.
Then some preprocessing is done in order to fit the raw data into the clustering algorithm.
After this step the K-Formal Concept Analysis algorithm is run, and each of the concepts
produced at the output will represent a cluster.
The input data will be Affymetrix CEL files. Currently the website supports these four microarrays.
The Gene Expression Omnibus database holds thousands of datasets.
We will search for an experiment which contains the gene expression of roots and shoots from
an Arabidopsis thaliana grown on selenate. There are 8 CEL files which can be downloaded
here.
Once we have the files we can go to the web page and start the analysis.
First we must log in to upload new data.
Now click on "Upload new data" and a new view will appear.
Here you can select the name of the dataset, give a brief description and select the proper
Affymetrix microarray chip.
Then you can upload the CEL files and click "Next" when finished.
In this example we will upload the 8 CEL files that we downloaded a moment ago.
Now we must select a preprocessor to apply to the raw data.
First we enter the name that we want to give to this preprocessor.
Then we select one of the four types of normalization. For this example
we choose the default, the arithmetic mean.
Then we write a descriptive name associated with each uploaded file.
These 8 names are the 8 different cases that we read from the Gene Expression Omnibus website.
There are 4 files for the root tissues, 2 control and 2 with selenium, and another
4 for the shoot tissues, 2 control and 2 with selenium. This view shows the preprocessor details.
The first line shows the name we gave it previously.
Below is the preprocessor type. This means that we divide each probeset by its mean value.
The Data size indicates the dimensions of the uploaded matrix: the number of probesets
and the number of experiments.
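The mean-based preprocessing and the data size can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the sample values are hypothetical, the matrix is taken to be a list of rows (one per probeset) with one value per experiment, and the tool's real implementation may differ.

```python
def normalize_by_mean(matrix):
    """Divide every value of a probeset (row) by that row's arithmetic mean.

    Assumes each row has a non-zero mean, as expected for raw intensities.
    """
    normalized = []
    for row in matrix:
        mean = sum(row) / len(row)
        normalized.append([value / mean for value in row])
    return normalized

def data_size(matrix):
    """Return (number of probesets, number of experiments)."""
    return (len(matrix), len(matrix[0]) if matrix else 0)

# Hypothetical raw intensities: 2 probesets measured in 3 experiments.
raw = [
    [2.0, 4.0, 6.0],
    [10.0, 10.0, 10.0],
]

normalized = normalize_by_mean(raw)
print(data_size(raw))
print(normalized)
```

After this step a value below 1 lies below the probeset's own mean and a value above 1 lies above it.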
Below we can see the representation of the expression levels,
which represents the input for the K-Formal Concept Analysis algorithm.
We can access it by clicking on "Explore".
The results appear in this view.
The first graph shows the number of concepts as a function of the threshold
for the max-plus and min-plus semirings.
Scrolling down we can see the concept lattices.
The slider on the left fixes a threshold and the corresponding concept lattice
is plotted on the canvas.
We will start in the max-plus domain, which corresponds to the underexpressed genes, and
the minimum threshold. We can see that there are only three circles.
Each circle is a formal concept with a radius proportional to the number of probesets
it has.
The top concept corresponds to the objects that do not have any attribute.
By contrast, the bottom formal concept has all the attributes.
If we change the threshold, new concepts appear.
Three concepts deserve a special mention in this lattice.
This concept contains the probesets that are underexpressed in the root when selenium is present.
If we select one probeset, a brief description will appear on the right.
In this description there are links to other web resources, like Gene Ontology, where more
information can be consulted.
Two other interesting concepts are the one corresponding to the probesets underexpressed
in the root, whether selenium is present or not,
and the one corresponding to the probesets in the shoots.
We can change the threshold and see how the number of probesets in each concept
changes.
This has been a brief description of a gene biclustering tool.
You can find more information on this topic in the following references.