The purpose of this video is to show how to perform multiple factor analysis with FactoMineR
and especially how to improve the graphs
that are obtained with FactoMineR. I'll first load the library
and I'll perform a multiple factor analysis on the example data set
of MFA, so I'm going to type ?MFA
to open the help page of the function.
At the bottom of the help page I can use
the example on the wine data.
In a first step, I load the data set,
and then perform the MFA.
I'll quickly explain the lines of code. I first load the wine data set.
Then I run the MFA on the wine data set,
and I create groups of variables. The MFA will balance the influence of
each group of variables in the construction of the dimensions, in particular in
the construction of the first dimension.
I have a first group of variables consisting of the first two variables,
then the next 5 form another group,
then the next 3, then the next 10, the next 9 and finally the last 2.
Then the nature of the variables of each group: the first group contains
categorical variables, that is, qualitative variables. Then there are quantitative variables,
with an "s" here, which means we will "scale",
that is to say standardize, the variables.
Then ncp = 5 means we will
have the results for five dimensions.
Finally I can name the groups of variables:
the variables on the origin of the wines
(label, type of soil),
the olfactory description variables, the visual variables,
the olfactory variables after shaking, the taste variables,
and the overall liking variables.
And then I state that groups of variables 1 and 6, the origin and
overall variables,
are supplementary groups and thus will not participate in the construction of the dimensions.
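The call described above can be sketched as follows. This is a minimal sketch that assumes the FactoMineR package is installed. The group names are illustrative labels of my own choosing, and graph = FALSE is an addition that suppresses the default graphs; leave it out to get the graphs that appear in the video.

```r
library(FactoMineR)

data(wine)  # example data set shipped with FactoMineR

res <- MFA(
  wine,
  group = c(2, 5, 3, 10, 9, 2),        # sizes of the six consecutive groups of columns
  type = c("n", rep("s", 5)),          # "n": categorical group, "s": scaled quantitative groups
  ncp = 5,                             # keep the results for five dimensions
  name.group = c("origin", "odor", "visual",
                 "odor.after.shaking", "taste", "overall"),  # illustrative names
  num.group.sup = c(1, 6),             # groups 1 and 6 are supplementary
  graph = FALSE                        # assumption: no default graphs in this sketch
)
```

The group sizes must add up to the number of columns in the data set, since the groups are read as consecutive blocks of columns.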
Once MFA is performed, results are in the object res
and I can have a summary
of this object
with summary(res).
I'll enlarge the window a little to see all the results.
summary(res) first recalls the command line,
then gives a table of eigenvalues
and the percentages of inertia associated with each dimension.
So the first dimension
recovers 49% of the information, 49% of the inertia,
and the second dimension 19%.
Then come the results on the groups of variables, first the active groups,
with the coordinates of the
groups, the contribution of each group to the construction of
the first dimension and the quality of representation on the first dimension.
Then the results on the second dimension: coordinates,
contributions and squared cosines,
and then the 3rd dimension. By default one has
results on 3 dimensions, but one may
have results on 4, 5 or 2 dimensions, for instance, using the argument ncp.
Then come the results for the supplementary groups, with
coordinates and squared cosines. There is no contribution since these
groups do not contribute to the construction of the dimensions.
Then we have the results for the individuals,
but only for the first individuals, by default the first 10.
You can see the results for all individuals
using the argument nbelements = Inf,
for infinity. That is to say, we will have results for all
elements, so all individuals, all variables, etc.
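The summary options mentioned so far can be sketched as follows, assuming the same MFA call on the wine data (the group names are illustrative labels, and graph = FALSE is my addition).

```r
library(FactoMineR)

data(wine)
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

summary(res)                     # default: first 10 elements, 3 dimensions
summary(res, nbelements = Inf)   # results for all individuals, variables, etc.
summary(res, ncp = 2)            # results for the first 2 dimensions only

res$eig                          # eigenvalues and percentages of inertia
```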
Again we have the coordinates, contributions and squared cosines,
first on dimension 1, then 2 and then 3.
We have the results for the active quantitative variables, so the results
for the first ten variables, again with coordinates, contributions and cos².
For the supplementary quantitative variables, we just have
coordinates and cos²; again, these are variables
which have not contributed.
There are no active qualitative variables here, so we have the results
for the supplementary qualitative variables, more precisely for the categories
of the supplementary qualitative variables. So we have the coordinates,
the squared cosines, no contribution of course, and a test value.
The test value usually lies between -1.96 and 1.96. A value outside this interval
means that the coordinate is significantly different from 0.
For instance, the coordinate of the category Reference
is significantly different from 0 on the first dimension,
and greater than 0 on the first dimension.
These are the main results of the MFA.
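The bounds -1.96 and 1.96 quoted for the test value are the 2.5% and 97.5% quantiles of the standard normal distribution, which corresponds to a two-sided test at the 5% level. A small sketch follows; the helper is_significant and the example values are hypothetical and are not part of FactoMineR.

```r
# Two-sided 5% threshold from the standard normal distribution
threshold <- qnorm(0.975)
threshold

# Hypothetical helper: flags test values that fall outside the interval
is_significant <- function(v_test, level = 0.05) {
  abs(v_test) > qnorm(1 - level / 2)
}

# Made-up test values, for illustration only
is_significant(c(-2.5, 0.4, 1.2, 3.1))
```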
Many graphs also appeared earlier, when the MFA was run.
We first have a graph of the groups,
with active groups in brown and in green supplementary groups.
Then a graph with partial axes.
For each group of variables a separate analysis is performed: for the quantitative variables
a PCA is performed and we project
the dimensions of this PCA as supplementary information.
So for instance, for the visual group, the first dimension of
the visual group is closely linked
to the first dimension of the MFA.
The second visual dimension is less linked.
For groups of qualitative variables,
such as the origin here,
we perform an MCA and the dimensions of the MCA are projected.
Here is a graph with only the individuals,
a graph with the variables,
and variables are colored based on their group.
So the same color is used for all the variables of a group.
On the individuals' graph,
some individuals can be drawn with their partial points. Here we want individuals
whose partial points are very different, and which therefore have
a very large within inertia.
The two individuals with the largest within inertia
are drawn with their partial points, as are the two individuals with the smallest within inertia.
And there are graphs with
supplementary categories or active categories (here there are no active ones),
then the graph with only the categories.
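The main graphs listed above can also be redrawn one at a time from the result object with the choix argument of plot. The following is a sketch under the same assumptions as before (illustrative group names, graph = FALSE added by me); the exact appearance depends on the installed FactoMineR version.

```r
library(FactoMineR)

data(wine)
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

plot(res, choix = "group")  # graph of the groups (active and supplementary)
plot(res, choix = "axes")   # partial axes of the separate analyses
plot(res, choix = "ind")    # individuals and categories
plot(res, choix = "var")    # quantitative variables
```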
I can go back to some of these graphs
to show how to build them. I create a new device
and I build the individuals' graph with
plot(res), to see
how to change these graphs and possibly improve them.
Here I have a graph with active individuals
and the categories of the supplementary variables.
So I can make, for instance,
the categories of the qualitative variables invisible.
The individuals all have different colors;
I can choose
not to color the individuals,
and so use a single color, black, for all individuals.
The graph is quite readable;
if I want
the labels to be less close to each other,
or if I have more points and I want to separate the labels,
I can reduce the font size a little
using the argument cex = 0.8, for instance.
The points are better separated and the graph is readable.
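These first adjustments to the individuals' graph might look like the sketch below. It assumes the same MFA call as in the example (illustrative group names, graph = FALSE added by me). The values passed to invisible and habillage are my reading of the plot.MFA help page; which of "quali" or "quali.sup" hides the supplementary categories may depend on the installed version, so both are given.

```r
library(FactoMineR)

data(wine)
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

# Hide the categories of the qualitative variables
plot(res, choix = "ind", invisible = c("quali", "quali.sup"))

# Single color for the individuals and a smaller font for the labels
plot(res, choix = "ind", invisible = c("quali", "quali.sup"),
     habillage = "none", cex = 0.8)
```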
On this graph I can add partial points,
for instance for the wines 1VAU and PER1.
For these two points, I have partial points.
Partial points can
be colored
according to the groups rather than with arbitrary colors.
Coloring by group makes them easier to visualize.
So for instance, for wine 1VAU
I have the red dot here, which shows how
the wine 1VAU is seen with only the olfaction variables.
And here is how the wine 1VAU is seen with only the visual variables.
Here we find the colors that were used for the groups of variables
when we had the graph of variables.
If I do
plot(res, choix = "group", habillage = "group")
the groups are colored with olfaction in red, olfaction after shaking
in dark blue,
tasting in sky blue and visual in green.
And so these are the same colors that are used for the partial points here.
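The partial points and the group coloring can be sketched as follows, under the same assumptions as before (illustrative group names, graph = FALSE and the invisible values are my additions). The colors obtained may differ from those quoted in the video depending on the version and palette.

```r
library(FactoMineR)

data(wine)
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

# Partial points for two wines, colored according to the group of variables
plot(res, choix = "ind", invisible = c("quali", "quali.sup"), cex = 0.8,
     partial = c("1VAU", "PER1"), habillage = "group")

# Graph of the groups with the same group colors
plot(res, choix = "group", habillage = "group")
```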
I can select some individuals,
so for instance, not draw
all the points but select only the wines
which are well represented on the map, with a squared cosine greater than 0.8.
So I'll have
in black, with a label,
the individuals who have a cos² greater than 0.8 and in gray here
the individuals who have a cos² lower than 0.8.
Partial points are drawn only
for individuals with a cos² greater than 0.8.
I can also color the individuals
according to a variable, for instance variable 1,
which is a qualitative variable.
The points are then colored according to its categories.
I'm not going to draw the partial points here because we would have a little too much information.
So I colored in red
the wines with the Saumur label, in green
the Bourgueuil and in blue the Chinon.
So I put habillage = 1,
but I could also use the variable soil.
I can give either the number of the variable or the name of the variable,
and here the wines
are colored according to the soil variable.
I still have the selection cos² > 0.8.
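The selection on cos² and the coloring by a qualitative variable can be sketched like this, under the same assumptions as before (illustrative group names; graph = FALSE, the invisible values and partial = "all" are my choices for the sketch). In the wine data the first two columns are named Label and Soil.

```r
library(FactoMineR)

data(wine)
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

hide <- c("quali", "quali.sup")  # hide the categories, as before

# Keep labels and partial points only for wines with cos2 > 0.8
plot(res, choix = "ind", invisible = hide, cex = 0.8,
     partial = "all", habillage = "group", select = "cos2 0.8")

# Color the wines by the first variable (Label), given by position
plot(res, choix = "ind", invisible = hide, cex = 0.8,
     habillage = 1, select = "cos2 0.8")

# Same idea with the soil variable, given by name
plot(res, choix = "ind", invisible = hide, cex = 0.8,
     habillage = "Soil", select = "cos2 0.8")
```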
I can also
select using the contribution
and take the 8 individuals who have contributed the most
to the construction of the dimensions.
So only the individuals
which contributed the most to the construction of the dimensions are colored,
that is, the 8 individuals with the highest contributions.
I can play with transparency here by putting unselect = 0.
The points are then not transparent at all, so they keep their color:
points that have not contributed significantly have the same color as the other points,
but we see that there are no labels for these points.
On the contrary, I can make them
completely transparent, that is to say the points disappear for the individuals
that are not among the 8 with the highest contribution to the construction of the dimensions.
There can possibly be some problems
when you use transparency, because these graphs cannot
be used in PowerPoint,
where transparency is not handled,
and so I recommend using a gray color instead, grey70 for instance.
So all these points get the same gray color; they are not colored according to
the soil but are drawn in gray.
However, you will be able to use the graph in PowerPoint, and possibly move the labels.
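The selection by contribution and the different settings of unselect can be sketched as follows, under the same assumptions as before (illustrative group names; graph = FALSE and the invisible values are my additions). Here unselect = 1 is my reading of "completely transparent".

```r
library(FactoMineR)

data(wine)
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

hide <- c("quali", "quali.sup")

# The 8 wines that contributed the most; the others keep their color, without labels
plot(res, choix = "ind", invisible = hide, cex = 0.8, habillage = "Soil",
     select = "contrib 8", unselect = 0)

# The other wines become completely transparent
plot(res, choix = "ind", invisible = hide, cex = 0.8, habillage = "Soil",
     select = "contrib 8", unselect = 1)

# The other wines are drawn in a plain gray, which avoids transparency
plot(res, choix = "ind", invisible = hide, cex = 0.8, habillage = "Soil",
     select = "contrib 8", unselect = "grey70")
```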
By default, I have a graph of the map 1-2, but I can draw
the map 3-4.
I just have to put axes = 3:4.
So here is the graph with dimensions 3 and 4.
And I therefore have the individuals who contributed the most to the construction
of dimensions 3 and 4.
For my selection, instead of the largest contributions,
I can also specify the wines I want to see: 1VAU and PER1, for instance.
So I have the 2 wines that were specified.
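Changing the plane and selecting wines by name can be sketched as follows, under the same assumptions as before (illustrative group names; graph = FALSE, the invisible values and the gray color for unselected points are my additions).

```r
library(FactoMineR)

data(wine)
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

hide <- c("quali", "quali.sup")

# Map 3-4, with the 8 wines that contributed the most to this plane
plot(res, choix = "ind", invisible = hide, cex = 0.8, habillage = "Soil",
     select = "contrib 8", unselect = "grey70", axes = 3:4)

# Selection by name instead of by contribution
plot(res, choix = "ind", invisible = hide, cex = 0.8, habillage = "Soil",
     select = c("1VAU", "PER1"), unselect = "grey70", axes = 3:4)
```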
So these are the possibilities for the individuals' graph.
We can also work on the graph of variables.
I put choix = "var" to work on the variables' graph.
So the default graph is this one.
It is always interesting to color the variables
according to their group.
I advise you to do this every time.
You see here that a lot of labels overlap. In fact, the labels are
quite long and cannot be written to the right of the graph.
What I recommend is to enlarge the window and rerun
the command. So now, the labels can go to the right
and therefore fewer labels overlap. The graph is a little
more readable.
We can improve this graph further by adding a shadow,
putting shadow = TRUE to say that I want a shadow
under the labels. So you see now that
the circle is no longer drawn where a label
passes over the circle.
So these graphs are better,
but unfortunately the shadow remains visible when you import
the graph into PowerPoint, for instance. You will have
white squares that are difficult to manage if you want to move
the labels. So if you want to move the labels afterwards, it is advisable not to use the shadow,
but if you can
use the graph as it is, it is advisable to add a shadow because
the graph will be a little more readable.
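The variables' graph with group colors and a shadow under the labels can be sketched as follows, under the same assumptions as before (illustrative group names, graph = FALSE added by me). As far as I know, the full name of the argument in plot.MFA is shadowtext, and shadow = TRUE as typed in the video relies on R's partial matching of argument names; treat this as an assumption to check against the help page of the installed version.

```r
library(FactoMineR)

data(wine)
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

# Variables colored according to their group
plot(res, choix = "var", habillage = "group", cex = 0.8)

# Same graph with a shadow under the labels (assumed full argument name: shadowtext)
plot(res, choix = "var", habillage = "group", cex = 0.8, shadowtext = TRUE)
```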
So this is what we can do for the variables' graph. Again, you can use a selection
and select only the variables which contributed the most to the
construction of the dimensions, so the 8 variables that contributed the most
are drawn with labels and the others are drawn with
transparency. So we still see
their group, we see where they are projected, and we see that they are less well projected.
This allows us to draw graphs in which the labels overlap less.
Consequently the graphs are more readable when there are a lot of variables.
Often this is interesting because the variables which are very close to the center of the circle,
i.e. variables with very short arrows,
are not very interesting to interpret because they are not very
well projected. And so we will often focus on the variables
that are better projected, which therefore have a high coordinate
or a high contribution (this is the same information)
to the construction of the dimensions.
I can do the same thing on the map 3-4
and therefore have the variables that have the highest contribution to the construction of plane 3-4.
Obviously, the variables are less well projected on dimensions 3-4, but
the variables that contributed the most are these 8 variables.
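The selection on the variables' graph can be sketched as follows, under the same assumptions as before (illustrative group names, graph = FALSE added by me).

```r
library(FactoMineR)

data(wine)
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

# Labels only for the 8 variables that contributed the most to plane 1-2
plot(res, choix = "var", habillage = "group", cex = 0.8,
     select = "contrib 8")

# Same selection on plane 3-4
plot(res, choix = "var", habillage = "group", cex = 0.8,
     select = "contrib 8", axes = 3:4)
```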