Multiple Factor Analysis (MFA) with FactoMineR

The purpose of this video is to show how to perform a multiple factor analysis (MFA) with FactoMineR, and especially how to improve the graphs that FactoMineR produces. I first load the library, then run an MFA on the example dataset from the MFA help page, so I type ?MFA to open the help for the function. At the bottom of the help page, I can use the example on the wine data: in a first step I load the dataset, and then I perform the MFA.
Let me quickly explain the lines of code. I first load the wine dataset. Then I run the MFA on it, defining groups of variables. MFA balances the influence of each group of variables in the construction of the dimensions, and of the first dimension in particular. The first group consists of the first two variables, the next 5 variables form another group, then the next 3, the next 10, the next 9 and finally the last 2.
Next comes the nature of the variables in each group. The first group contains categorical (qualitative) variables. The other groups contain quantitative variables, coded here with an "s", which means that the variables are scaled, that is to say standardized to unit variance. Then ncp = 5 means that results are kept for five dimensions.
Finally I can name the groups of variables: the origin of the wines (label, type of soil), olfactory description variables, visual variables, olfactory variables after shaking, taste variables, and overall liking variables. Groups 1 and 6, the origin and overall groups, are declared supplementary and thus do not take part in the construction of the dimensions.
Once the MFA is performed, the results are stored in the object res, and I can get a summary of this object with summary(res). I'll enlarge the window a little to see all the results. The output of summary(res) first recalls the command line, then gives a table of eigenvalues with the percentage of inertia associated with each dimension.
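The call described above follows the example on the ?MFA help page. Here is a sketch of it, assuming FactoMineR is installed. The English group names are illustrative choices and not necessarily the ones typed in the video.

```r
library(FactoMineR)

# Example dataset shipped with FactoMineR: 21 wines described by 31 variables
data(wine)

res <- MFA(
  wine,
  group = c(2, 5, 3, 10, 9, 2),    # sizes of the consecutive groups of columns
  type = c("n", rep("s", 5)),      # "n": categorical group, "s": scaled quantitative group
  ncp = 5,                         # keep the results for 5 dimensions
  name.group = c("origin", "odor", "visual",
                 "odor.after.shaking", "taste", "overall"),  # illustrative names
  num.group.sup = c(1, 6)          # groups 1 and 6 are supplementary
)

# Command line, eigenvalues, then groups, individuals, variables and categories
summary(res)

# Eigenvalues and percentages of inertia only
res$eig
```

With the default graph = TRUE, the MFA call also draws the default graphs discussed later in the video.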
So the first dimension recovers 49% of the information, that is 49% of the inertia, and the second dimension 19%. Then come the results on the groups of variables, starting with the active groups: the coordinates of the groups, the contribution of each group to the construction of the first dimension, and the quality of representation (cos²) on the first dimension. The same results follow for the second dimension (coordinates, contributions and cos²) and then for the third. By default results are given for 3 dimensions, but the argument ncp lets me ask for 2, 4 or 5 dimensions, for instance. Then come the results for the supplementary groups, with coordinates and squared cosines. There are no contributions, since these groups do not contribute to the construction of the dimensions.
Then we have the results for the individuals, by default only the first 10. You can see the results for all individuals with the argument nbelements = Inf, for infinity: results are then printed for all elements, so all individuals, all variables, etc. Again we have the coordinates, contributions and squared cosines on dimensions 1, 2 and 3.
Next are the results for the active quantitative variables, shown for the first ten variables, again with coordinates, contributions and cos². For the supplementary quantitative variables we have only coordinates and cos², since these variables did not contribute. There are no active qualitative variables here, so we only have results for the supplementary qualitative variables, and more precisely for their categories: the coordinates, the cos², no contribution of course, and a test value. When a category does not really differ from the origin, its test value usually lies between -1.96 and 1.96; a value outside this interval means that the coordinate is significantly different from 0.
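The summary options and the reading of the test values can be sketched as follows. This assumes the same MFA call as in the help example (with illustrative English group names) and that the test values are stored in res$quali.var.sup$v.test, as described in the MFA help page.

```r
library(FactoMineR)
data(wine)

# Same analysis as in the help example; graph = FALSE skips the default graphs
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

# Results on 2 dimensions only, printed for every element instead of the first 10
summary(res, ncp = 2, nbelements = Inf)

# Test values of the supplementary categories, one column per dimension
vtest <- res$quali.var.sup$v.test
round(vtest[, 1:2], 2)

# TRUE where the coordinate is significantly different from 0 (5% level)
abs(vtest[, 1:2]) > 1.96
```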
For instance, the coordinate of the category Reference is significantly different from 0 on the first dimension, and greater than 0. These are the main results of the MFA.
Many graphs also appeared earlier, when the MFA was run. We first have a graph of the groups, with active groups in brown and supplementary groups in green. Then a graph of the partial axes. A separate analysis is performed for each group of variables: for a group of quantitative variables a PCA is performed, and the dimensions of that PCA are projected as supplementary information. So for instance, for the visual group, the first dimension of the visual PCA is closely linked to the first dimension of the MFA, while the second visual dimension is less linked. For a group of qualitative variables, such as the origin here, an MCA is performed and its dimensions are projected.
Next is a graph with only the individuals, and a graph of the variables in which variables are colored according to their group, so one color per group. On the individuals' graph, some individuals can be shown with their partial points. Here the most interesting individuals are those whose partial points are very different, hence with a very large within inertia. The graph shows the partial points of the two individuals with the largest within inertia and of the two individuals with the smallest within inertia. There is also a graph with the supplementary or active categories (here there are no active ones), and then a graph with only the categories.
I can go back over some of these graphs to show how to build them. I open a new device and build the individuals' graph with plot(res), to see how these graphs can be changed and possibly improved. Here I have a graph with the active individuals and the categories of the supplementary variables. I can, for instance, make the categories of the qualitative variables invisible. The individuals all have different colors; I can instead use a single color, black, for all of them.
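The graphs of groups, partial axes and individuals mentioned above can be redrawn one at a time with plot(). The sketch below rests on a few assumptions: graph.type = "classic" requests base-R graphics as in the video (FactoMineR 2.0 or later, where ggplot2 is the default; drop the argument on older versions), both "quali" and "quali.sup" are passed to invisible because the exact value typed in the video is not shown, and habillage = "none" is assumed to be the value that gives a single color.

```r
library(FactoMineR)
data(wine)

# Same analysis as in the help example; graph = FALSE skips the default graphs
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

# Graph of the groups (active and supplementary)
plot(res, choix = "group", graph.type = "classic")

# Partial axes: dimensions of each separate analysis projected on the MFA map
plot(res, choix = "axes", graph.type = "classic")

# Default individuals' graph: individuals plus supplementary categories
plot(res, choix = "ind", graph.type = "classic")

# Same graph without the categories and with a single color for the individuals
hide <- c("quali", "quali.sup")
plot(res, choix = "ind", invisible = hide, habillage = "none",
     graph.type = "classic")
```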
The graph is quite readable; if I want the labels to be less close to each other, or if I have more points and want to separate the labels, I can reduce the font size a little with the argument cex = 0.8, for instance. Points are better separated and the graph is readable.
On this graph I can add partial points, for instance for the wines 1VAU and PER1. Partial points can be colored by group rather than with arbitrary colors, which makes the graph easier to read. So for instance, for wine 1VAU, the red dot here shows how 1VAU is seen with only the olfaction variables, and this one how 1VAU is seen with only the visual variables. These are the colors that were used for the groups of variables in the variables' graph. If I do plot(res, choix = "group", habillage = "group"), the groups are colored with olfaction in red, olfaction after shaking in dark blue, tasting in sky blue and visual in green. These are the same colors as the ones used for the partial points here.
I can also select some individuals: for instance, instead of labelling all points, keep only the wines that are well represented on the map, with a squared cosine greater than 0.8. Individuals with a cos² greater than 0.8 are then drawn in black with a label, and individuals with a cos² lower than 0.8 in gray. Partial points are drawn only for individuals with a cos² greater than 0.8.
I can also color the individuals according to a variable, here variable 1, which is a qualitative variable. I won't point at the graph because there is already a little too much information on it, but the Saumur wines are colored in red, the Bourgueuil wines in green and the Chinon wines in blue. So I put habillage = 1, but I could also use the variable soil: I can give either the number of the variable or its name, and the wines are then colored according to the soil variable.
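The options above (font size, partial points, coloring by group or by a variable, selection on cos²) might be combined as in this sketch. The exact commands typed in the video are not shown in the transcript, so the combinations of arguments are assumptions; graph.type = "classic" asks for base-R graphics (FactoMineR 2.0 or later), and the column name "Soil" is the one used in the wine dataset.

```r
library(FactoMineR)
data(wine)

# Same analysis as in the help example; graph = FALSE skips the default graphs
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

hide <- c("quali", "quali.sup")   # hide the categories on every graph below

# Smaller labels
plot(res, invisible = hide, habillage = "none", cex = 0.8,
     graph.type = "classic")

# Partial points for two wines, colored by group
plot(res, invisible = hide, partial = c("1VAU", "PER1"),
     habillage = "group", cex = 0.8, graph.type = "classic")

# Group colors, to compare with the colors of the partial points
plot(res, choix = "group", habillage = "group", graph.type = "classic")

# Only well-represented wines (cos2 > 0.8) are labelled, with partial points
plot(res, invisible = hide, select = "cos2 0.8", partial = "all",
     habillage = "group", cex = 0.8, graph.type = "classic")

# Color by a qualitative variable, given by position or by name
plot(res, invisible = hide, select = "cos2 0.8", habillage = 1,
     cex = 0.8, graph.type = "classic")
plot(res, invisible = hide, select = "cos2 0.8", habillage = "Soil",
     cex = 0.8, graph.type = "classic")
```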
I still have the selection cos² > 0.8, but I can also select on the contribution and keep the 8 individuals that contributed the most to the construction of the dimensions. Only these 8 individuals are then colored and labelled.
I can play with the transparency of the other points by setting unselect = 0. The points are then not transparent at all: points that have not contributed much have the same color as the other points, but they have no labels. On the contrary, I can make them completely transparent, that is to say the points disappear for individuals that are not among the 8 with the highest contribution. Transparency can possibly cause problems, because such graphs cannot be used in PowerPoint, where transparency is not handled, so I recommend using a gray color instead, grey70 for instance. All unselected points then get the same gray: they are no longer colored according to the soil, but the graph can be used in PowerPoint and the labels can be moved if needed.
By default the graph shows the map of dimensions 1-2, but I can draw the map 3-4 by putting axes = 3:4. So here is the graph with dimensions 3 and 4, and the selected individuals are now those that contributed the most to the construction of dimensions 3 and 4. Instead of selecting on the largest contributions, I can also specify the wines I want to see, 1VAU and PER1 for instance, so only these 2 wines are labelled. These are the possibilities for the individuals' graph.
We can also work on the graph of the variables, with choix = "var". The default graph is this one. It is always interesting to color the variables according to their group, and I advise doing this every time. You see here that a lot of labels overlap.
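The options for the individuals' graph described above (selection on contribution, transparency of unselected points, choice of axes, selection by name) can be sketched as follows. The argument combinations are assumptions, since the transcript does not show the exact commands; unselect = 1 for full transparency comes from the plot.MFA help page, and graph.type = "classic" asks for base-R graphics (FactoMineR 2.0 or later).

```r
library(FactoMineR)
data(wine)

# Same analysis as in the help example; graph = FALSE skips the default graphs
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

hide <- c("quali", "quali.sup")   # hide the categories on every graph below

# Label only the 8 wines with the highest contribution
plot(res, invisible = hide, habillage = "Soil", select = "contrib 8",
     cex = 0.8, graph.type = "classic")

# unselect = 0: unselected points keep their color but have no label
plot(res, invisible = hide, habillage = "Soil", select = "contrib 8",
     unselect = 0, cex = 0.8, graph.type = "classic")

# unselect = 1: unselected points are fully transparent, so they disappear
plot(res, invisible = hide, habillage = "Soil", select = "contrib 8",
     unselect = 1, cex = 0.8, graph.type = "classic")

# A plain gray avoids transparency when the graph goes to PowerPoint
plot(res, invisible = hide, habillage = "Soil", select = "contrib 8",
     unselect = "grey70", cex = 0.8, graph.type = "classic")

# Same selection on the map of dimensions 3 and 4
plot(res, invisible = hide, habillage = "Soil", select = "contrib 8",
     unselect = "grey70", axes = 3:4, cex = 0.8, graph.type = "classic")

# Select wines by name instead of by contribution
plot(res, invisible = hide, habillage = "Soil", select = c("1VAU", "PER1"),
     unselect = "grey70", axes = 3:4, cex = 0.8, graph.type = "classic")
```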
In fact, the labels are rather long and cannot be written on the right of the graph. The recommendation is to enlarge the window and rerun the command. Now the labels can extend to the right, so fewer labels overlap and the graph is a little more readable.
The graph can be improved further by putting shadow = TRUE, which asks for a shadow under each label. You see that the circle is no longer drawn where a label passes over it. These graphs look better, but unfortunately the shadow remains visible when you import the graph into PowerPoint, for instance: you get white rectangles that are difficult to manage if you want to move the labels. So if you plan to move the labels afterwards, it is advisable not to use the shadow; if you can use the graph as it is, it is advisable to use it, because the graph is a little more readable. This is what we can do for the variables' graph.
Again, you can use a selection and keep only the variables that contributed the most to the construction of the dimensions: the 8 variables that contributed the most get labels, and the others are drawn with transparency. We still see their group and where they are projected, and we see that they are less well projected. This gives graphs whose labels overlap less, so the graphs are more readable when there are a lot of variables. This is often useful because variables very close to the center of the circle, i.e. variables with very short arrows, are not very interesting to interpret, since they are not well projected. So we often focus on the variables that are better projected, which have a high coordinate or a high contribution (this is the same information) to the construction of the dimensions. I can do the same thing on the map 3-4, and get the variables with the highest contribution to the construction of plane 3-4.
Obviously, the variables are less well projected on dimensions 3-4, but the variables that contributed the most are these 8 variables.
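The variables' graphs discussed above can be sketched as follows. Two assumptions: the full argument name in the plot.MFA help page is shadowtext (the video says shadow), and graph.type = "classic" asks for base-R graphics as in the video (FactoMineR 2.0 or later).

```r
library(FactoMineR)
data(wine)

# Same analysis as in the help example; graph = FALSE skips the default graphs
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

# Variables colored according to their group
plot(res, choix = "var", habillage = "group", graph.type = "classic")

# Shadow under the labels: more readable, but harder to edit in PowerPoint
plot(res, choix = "var", habillage = "group", shadowtext = TRUE,
     graph.type = "classic")

# Label only the 8 variables with the highest contribution
plot(res, choix = "var", habillage = "group", shadowtext = TRUE,
     select = "contrib 8", graph.type = "classic")

# Same selection on the map of dimensions 3 and 4
plot(res, choix = "var", habillage = "group", shadowtext = TRUE,
     select = "contrib 8", axes = 3:4, graph.type = "classic")
```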