In this video, you will learn to explore bivariate correlations using the CORR procedure in SAS.
One important part of your exploratory data analysis is uncovering relationships,
or associations, between a response variable and potential predictor variables.
You might also be interested in learning if any of the potential predictor variables are interrelated themselves.
This could pose a potential problem of collinearity if they are used in the same regression model.
In SAS, the CORR procedure performs both of these tasks. Let’s take a look at the SAS code that will accomplish this.
We begin with the DATA= option, which tells SAS which data set to analyze.
The name of our data set is bodyfat2, and it is located in the sasuser library.
When you want to investigate the correlation between predictor variables and a response, you place the predictor variables in the
VAR statement and the response variable (or variables) in the WITH statement.
This will limit the output to only pairings of the VAR variables with the WITH variable.
Because we request the RANK option, the correlations are listed in descending order of
absolute value, so the strongest correlations appear first.
Using the PLOTS option, we can request high resolution scatter plots of the variable pairings.
The ONLY option suppresses the default graphics automatically produced by
PROC CORR and displays only the graphics requested.
This enables us to visualize each relationship and check that it is linear, because the Pearson correlation coefficient measures only the strength of a linear association.
If needed, ellipses can be included on the image to make positive or negative associations more clearly visible.
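The transcript does not reproduce the code shown on screen, so here is a minimal sketch of what such a step might look like. The data set name and library come from the narration; the variable names (Age, Weight, Height, PctBodyFat2) are assumptions chosen for illustration, as is the choice to turn the ellipses off.

```sas
/* Sketch only: variable names are assumed, not taken from the video. */
ods graphics on;   /* ODS Graphics must be enabled for the PLOTS= option */

proc corr data=sasuser.bodyfat2 rank
          plots(only)=scatter(ellipse=none);
   var Age Weight Height;   /* hypothetical predictor variables */
   with PctBodyFat2;        /* hypothetical response variable   */
run;
```

To add ellipses to the scatter plots, ELLIPSE=NONE could be changed to ELLIPSE=PREDICTION or ELLIPSE=CONFIDENCE.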
Let’s submit this code and look at the output.
PROC CORR provides a table of simple statistics for the variables involved.
This information can be suppressed from the output using the NOSIMPLE option.
The second table contains our Pearson correlation coefficients.
Because we used the RANK option, the relationships are listed in descending order of strength.
In addition to the correlation coefficients, we are also given p-values for the test of the null hypothesis that the true correlation is zero.
Correlations can be correctly interpreted only if the relationship is linear in nature.
PLOTS=SCATTER generates the scatter plots we need to check that each relationship is linear.
But what if you are also interested in the potential correlation among your predictor variables?
Removing the WITH statement from the code tells SAS that you want to analyze all of the pairwise correlations among the variables in the VAR statement.
Remember that because we have already seen the simple statistics for the variables,
we can suppress that information from the output with the NOSIMPLE option.
This time let’s put all of the scatter plots together in a single scatter plot matrix.
We request this using PLOTS=MATRIX.
The HISTOGRAM option requests that the major diagonal of the scatter plot matrix be filled with
histograms displaying the distribution of each variable. If that option is not used, the name of the variable will fill the major diagonal.
Because of the number of observations in this data set and the number of scatter plots that will be included in this matrix,
we need to raise the limit on the number of points that can be placed on the graphic.
MAXPOINTS= is the option that increases the number of points allowed.
If the value specified in the MAXPOINTS= option is exceeded, the graphic is suppressed from the output.
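As a rough sketch of this second step, the code might look like the following. Again, the variable names are assumptions, and the MAXPOINTS= value of 100000 is an arbitrary illustrative limit rather than the value used in the video.

```sas
/* Sketch only: variable names and the MAXPOINTS= value are assumed. */
ods graphics on;

proc corr data=sasuser.bodyfat2 nosimple
          plots(maxpoints=100000)=matrix(histogram);
   var Age Weight Height;   /* no WITH statement: all pairs of VAR variables */
run;
```

Specifying MAXPOINTS=NONE instead of a number removes the limit altogether.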
Let’s submit this code and look at the output. This time we have a matrix of correlations and their respective p-values.
Note that this matrix is symmetric.
Now let’s look at the scatter plot matrix. With our inclusion of the HISTOGRAM option, histograms appear along the major diagonal.
Like the correlation matrix, this matrix of scatter plots is symmetric in that the images in the
upper triangle are mirror images of the lower triangle’s images, with the horizontal and vertical axes swapped.
With this, we can again determine whether the relationships are linear in nature and, if so, we can correctly interpret the correlation value.
Now that I have given you a brief tour of PROC CORR in SAS, it’s your turn to look at the correlations of your variables.
Thank you for your interest in SAS.