>> SUE: Hello! This afternoon I’d like to show you a quick example of analyzing a SAS
data set in R, using data from a New York City health survey.
With the RevoScaleR package that is bundled in Revolution R Enterprise it is quick and
easy to directly access and analyze SAS and other data sets such as SPSS, text data files,
and ODBC connections.
A data source object in R contains information about where the data is located, and additional
information you might want to provide – such as how categorical data should be handled.
A data source object can be used to import data for use in R, or can be used directly
with analysis functions such as summary statistics, linear models, logistic regression, and generalized
linear models.
As an example, we’ll use the 2010 Community Health Survey for New York City. Let me switch
to the R Productivity Environment to give you a quick run through.
I can create a SAS data source just by using the file name and location. I don’t have
SAS installed, but that’s not a problem. I can retrieve basic variable information,
including the variable descriptions that are contained within the file. I see that this
file has 194 variables, and note that there is a survey weight variable I should use for
the analysis.
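
A minimal sketch of what this step might look like follows. RxSasData and rxGetVarInfo are RevoScaleR functions, but the file path is a placeholder rather than the name used in the demo, and the guard around the calls is only there so the snippet does nothing on a machine without RevoScaleR or the file.

```r
# Hypothetical location of the downloaded survey file (not the demo's actual path).
sasFile <- file.path("data", "chs2010.sas7bdat")

# Run only where RevoScaleR (bundled with Revolution R Enterprise) and the file exist.
if (requireNamespace("RevoScaleR", quietly = TRUE) && file.exists(sasFile)) {
  library(RevoScaleR)

  # The data source object describes where the data is and how to read it.
  chsSas <- RxSasData(sasFile)

  # Variable names, types, and any descriptions stored in the SAS file.
  varInfo <- rxGetVarInfo(chsSas)
  print(length(varInfo))  # number of variables in the file
  print(varInfo)
}
```

SAS itself does not need to be installed for this; the data source reads the file directly.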
When I downloaded the data set, information about categorical data was also provided.
I’ve put some of this information in R in these other R scripts, and can add it to my
SAS data source by using the colInfo option.
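
As a rough illustration of the colInfo option, the sketch below declares one variable as a factor with readable labels. The variable name boro, its numeric codes, and the file path are assumptions made for the example; the real names and codes come from the documentation supplied with the survey.

```r
# Hypothetical file path, variable name, and code-to-label mapping.
sasFile <- file.path("data", "chs2010.sas7bdat")

boroughInfo <- list(
  boro = list(
    type = "factor",
    levels = c("1", "2", "3", "4", "5"),          # codes as stored in the file (assumed)
    newLevels = c("Bronx", "Brooklyn", "Manhattan",
                  "Queens", "Staten Island")      # labels to show in R (assumed order)
  )
)

# Run only where RevoScaleR and the file exist.
if (requireNamespace("RevoScaleR", quietly = TRUE) && file.exists(sasFile)) {
  library(RevoScaleR)

  # Attach the categorical information to the data source.
  chsSas <- RxSasData(sasFile, colInfo = boroughInfo)
  print(rxGetVarInfo(chsSas, varsToKeep = "boro"))
}
```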
I’m not very familiar with New York City, so I thought I’d start with doing some visual
data exploration. I can quickly create a variety of histograms, looking at responses by borough.
For example, looking at physical activity level, we see the Bronx has the highest percentage
of people who report being “Very Active”.
But when we look at nutrition, it’s people in Manhattan who have the most “Excellent”
diets.
When it comes to commuting, those on Staten Island most often use cars, but in Manhattan
it’s more common to commute by walking than by car.
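
A histogram like the ones just described could be produced along these lines. This is only a sketch: rxHistogram is the RevoScaleR plotting function, but the variable names (boro, activity, wt), their codes and labels, and the file path are all placeholders, not the names in the actual survey file.

```r
# Hypothetical file path, variable names, codes, and labels throughout.
sasFile <- file.path("data", "chs2010.sas7bdat")

surveyInfo <- list(
  boro = list(type = "factor",
              levels = as.character(1:5),
              newLevels = c("Bronx", "Brooklyn", "Manhattan",
                            "Queens", "Staten Island")),
  activity = list(type = "factor",
                  levels = as.character(1:4),
                  newLevels = c("Very Active", "Somewhat Active",
                                "Not Very Active", "Not Active"))
)

# One panel per borough, showing the distribution of activity levels.
histFormula <- ~ activity | boro

# Run only where RevoScaleR and the file exist.
if (requireNamespace("RevoScaleR", quietly = TRUE) && file.exists(sasFile)) {
  library(RevoScaleR)
  chsSas <- RxSasData(sasFile, colInfo = surveyInfo)

  # Apply the survey weights and plot percentages rather than counts.
  rxHistogram(histFormula, data = chsSas, pweights = "wt", histType = "Percent")
}
```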
I can use my data source in other analysis functions as well. For example, let’s run
a regression on the average number of drinks in the past 30 days.
With this simple model, we predict that those in Manhattan have an average of 0.2 additional
drinks per day relative to those on Staten Island.
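
A regression of this kind might be set up as sketched below, using rxLinMod from RevoScaleR directly on the SAS data source. The variable names (drinks, boro, wt), the borough codes, and the file path are assumptions for illustration.

```r
# Hypothetical file path, variable names, and borough codes.
sasFile <- file.path("data", "chs2010.sas7bdat")

boroughInfo <- list(
  boro = list(type = "factor",
              levels = as.character(1:5),
              newLevels = c("Bronx", "Brooklyn", "Manhattan",
                            "Queens", "Staten Island"))
)

# Drinks explained by borough alone.
drinkFormula <- drinks ~ boro

# Run only where RevoScaleR and the file exist.
if (requireNamespace("RevoScaleR", quietly = TRUE) && file.exists(sasFile)) {
  library(RevoScaleR)
  chsSas <- RxSasData(sasFile, colInfo = boroughInfo)

  # Weighted linear model fitted straight from the data source.
  drinkModel <- rxLinMod(drinkFormula, data = chsSas, pweights = "wt")
  print(summary(drinkModel))
}
```

By default rxLinMod treats the last level of a factor as the reference category, so with the level order assumed here each borough coefficient would be read relative to Staten Island.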
If we want to analyze this data set further, we might want to import it – either into
an .xdf file, which is very efficient for analysis, or a data frame if the data set is sufficiently
small. Here, we’ll just import a few variables into a data frame in memory. Then we’re
ready to use any of the multitude of available R analysis functions.
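
The import step could look roughly like this. rxImport is the RevoScaleR import function; the variable names kept here and the file path are placeholders.

```r
# Hypothetical file path and variable names.
sasFile <- file.path("data", "chs2010.sas7bdat")
keepVars <- c("boro", "drinks", "wt")

# Run only where RevoScaleR and the file exist.
if (requireNamespace("RevoScaleR", quietly = TRUE) && file.exists(sasFile)) {
  library(RevoScaleR)
  chsSas <- RxSasData(sasFile)

  # With no outFile, rxImport returns an ordinary in-memory data frame.
  chsDf <- rxImport(inData = chsSas, varsToKeep = keepVars)

  # From here on, any standard R function can be used.
  print(summary(chsDf))
}
```

Supplying an outFile argument ending in .xdf to the same call would write the data to an .xdf file instead of returning a data frame.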
Please contact us if you’d like more information on how you can use your SAS data files with
R. Thanks for listening!