2 Non-Parametric: Examining Skewness of Data

Well, let's continue our journey through the wonderful topic of non-parametric analysis. In this video I'm going to demonstrate a set of data that I've extracted. We will examine those data for skewness, and then spend a little time looking at the SPSS output.
For our purposes, I've extracted the following 2011 data set from IPEDS. In our previous lessons, you have been working with data extraction from IPEDS. What I did here is some groupings: two-year, public, degree-granting colleges in Texas, Oklahoma, and Louisiana. So I had to do three extractions, one for Texas, one for Oklahoma, and one for Louisiana, and then combine them into one data set. My dependent variables were the expenditures, as a percent of total expenditures, for instruction, academic support, student services, and other.
Now we will examine the skewness of these data in SPSS. I'm going to take you into SPSS, where we already have the data set built, and we will look at the variables for instruction, academic support, student services, and other to determine whether or not they are normally distributed. I will remind you that it is the dependent variable whose distribution we're concerned about. The groupings can be nominal or ordinal. Here we have a nominal grouping: Texas institutions, Oklahoma institutions, and Louisiana institutions. Bear with me, and let's go check SPSS out.
Here, now, is the data set that I told you I had built. I extracted the following variables for three groups. The groups are coded: 1 is Texas, 2 is Oklahoma, and 3 is Louisiana. Now, there's nothing ominous in that ordering. If you're from Louisiana, don't send me any hate mail. I love Louisiana.
I married a very beautiful woman who has a lot of family from Louisiana, and I love going to their family reunions, because they're good people. Not so true for some of my Texas relatives, but that's another story.
So we have groupings. I have the percent of total expenditures spent on instruction, the percent spent on academic support, the percent spent on student support, and the percent spent on other. These data are for two-year degree-granting institutions in Texas, Oklahoma, and Louisiana, extracted from IPEDS. Again, they are all GASB percentages, and those of you who are accountants know that those are very solid numbers.
Now what I want to do is examine these four dependent variables, look at their distributions, and see whether any of them might be non-normally distributed. The groupings don't matter here. The groupings are nominal; they're the independent variable. In this case, we're going to look at the four dependent variables.
We go up to Analyze, and remember that when you evaluate normality you're interested in skewness and kurtosis. So we'll go to Descriptive Statistics. Do you recall that you do that with Frequencies? Here it comes. Which variables are we going to use? The four dependent variables. Let's go up to Statistics. Oh, you know me, I have to do the mean, median, and mode. That's just me. The Dawg loves the mean, the median, and the mode. Let's do skewness and kurtosis. Now we also want to look at them, because one picture's worth a thousand words. Let's request the histograms, and once we do that, let's click OK. And away it goes. Man, SPSS is fast. Here's our output, and we'll go into it in just a minute, but it's getting after it. And my gracious, we now have what we need.
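For readers who want to check the Frequencies numbers outside SPSS, here is a minimal standard-library Python sketch of the statistics requested above. It assumes the bias-adjusted sample formulas for skewness and excess kurtosis, and their standard errors, which to my understanding are what SPSS reports; the function name `frequencies_summary` is hypothetical, and this is not SPSS's own code.

```python
import math
import statistics
from typing import Dict, List


def frequencies_summary(values: List[float]) -> Dict[str, float]:
    """Mean, median, mode, skewness, kurtosis, and their standard errors.

    Assumption: skewness and kurtosis use the bias-adjusted sample formulas,
    with kurtosis reported as *excess* kurtosis, so a normal distribution
    scores 0 on both. Needs at least 4 values that are not all identical.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values to estimate kurtosis")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all values are identical; skewness is undefined")
    z = [(x - mean) / sd for x in values]  # standardized scores

    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    # The standard errors depend only on the sample size n.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        # When several modes exist, report the smallest one.
        "mode": min(statistics.multimode(values)),
        "skewness": skew,
        "se_skewness": se_skew,
        "kurtosis": kurt,
        "se_kurtosis": se_kurt,
    }


if __name__ == "__main__":
    print(frequencies_summary([1, 2, 3, 4, 5]))
```

Under these formulas a perfectly symmetric sample such as 1, 2, 3, 4, 5 has skewness 0 and kurtosis of -1.2 (flatter than normal), while a sample with one large value, such as 1, 1, 1, 2, 10, has positive skewness.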
So I'm going to take just a minute now and carry you over, and let's look at the output that we have received. Well, if there's anything you can say about me, it's that I am indeed a creature of habit. I went over to Frequencies and took those four variables in. Let's scroll up and see what we have: the percent instruction, the percent academic support, the percent student support, and the percent other. These were our dependent variables. You know I'm into the mean and the median, but it looks like I must not have hit the mode. I'm a little bit dense sometimes.
But look at this just a second. Let's go across to this statistic right here, which is called skewness, along with the standard error of the skewness. The first thing we notice is that percent instruction is negative 0.628. A negative value means it's a little skewed to the left: the longer tail runs toward the low values. Percent academic support is 0.595, a positive value, so it's skewed a little bit to the right. Here's the one that might really be of interest to us, the percent student support: it is very heavily skewed. And the percent other is somewhat skewed.
Now look at the kurtosis. The percent student support is not only heavily skewed, it's also very sharply clustered around the mean, which tells us that we may have some data lying out in the wings, that is, the tails.
So we'll go down. As I said about being a creature of habit, I had to run those frequencies, but I also wanted to run the histograms and let you look at them, so we can go through each one. Let's see what's going on here. This is the percent instruction. You saw that it didn't really have much skewness, a little bit to the left, but not too bad. The kurtosis wasn't bad either. That's approximately normally distributed, so we don't have a problem with that one. Let's go to the percent academic support. I'll get it down there in just a second. The percent academic support wasn't too bad either: an approximately normal distribution there as well. We're not too upset about it.
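SPSS draws the histograms for us. As a rough stand-in outside SPSS, this hypothetical Python helper prints a text histogram with equal-width bins, which is enough to see a tail stretching out to one side. A long tail toward the low values is negative (left) skew, and a long tail toward the high values is positive (right) skew. The bin count and bar width are arbitrary assumptions, not SPSS defaults.

```python
from typing import List


def text_histogram(values: List[float], bins: int = 10, width: int = 40) -> List[str]:
    """Return one line per equal-width bin: lower edge, a bar of '#', and the count."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # fall back to width 1 when all values are equal
    counts = [0] * bins
    for x in values:
        # Clamp so the maximum value lands in the last bin instead of overflowing.
        i = min(int((x - lo) / step), bins - 1)
        counts[i] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        bar = "#" * round(c * width / peak)  # scale bars relative to the fullest bin
        lines.append(f"{lo + i * step:8.2f} | {bar} ({c})")
    return lines


if __name__ == "__main__":
    # Placeholder values chosen to show a long right tail; not IPEDS data.
    for line in text_histogram([1, 2, 2, 3, 3, 3, 4, 4, 9, 15], bins=5, width=20):
        print(line)
```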
The one that was really troublesome was this percent student support. Remember, it was heavily skewed, and it had a very high kurtosis, which means it's really clustered right around the mean. Let's say the mean is 9.54; the data are clustered right in here, closer to the mean than you would see in a normal distribution, and yet you have some data out in the wings. That is the one for which we may want to use a non-parametric design. Let's go on to the percent other. It almost looks normally distributed. It's close enough, as they used to say in East Texas where I grew up, close enough for government work.
Well, I hope you've enjoyed looking at the output. Pretty interesting. The one of greatest interest to us is the third dependent variable, the percent student support: the percent of total expenses spent on student support. That is the one we're going with. So I thank you very much for your support. Apparently the computer's moving a little slow there, or maybe I'm pushing the buttons a little slow. Who knows. OK, let's move forward. In the words of my old East Texas friends I grew up with, the old clock on the wall says it's time to get out of here. We will continue our discussion and examination of non-parametric analysis in just a moment. I hope you enjoyed this video. May the odds be ever in your favor.
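The judgment made in this lesson, that percent student support is the candidate for a non-parametric design, can be made a little more explicit by dividing each statistic by its standard error. The sketch below assumes a common rule of thumb, not stated in this lesson, that a ratio beyond about plus or minus 2 deserves a closer look; the function name, the cutoff, and the example numbers are all hypothetical, and the histogram should still be the final check.

```python
from typing import Dict, List, Tuple

# (skewness, SE of skewness, kurtosis, SE of kurtosis), as printed by Frequencies
Shape = Tuple[float, float, float, float]


def flag_non_normal(stats: Dict[str, Shape], cutoff: float = 2.0) -> List[str]:
    """Return the variables whose |skewness / SE| or |kurtosis / SE| exceeds cutoff."""
    flagged = []
    for name, (skew, se_skew, kurt, se_kurt) in stats.items():
        if abs(skew / se_skew) > cutoff or abs(kurt / se_kurt) > cutoff:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Made-up illustrative numbers, not the values from this lesson's output.
    example = {
        "pct_mildly_skewed": (0.30, 0.20, -0.20, 0.40),
        "pct_peaked_tailed": (1.90, 0.20, 6.50, 0.40),
    }
    print(flag_non_normal(example))  # ['pct_peaked_tailed']
```

Flagging on either ratio mirrors the reasoning above: a variable can be a problem because it is lopsided, because it is sharply peaked with heavy tails, or both.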