Hi everyone, Lloyd here with my first presentation of the course, which is one of several presentations on key fundamental statistical ideas. Knowing these ideas is very important for making sense of the statistical computations we will be performing. In this presentation I'll introduce you to the key ideas to get you ready for the first Excel spreadsheet example. OK, let's get started.

OK, let me start with one of my favorite quotes: "There are lies, damned lies, and statistics." I'm not really sure who gets credit for this quote, but it's an interesting one, isn't it? I've always felt it didn't quite do justice to statistics, because I believe that only people can lie. Now, they can lie with statistics, but statistics themselves really have nothing to do with it.

So let us explore this idea a little bit. The first example is bowling; I think I gave you the heads-up that our first Excel spreadsheet example was going to be in the context of bowling. If somebody said to you that they have a bowling average of 175, you would begin to make certain assumptions. For example, you would assume that the average is based on a collection of scores from actual bowling at a bowling alley, and not on a Wii, a PlayStation, or some other video game version of bowling. You would also expect, of course, that they followed the rules of bowling as they gathered that data: if their foot crossed the line, that would be a foul, and you would expect their scores to reflect that. Now, you might also wonder: did they count all their scores? Somebody who has a bowling average of 175 usually counts only the scores from games bowled as part of a league, not the games bowled in practice. So these are all, again, examples of asking where the statistics come from and under what assumptions the data were collected.
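A minimal Python sketch of why it matters which scores are counted. The game scores and the "league"/"practice" labels below are hypothetical, made up only to illustrate the point; they are not the data from the Excel example.

```python
from statistics import mean

# Hypothetical games as (score, kind) pairs; not real data.
games = [
    (182, "league"),
    (169, "league"),
    (174, "league"),
    (210, "practice"),
    (198, "practice"),
]

# The convention described above: only league games count.
league_average = mean(score for score, kind in games if kind == "league")

# Counting every game, practice included, gives a different "average".
all_games_average = mean(score for score, _ in games)

print(f"League average:    {league_average}")     # 175
print(f"All-games average: {all_games_average}")  # 186.6
```

Both numbers describe the same hypothetical bowler; the difference comes entirely from the assumption about which games belong in the data.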
And, you know, sometimes, especially in education research, we would like to collect data that frankly isn't available to us. So sometimes we have to make some tough choices. You will often hear people say that they used a convenience sample, and you have to be careful about just how convenient something is, and whether that data really will be meaningful.
It reminds me of an old joke about a husband who is looking for a shirt button in the kitchen. His wife says, "What are you doing?" He says, "Well, I'm looking for a button that fell off my shirt in the bedroom." She says, "Well, why are you looking for it in the kitchen?" "Well," he says, "the light is better in here." So sometimes it can be very convenient to gather certain kinds of data, but that data won't necessarily be very meaningful, if it's meaningful at all.

I apologize for this slide being so text-heavy; I really try to avoid text-heavy slides. But I think this is a good one, and the slide comes from ideas in a book that we use in one of our classes at the University of Georgia.
The book is Leedy and Ormrod's Practical Research: Planning and Design. It talks about measurement as a tool of research and the importance of measurement in limiting the data. You see, there is far more data than you could ever possibly deal with or collect, so you have to make some decisions about how you're going to limit it. The data are going to be limited by the measurement construct (the thing you're trying to measure, such as learning, motivation, or engagement), by the capability of the instrument (an online survey, for example, has very different capabilities from a paper survey), and finally by the amount of raw information that you are prepared to deal with as a result.
So again, you have to make some very difficult decisions, but in all research you're certainly going to have to limit the data. To give you an example of this (it's really more of a metaphor for the different kinds of data that you will be able to collect, depending on what options you choose), here's a satellite photo of a place in the United States. I wonder if you can figure out where this place is. Now, at this high altitude I obviously can't make out very many details about the actual place, such as the buildings and so forth, but I do get some sense of the geography. I think you can see those rivers that are quite prominent: it looks like there are three rivers there, two coming together to form a third as I go from right to left. And I see that triangle right where the rivers meet. Well, it turns out this is my hometown of Pittsburgh, and, again, from this viewpoint you get a very interesting sense of the geography, but you don't get any sense of the communities that we have in Pittsburgh.

Now, let's just say I'm in a helicopter and I come down about halfway closer to the earth; I might get this view. In this view I start to see much more information about neighborhoods: I see the street layouts, I get some sense of the green areas, I see that baseball field down there, and I perhaps get some sense of the groupings of people. Now if I come down close to street level (there we go), we get a much better idea of what that neighborhood looks like, the size of the houses, and the size of the yards. In fact, this is my neighborhood: this is the street where I grew up in Pittsburgh. What is really interesting, by the way, is that as I go back up in my helicopter, I stay centered on my street. This, again, is right above my street, and this imagery, as well, is taken with my street as the focal point
of the image. OK, so to wrap that up, here is another little metaphor for you. See if you can figure it out based on my imagery of the firefighters on the left holding that fire hose and that teacup on the right. Isolating meaningful data when conducting most research studies is like what? Well, the metaphor is that the amount of data it's possible to collect might be that fire hose, but the data that I'm actually able to collect and make some sense of is like a teacup's worth of the water coming out of the fire hose. Very good. So let's now talk about a very important set of concepts
in statistical circles, called the four scales of measurement. They refer to what kind of data I have when I collect data, and the scale of measurement of that data tells me what kinds of manipulations or calculations I'm allowed to perform on it. The four scales of measurement are nominal, ordinal, interval, and ratio. So let's briefly consider these in turn.

First is nominal. The name of the nominal scale gives you an idea of what it means, because "nominal" relates to "name": you're giving a number as a name for something in the data. For example, I could give somebody a survey that asks, "What is your favorite color?" and, in order to code the answers, I might say blue is 1, red is 2, yellow is 3, green is 4, and purple is 5. You can imagine giving a survey like that, but the numbers don't mean that green (4) is twice as much as red (2); it's just a convenience that 4 is the name for green. But some people may do exactly the wrong thing and say, "Well, why not just add them up?" and report that the average favorite color was 1.6. Lloyd says: this result makes absolutely no sense, because the data are nominal and therefore we can't average them. I'm not allowed to do that mathematical calculation on them. And to use a Pittsburgh example, here are two of my favorite Steelers: Hines Ward, on the right, who has of course retired, and Troy Polamalu. Hines's number when he was playing was 86 and Troy's was 43. That's their name on the field; it does not mean that Hines was twice as good as Troy.
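A short Python sketch of the favorite-color example. The coding (1 = blue through 5 = purple) follows the example above, but the ten survey responses are hypothetical. Averaging the codes produces a number, but for nominal data only the frequency count (and the most frequent category) means anything.

```python
from collections import Counter

# Coding from the example: the numbers are names, not quantities.
CODES = {1: "blue", 2: "red", 3: "yellow", 4: "green", 5: "purple"}

# Hypothetical survey responses (not real data).
responses = [1, 2, 1, 4, 1, 5, 2, 1, 3, 1]

# Computable, but meaningless for nominal data.
meaningless_mean = sum(responses) / len(responses)

# Appropriate for nominal data: count how often each category occurs.
counts = Counter(CODES[code] for code in responses)
most_popular, frequency = counts.most_common(1)[0]

print(f"'Average color': {meaningless_mean}")  # 2.1, which names no color
print(dict(counts))
print(f"Most preferred: {most_popular} ({frequency} of {len(responses)})")
```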
OK, let's consider the ordinal scale of measurement. Ordinal refers to the order of things, or a ranking, so we're going to compare various pieces of data in terms of one being greater or higher than another. That is the idea of rank-ordered data. To give you a visual sense of this, here are some of the presidential contenders in the 2008 presidential race, only on the Democratic side. Some of these people you probably know very well, along with what their futures held; some you might be struggling to identify. Suppose I were to give you a survey asking you to rank the candidates in your order of preference. You could put them in order, but that doesn't mean your first-place person is twice as preferred as your second-place person. The only thing that matters here is the order of the data.

So let's compare that to the interval scale of measurement. Again, this is somewhat of a text-heavy slide because these are the key ideas, but let's go through them and then come up with a few memorable examples. An interval scale of measurement has equal intervals between points, so the distance from 1 to 2, or from 2 to 3, actually has some meaning: it's considered an equal amount from point to point. But the zero point has been established arbitrarily, and that's something I'll come back to later. So zero itself doesn't really mean zero in the sense of nothing. The example I will come back to, which I might as well give you now, is temperature: zero on one scale, like Fahrenheit, means one thing, but on the Celsius scale it means something else. By having an interval scale, though, I am allowed to determine the mean, the standard deviation, and things like the product-moment correlation.
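One way to see why the mean, standard deviation, and product-moment correlation are acceptable on an interval scale: converting Fahrenheit to Celsius moves the arbitrary zero and changes the unit, yet the mean and standard deviation convert consistently and the correlation does not change at all. The temperatures and pool-attendance figures below are hypothetical, and `pearson_r` is a small helper written only for this sketch.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical daily high temperatures (interval data) and
# hypothetical attendance at an outdoor pool on the same days.
temps_f = [68.0, 75.0, 82.0, 88.0, 91.0]
attendance = [120, 150, 210, 260, 300]

# Same temperatures on a scale with a different zero and unit.
temps_c = [(t - 32) * 5 / 9 for t in temps_f]

print(f"Mean: {mean(temps_f):.1f} F = {mean(temps_c):.1f} C")
print(f"SD:   {stdev(temps_f):.2f} F = {stdev(temps_c):.2f} C")
# Moving the arbitrary zero and rescaling leaves r unchanged.
print(f"r using F: {pearson_r(temps_f, attendance):.4f}")
print(f"r using C: {pearson_r(temps_c, attendance):.4f}")
```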
You are also allowed to conduct inferential statistical analyses on interval scales of measurement.

Now finally, let's compare all that to the ratio scale of measurement. It is important to note that as you go up from nominal to ordinal to interval and then to ratio, each scale of measurement includes the characteristics of the ones that came before it. The key idea of the ratio scale is that there is an absolute zero point. So it is very similar to interval, with the important new difference that it has an absolute zero point, and one is allowed to conduct virtually any inferential statistical analysis. What I'd like to leave you with, because I think most people get confused and struggle with the difference between the interval and ratio scales of measurement, is this example: to measure heat, or temperature, we use a thermometer.
If you think about it, it does not make sense to say that 40 degrees is twice as hot as 20 degrees. It's an interval scale: I can look at 20 to 30 and 30 to 40 and say that's an equal amount, but I can't make ratio comparisons that this is twice as much as that. For a ratio scale, I think the most common example would be length. If I measure a board and this board is 5 feet and that board is 10 feet, then I can make the ratio comparison that the second board is twice as long as the first, because zero length has meaning. Zero does not have that meaning when it comes to temperature.
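The thermometer and board examples can be checked numerically. In this Python sketch, the "twice as hot" ratio for 40 versus 20 degrees Celsius falls apart once the same temperatures are expressed in Fahrenheit, while equal differences stay equal; the 10-foot versus 5-foot ratio survives a conversion to meters because both length scales share a true zero. The helper names `c_to_f` and `ft_to_m` are illustrative.

```python
def c_to_f(celsius: float) -> float:
    """Convert Celsius to Fahrenheit: both are interval scales with arbitrary zeros."""
    return celsius * 9 / 5 + 32


def ft_to_m(feet: float) -> float:
    """Convert feet to meters: both are ratio scales with a true zero."""
    return feet * 0.3048


# Temperature: the ratio depends on where each scale puts its zero.
temp_ratio_c = 40 / 20                   # 2.0 in Celsius
temp_ratio_f = c_to_f(40) / c_to_f(20)   # 104 / 68, about 1.53 in Fahrenheit

# Equal intervals stay equal: 20 to 30 and 30 to 40 C are both 18 F wide.
step_low = c_to_f(30) - c_to_f(20)
step_high = c_to_f(40) - c_to_f(30)

# Length: the ratio is the same in any unit because 0 means "no length".
length_ratio_ft = 10 / 5
length_ratio_m = ft_to_m(10) / ft_to_m(5)

print(f"Temperature ratio: {temp_ratio_c} (C) vs {temp_ratio_f:.2f} (F)")
print(f"Equal steps in F:  {step_low} and {step_high}")
print(f"Length ratio:      {length_ratio_ft} (ft) vs {length_ratio_m:.2f} (m)")
```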
So I hope you walk away from this with that important distinction. If not, please study it, whether in a textbook you may have or by Googling it, so that you're sure you understand the difference.

Here's a nice chart that lets you know when it's OK to compute these different kinds of calculations. As you can see, under nominal there really is only one type of computation I can do, and that would be a frequency distribution, or counting something up: for example, counting your colors to see which color was the most preferred, the most of something. With ordinal, I can do a little bit more. I can certainly count as well (for example, the preferred candidates), but I can also compute things like medians and percentiles. For interval, you can see that I can do just about every kind of calculation except those that involve ratios, whereas with a ratio scale of measurement I can do all of them.

As we'll see throughout this course, there are two important types of statistics: measures of central tendency and measures of variability. In this particular module we're only going to be concerned with measures of central tendency; in the next module, when we get into descriptive statistics, we're going to explore both of these in more detail. There are three measures of central tendency. We have the mean, which we are all familiar with: it is the average of a set of numbers. We have the median, which I think we are also somewhat familiar with, if only from the news, because many statistics, such as home sale prices, are reported using the median. The median is simply the number at the midpoint when you arrange a set of numbers in order; when there is an even number of scores, there is no single middle score, so you take the average of the two middle scores. The mode is simply the number in the set of scores that has the greatest frequency. You might say it is the most popular number.
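Python's standard `statistics` module computes all three measures directly. The six scores below are hypothetical; because there is an even number of them, the median is the average of the two middle scores.

```python
from statistics import mean, median, mode

# Hypothetical set of six scores.
scores = [3, 7, 7, 2, 9, 5]

print(sorted(scores))   # [2, 3, 5, 7, 7, 9]
print(mean(scores))     # 5.5  (33 / 6)
print(median(scores))   # 6.0  (average of the middle scores 5 and 7)
print(mode(scores))     # 7    (occurs most often)
```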
The interesting thing is that, given a normal distribution, these are all the same number. Here you have a graphical representation of the normal distribution, the classic bell curve. If you look at the apex of the curve and trace it down to the x-axis, that point identifies the mean, the median, and the mode. A normal distribution is symmetrical, and when you have a normal distribution the mean, median, and mode are all the same value. But that is not the case when you have a skewed distribution. In this graphic (I apologize, it's a little fuzzy) we have, of course, the normal distribution again in the middle, but you see two skewed distributions on the left and the right.
They are not symmetrical. The one on the left is negatively skewed: you can see the numbers are bunched up on the right-hand side, with the tail going toward the left. We call it negatively skewed because we follow the tail, which goes in the negative direction. Positively skewed is the mirror opposite: the numbers are bunched toward the left, with the tail going to the right. Now notice that in the two skewed distributions the mean, median, and mode do not represent the same value. The mode, again, can be traced down from the tallest part of the curve, since it has the highest frequency; on the positively skewed curve it sits far to the left, and we see the mean is far to the right.
And somewhere in the middle is going to be the median. Now, this is very important, and I actually have another video to share with you where we really demystify the normal distribution, to make sure we understand where it comes from and how it is generated. So I do recommend that you watch that video and make sure that you understand the normal distribution very, very well. The reason is that many of the statistics we're going to be calculating are based on the assumption that your data are normally distributed.
I think one classic example helps to explain why I would use the median instead of the mean, and what a skewed distribution really means. It's an exaggeration, but it drives the point home pretty well. You often hear about median income, so imagine a neighborhood of 1,000 people in which there are 999 extremely poor people; let's just say they have an average income of $1,000 a year. But also in that neighborhood you have one billionaire, who earns $1 billion a year. If I add up all 1,000 scores (their incomes) and divide by 1,000, I'm going to get a figure of just over $1 million that says, "Wow, that's a pretty rich neighborhood; I need to go live there." Well, no, it is obviously a very poor neighborhood. The best measure of central tendency, the number that best captures or represents the entire group, is not going to be the mean. Instead, the median is going to do a much better job: if you take the middle income of all those incomes, that's going to be a much better representation of the income of that entire neighborhood. It's the same thing with the median price of homes. You often have neighborhoods where several homes might be quite exquisite, or just over the top in terms of what they're worth, while the vast majority of the other homes are not worth nearly as much. So the median price is a better measure of the value of the homes in that neighborhood. OK, we're going to stop there. In the first
Excel spreadsheet example, all we are going to do is compute some averages to get some sense of how that works. I'm using it as a starting point because I know all of you understand what an average, or mean, is. It will be a good example of following me in Excel in a video tutorial and then seeing how we're going to submit that for evaluation.
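As a quick check on the neighborhood-income example, here is a minimal Python sketch (Python rather than Excel, purely for illustration) using the exaggerated figures from that example. Treating every one of the 999 poorer residents as earning exactly $1,000 is a simplifying assumption; the total, and therefore the mean, is the same as long as their average is $1,000.

```python
from statistics import mean, median, mode

# 999 incomes of $1,000 (simplifying assumption) plus one $1 billion income.
incomes = [1_000] * 999 + [1_000_000_000]

# One extreme value drags the mean far to the right (positive skew) ...
print(f"Mean income:   ${mean(incomes):,.0f}")    # $1,000,999
# ... while the median and mode stay with the typical resident.
print(f"Median income: ${median(incomes):,.0f}")  # $1,000
print(f"Mode income:   ${mode(incomes):,}")       # $1,000
```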
And this concludes this presentation.