Tip:
Highlight text to annotate it
X
Hi Everyone, Lloyd Rieber here again with our next example
of using Excel to compute some statistics. In this video, we compute the first of three
inferential statistics covered in this course, and the first of two t tests, namely the t
test for a correlated-samples design, also known as a dependent t test.
To begin, I suggest you fasten your seat belt and put your tray in the locked, upright position,
because you need to be ready for some mild turbulence that will take the form of some
mental gymnastics. Rest assured, however, that computing a t test is based on some computations
you have already done. And, if you have completd the previous videos and saved your excel spreadsheets
along the way, you really don't have that many steps to complete. When done, I think
you'll say "Wow, I can't believe how few computations I actually did in this example."
Remember, I've designed this course for mere mortals, so the route I've mapped out for
us -- though a little bumpy -- will get us all there in one piece.
I'm going to start with an explanation of the formula and I will show how everything
is actually based on N, or total number of participants, the standard deviation of each
of the two data sets, and Pearson r, the correlation coefficient we computed in the previous video
showing the relationship between the two data sets.
So, let's get started and don't worry, a beverage of your choice and a bag of peanuts will be
served after we reach cruising altitude.
So at the top we have the two formulas for the two t tests we are going to be computing
again in this video I'm just going to be showing you how to compute a t test between correlated
samples. But I wanted to show you at the very beginning some relationships between the two
formulas now you can see that I've dimmed out the details about what we will talking
about in a few minutes but I wanted to point out that we have some very familiar friends
coming up here. We have N, the total number of participants we have the standard deviations
of both of the data sets, and we have the Pearson r, the correlation coefficient that
we computed in the last video so everything we're about to talk about I want you to remember
is based on these calculations that we've already perform so as we look at the formulas
we can see that a t test is a difference between means divided by something called the standard
error of a difference so in the numerator we have the difference between the means we
have x bar minus y bar, x bar of course being the mean of the first data set and y bar being
the mean of the second data set, so I think we are very familiar with that and that looks
like a very easy computation to do. Let's take a look at the denominator as you see
in both cases the denominator is called the standard error of a difference and yes this
is a new term so if they're both a standard error of a difference why do they look different
well it's because In the formula on the left it's really called the standard error of a
difference between correlated means whereas on the right it is the standard error of a
difference between means. So this subtle difference is actually very important if you think about
what a correlated samples design is all about it's really about a group of people who have
been given two measures so we have two data sets but each pair of data much like with
the Pearson r they are going to be related because they are coming from the same person
but we get to the formula for a t test between independent samples we are going to have two
separate groups, or two samples where each sample consists of a completely different
set of people so throughout this explanation I'll continue to use red to distinguish the
standard error of a difference. But I'll use a circle to distinguish the standard error
of a difference between correlated means and a rectangle to show the standard error of
a difference between means. So let's focus now on the left-hand side formula, the t test
between correlated samples So, how we compute that standard error of a difference? Well,
that looks pretty daunting and you might be getting a little nervous at this point But,
just relax! First let's just take a look at it and I want to point out something namely
that we have r we have a Pearson r as part of the formula and remember that we have already
computed that now I want to point that out because that's an important part of the formula
and after all this is for a correlated samples design so we have in the denominator the Pearson
r because we are very interested in what is the relationship between these two data sets
now before we try to explain the rest of that formula let's go and take a look at the standard
error of a difference between means so you can see again with the rectangular box I've
try to show that connection now computing the standard error of a difference between
means when I look at that before I understand what it actually is I want you to see that
there is something in common between both of the standard errors of a difference and
there it is so what do we have going on there? Well, we have the square root of s subXbar
squared plus s subYbar squared well I just want to point out that s subXbar is also known
as the standard error of the mean again a new term but I want you to see that relationship
between these two formulas Now let's go a little bit further and I think you'll start
to relax as you begin to see where this is all heading so how to compute that if we go
a little bit further we begin to see some familiar statistics are we that we have computed
in the past I want to point out that the standard error of the mean, that s subXbar is really
nothing more than the standard deviation of the first data set divided by the square root
of the total number participants of the first data set so we will be using that in different
parts of the formula. As you can see on the left we are going to be squaring that. And
the same holds now true for the standard error of the mean for the second data set s subYbar
and again we will be squaring that in a couple places in the formula so I hope you can see
that even though it looks a little bit daunting, we have some mathematical symbolism that can
really get in way of understanding but it's going to be pretty straightforward once we
move to excel because we will be using N, the total number of participants we will be
using the standard deviations that we've already computed and we will be using that Pearson
r that we have already computed as well
OK, let's move back to Excel.
A reminder that I will talk rather quickly, so be sure to use the pause and rewind controls
as needed.
A reminder also that I'm using the Macintosh Excel 2011 version, so if you are not then
you'll need to translate my actions for the particular version you are using.
The first step is to open the spreadsheet that we saved at the end of the last video.
Mine was titled "Rieber-Correlation." and then we're going to save this with a new name.
So, I'm going to go up to "File" come down to "Save as" and I created a new folder in
my course folder called inferential statistics and that's where I want to save this and I
suggest we give this a name as your "last name-correlated_t_test." Even though I'm using
a Macintosh I prefer to save with a more standard format where I use underscores for spaces
OK be sure that it's going to the right folder then click save the next step is to use our
imaginations you see we're not going to change any of the numbers but we have to imagine
a completely different context in which the numbers were derived so instead of the general
survey and the course survey that we used in the correlation tutorial we are going to
imagine an entirely different context we're going to imagine that we are using a one group
pretest-posttest design recall from the presentation about inferential statistics that this is
a design where you would select a group of people at random from your population give
them a pretest and then have them go through some sort of treatment or intervention such
as a training program and then you would give them a posttest and the posttest would be
exactly the same as the pretest and you wind up with two scores and what we're interested
in is whether there has been a significant increase from the pretest to the posttest
within this group. So let's go and change some labeling let me go and click on general
survey and in place of it I'm going to type pretest and I'm going to go and click on course
survey and type posttest so a very small change in my labeling but a very big conceptual change
in terms of where the data have come from next let's change the layout of our spreadsheet
a little bit we are going to be computing a t test as we know we're going to have a
numerator divided by a denominator to get the t value so let's go over here to column
L. and we'll come down to the same row as we have some other labels and I'm going to
type in t num, for numerator, press return. Then, I'm going to come down and click on
the cell in the same row as I have the labels for standard deviation and I'm going to type
t denom, for denominator, press return finally I'll come down to the same row as I have the
label for r and I will type t value, and press return. I like the divide and conquer strategy
and I'll be using throughout this video tutorial lets go and consult the formula to see what
we do next well looking at the formula for a t test between correlated samples I think
we should start with the numerator so we have X bar minus Y bar or the mean of the pretest
minus the mean of the posttest let's click in the cell right below t num and we enter
an equal sign So let's go now and click on the mean of the pretest minus the mean of
the posttest and press return Alright, good start! I suggest we go and do a save let's
go back to the formula well we need to start attacking that denominator but let's use the
divide and conquer strategy and we'll begin by focusing on the standard error of the mean
for the pretest now we'll use the term s subXbar because that will be a little bit more descriptive
especially in our spreadsheet as you can see that is going to be calculated by taking the
standard deviation of the pretest and dividing by the square root of the number of participants
in the pretest it's always good to start with a label, so I'm going to go over here to column
D, and I'm going to go to the same row as I had the labels for r and t value and I'm
going to type s subXbar I know it looks a little weird press return there but it is
a very good descriptor as we go and consult with the formula again s subXbar is also known
as the standard error of the mean we start to formula with an equal sign now we need
the standard deviation and of course we have two that we have calculated from previous
videos. Recall that the first standard deviation was one that we calculated manually and the
second standard deviation was one that we let Excel calculate using its built-in standard
deviation function. For reasons that I will make clear later we're going to absolutely
choose the very first standard deviation in my case that is D26 now we are going to divide
by the square root of N. So we hit the forward slash / square root the square root function
is SQRT parenthesis and we'll click on N that was calculated over here which is the 10 click
on the 10 close my parentheses and press return so the standard error of the mean or s subXbar
is 1.98 for the pretest let's go back and check our formula next we do the very same
thing for the standard error of the mean for the posttest or s subYbar it is a standard
deviation of the posttest again divided by the square root of the number of participants
let's start by putting in the label in the corresponding spot in the posttest SsubY bar
press return now I'm going to take advantage of Excel by going over to the 1.98 I'm going
to choose to copy that and come over to cell right below my label and I'm going to choose
paste that press return and let's do a save let's click on that cell to doublecheck to
make sure everything is OK if I go up now and click on the formula we see, yeah, we're
we have the standard deviation and N for the posttest so the copy and paste worked just
fine because I was using a relative reference for the two data points in that formula okay
let's make sure we hit the exit out "x" there so we have a standard error of the mean for
the posttest of 1.82 let's go and check our formula to see what we need to do next well
next we need to square the standard error of the mean for both the pretest and the posttest
well, that will be easy to do. Let's go ahead and do it. Let's first put in a label let
me go over to the pretest, and I'm going to come down two cells from our s subXbar and
I'm going to type s subXbar squared press return to square that all I do is enter the
equal sign click on the 1.98 then type an asterisk and click on the 1.98 again and press
return let's do the same thing for the posttest enter the label s subYbar squared press return
then enter an equal sign click on the 1.82 press an asterisk then click on the 1.82 again
and press return there you have it let's go back to our formula to see where we stand
and what we need to do next okay well it looks like we have successfully computed s subXbar
and s subYbar, the standard error of the mean of the pretest and posttest and those values
will go there in our formula we've also computed the squares of those two calculations and
they will go right there and of course we have already computed r. So I don't know about
you but I'm getting rather excited we have everything we need to compute the denominator
so let's see how we're going to do that. Before we go back to Excel, just to make sure we
are all together let's take a look at how we are going to do this in Excel, what the
formula construction will look like and so consider these three parts of the denominator
and I've color-coded them to try to show the grouping in Excel, what we are going to be
doing is taking the square root of the entire thing So you can see where I have a set of
parentheses in red then I have this first part the the pink plus then blue then I'm
going to subtract from that this piece in green and that piece in green, since I have
2 multiplied by r multiplied by the two standard error of the means I'm going to need to put
all of that in its own set of parentheses Alright, let's go to to excel and do it! Let's
go to the cell right below the label "t denom" select that cell we always start a new formula
with an equal sign and we are going to be taking the square root of everything so we
will put in the square root function SQRT and a parenthesis the first part is going
to be s subXbar squared so I click on that plus enter the plus sign We now enter the
s subYbar squared and now minus so let's enter another parenthesis it is 2 times r, so enter
an asterisk click on the r value and we are going to multiply that by s subXbar multiply
that - another asterisk - by s subYbar and we need to close the parentheses for the second
or, I should say for the last part, and then close the parentheses for the square root
function and press return OK we are now getting close! Let's go and do a save and of course
the last part is simply to divide the numerator by the denominator. So, let's go down to the
cell below the label "t value" click on it put in an equal sign click on t numerator,
divide by - so the forward slash - enter that and then click on the t denominator starting
with 1.57 and press return Excellent! I have a -4.1400 and so on obviously we should do
some rounding Let's go up and do the rounding now. I'll click on the column header "L" go
up to format, choose cells and then I'll click on the category number and two decimal places
would be good click OK. So I have a final t value rounded to -4.14 OK let's quickly
go up to file and choose to save. And now let's stop a second and catch our breath!
Can you feel the turbulence going away and a calmness overtaking you? You have calculated
a t test by hand! And yes, Excel helped us out quite a bit, but we had to do it all really
by hand. I think now is a really good time to stop and think about what that t value
means. So the numerator was simply the X bar or the mean of the pretest minus the mean
of the posttest. So you can see I hope that we had that difference or variability between
the two groups represented by the numerator now the denominator we really focused on the
standard deviation and that gives us an understanding of the variability within each of the two
groups the greater the variability within each of the two groups the greater the standard
deviation and the greater the standard deviation the greater the denominator and when the denominator
gets bigger the overall value gets smaller, the overall t value would get smaller and
think about that last part of the denominator that we were subtracting that contained r.
So think about when r is greater than zero. When r is greater than zero it is reducing
the overall standard error of the difference so it takes some mental gymnastics to see
these overall relationships but I hope you see this ratio where the numerator is about
variation between the groups and the denominator being variance within each of the groups and
you're getting an overall value showing that ratio now we need to check our work Excel
does have something called a data analysis tool available which contains a large assortment
of statistical analyses that are not commonly found in Excel such as the variety of t tests
and even the analysis of variance. But for some reason Microsoft in their infinite wisdom
decided to start charging people for that I actually have it available on an older version
of Excel but with my Macintosh 2011 version I don't have access to it but I do have access
to SPSS which is a very powerful and a standard statistical software package so I put in these
scores for the pretest and the posttest and here is what I got so here's the output from
SPSS and we'll take a look at this in more detail in just a moment but the main thing
to look at is right here... a t value of -3.928 Now, wait a second, that doesn't look right!
Let's go back to Excel Our value is -4.14. Now you might say, Lloyd that's pretty close,
let's call it even! Well, no, that's not good enough. As it turns out I know exactly what's
happening and you may recall my insistence that we use the first of the two standard
deviations of the pretest and also the posttest So let's take a look to see what's going on
there let's go and click on the standard deviation of the pretest that we used in our calculation
And let's go up and take a look at the formula and as you can see is the square root of the
squared deviations divided by N 23 is N or 10 and you may recall that there are some
times when you divide by N another times when you divide by N-1 when calculating a standard
deviation and to simply cut to the chase here we should have divided by N-1 so what I am
going to do and you should do this as well is I'm going to revise this formula to show
N-1 and it is very simple, we are just going to go right after the A23 and put in minus
one now we do need to put parentheses around A23-1 so let's do that. Let me put one parenthesis
at the end one parenthesis at the very beginning. It's very important OK, press return. Let's
do the same thing for the standard deviation for the posttest Go up here to the formula
put in minus one and then surround that with parentheses press return let's go back and
take a look at the formula again there was actually a clue in the formula and maybe some
of you saw it in that the standard deviation for both the pretest and the posttest was
represented with s hat and s hat is used to show that we are estimating a population's
variability when we are estimating a population's variability we need to divide by N-1. OK back
to Excel. Now that t value still is not correct because there is still one more thing to do.
I don't know if you noticed it but our r value decreased. It went from .66 down to .60. And
the reason is and I know this is going to be as clear as mud is that it needs to be
based on the other standard deviation namely the standard deviation based on a population
that is the one where we divide by N. and recall that the second standard deviation
that we have in our spreadsheets is one that we put in to check our work and the second
one is based on the built-in function that Excel has for computing a standard deviation
based on the population, which is based on dividing by N. and indeed if I go and click
on that and take a look up in the formula bar you'll see that we have that function
STDEV.P ... P standing for population and there's another one, .S, that is based on
the sample. And the .S variant would be based on having N-1. But again the r value needs
to be computed based on the standard deviation of a population. So we need to change our
z values so that they actually point to the second standard deviation but that's an easy
fix also. Let's go to the first z score in the pretest recall that r is based on the
z scores and if I go up to my formula I see that it is using the first standard deviation
which is in D26 and we need to make that based on the second deviation D27. So, I'm going
to change the 26 to a 27 and press return and I need to use the fill down trick to do
that for all of the other z scores and I'll come over now to the z score for the first
score in the posttest again go up to the formula. Again change it from I26 to I27 press return
use the fill down technique update all the z scores As you can see that puts our r back
to .66 and that gives us a t value of -3.93 let's go check that against the SPSS output
let's circle that t value again here indeed -3.928 rounded off to -3.93 We've done it!
And the last thing we need to do is compute the number of degrees of freedom so that we
can then check to see if our t value is statistically significant. Well here I am again, still in
the SPSS output and it really tells us all of those answers for example here we have
the degrees of freedom as being nine so you can see that computing the degrees of freedom
is going to be N-1 and here we have the significance of this particular t value ... .003 So it
is clearly below, much below the significance level of .05. So we obviously have a statistically
significant t value. Well let's go back to Excel and do this by hand and then we'll look
up the value and the degrees of freedom in a table for us to actually make sure that
this matches the SPSS output let's go to the cell below the t value and we'll type in DF
for degrees of freedom press return type an equal sign click on the first N, showing the
10 and then minus one and press return So degrees of freedom of 9 so we now have a t
value and we have degrees of freedom we can now go to a table and look up whether or not
that t value is significant Let's do it! Here is a table of the t distribution showing the
critical values I found this on the Internet and I'll share the link to it with you there's
lots of them on the Internet First, a little reminder what our value is -3.93 and by the
way the negative sign is inconsequential. We really just look at the absolute value.
The negative sign tells us the direction so going from the pretest to the posttest if
it increases during the posttest we would have a negative sign. If it would decrease
then it would be a positive sign but again here we just look for the critical value we're
also going to be looking in the category called a two tailed test there is our probability
level .05, so that's the column we're going to be looking at Our degrees of freedom are
nine so we look at the intersection and we see the critical value is 2.262 and the absolute
value of our t value is equal to or greater than that So, yes we have a statistically
significant difference between the pretest and the posttest so I have the grounds to
make the assertion statistically speaking that that training intervention or activity
had a significant impact or made a significant difference in the learning of those employees
let's go back to Excel to wrap things up before I do my final save and end this session I
think I'm going to do a little bit of mop up and maybe a little bit of adjusting to
my layout for example I see here in column I, I have some rounding I should still do,
so I'm going to click on that go up to format then cells then number and then two decimal
places And I see I had a little white area there so I'm going to go and choose that light
blue and, what the heck, let's go ahead and add a different color to both the column for
the Pearson r and for the t value. So I'll click on the column K and I'll just choose
very light orange click on column L now I'll choose a light green Alright! I like the way
that separates those two statistics from the rest of my calculations let's go up and choose
to do a final save.
Now I know what many of you must be saying... "Lloyd, I was with you all the way up the
point where you were congratulating us on computing the t value, only to tell us we
were wrong!" I know, I know, computing statistics can be like that. Just when you think you
are done, you check your work and find something is amiss. But, I'm actually glad we encountered
that extra bit of turbulence at the end because I think it helped to put into focus one of
the subtleties of statistics. Dividing by N or N-1 results in very small differences,
but these differences are important. Frankly, I'm a believer in the principle that less
learning occurs when everything goes too smoothly -- we really learn when we have a problem
...... and figure it out.
In the next part of the course, we move on to computing a t test for an independent-samples
design. However, this will be a statistical piece of cake because all we have to do is
two things: we will remove the part of the denominator that contains the Pearson r -- in
other words we will be revising the standard error of a difference for an independent-samples
design, and we will need to compute a different number for the degrees of freedom.
So, I promise a very smooth brief flight!
Until next time!