>> Well, hello. We're going to spend a few minutes talking about how to interpret the SPSS output for the chi-square goodness-of-fit test.
As you're aware, the chi-square goodness-of-fit test is applied when we're working with nominal variables; specifically, the possible values for the variable are categorical and cannot be ranked.
When we use a chi-square goodness-of-fit test, there are some basic requirements we need to make sure we've met in order for anything we find to be of value to us.
Number 1, the sample must have been randomly drawn from the population, and by this we mean that the sample is representative of the population. That's going to be important if we hope to generalize our results from the sample to the entire population.
Number 2, the values for the variable are mutually exclusive. By this I mean that if our variable is, for example, "Do you work?", the values for that variable could be yes and no. Those two values fulfill the requirement of being mutually exclusive: either someone works or they don't; they can't do both. So the values for the variable must be mutually exclusive.
Finally, our third requirement is a minimum expectation of five occurrences in each category. Let's take a moment to look at that.
We'll actually be looking at the variable "Do you work?" throughout this presentation. For this variable the values were yes and no, and those values represent categories, so we have a yes category and a no category. This third requirement says that you have to expect at least five people in each category. So if, for example, I interviewed 10 people and expected half to say yes and half to say no, I would expect 5 people to say "yes, I work" and 5 people to say "no," which just meets the requirement.
If the expected number of people in any category, that is, the number of people we expect to pick a particular value, ever drops below 5, then the chi-square goodness-of-fit test may not be as appropriate as another statistical test, and the results may not be as valid. We'll talk a little more about this as we go along.
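The expected frequency for each category is simply the sample size multiplied by the proportion the null hypothesis assigns to that category. The short Python sketch below computes those expected counts and checks the five-per-category rule, assuming a 50/50 split between yes and no; the function names `expected_counts` and `meets_min_expected` are hypothetical and not part of SPSS.

```python
def expected_counts(n, proportions):
    """Expected frequency for each category under the null hypothesis: n * p."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [n * p for p in proportions]


def meets_min_expected(expected, minimum=5):
    """True if every category's expected frequency is at least `minimum`."""
    return all(e >= minimum for e in expected)


# 10 people, null hypothesis of a 50/50 split between "yes" and "no"
exp10 = expected_counts(10, [0.5, 0.5])
print(exp10, meets_min_expected(exp10))   # [5.0, 5.0] True

# 8 people with the same 50/50 null: only 4 expected per category
exp8 = expected_counts(8, [0.5, 0.5])
print(exp8, meets_min_expected(exp8))     # [4.0, 4.0] False
```

With the 20 students in the scenario that follows, the same 50/50 null gives 10 expected per category.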
For this scenario, let's say we surveyed 20 students about whether they work. In terms of our first requirement for a chi-square goodness-of-fit test, we'll assume the sample was randomly drawn from the population. That is, our sample is representative of the population, so it would be appropriate to generalize from the sample to the population as a whole. Let's say our survey had the question "Do you work?" with the possible responses yes and no.
In terms of the second requirement for the goodness-of-fit test, these two responses to the question "Do you work?", yes and no, are mutually exclusive: someone would not select both of them; they would select one or the other.
Then we have our null hypothesis, which specifies the expected frequency for each category. We interviewed 20 people, so the null hypothesis tells us how many people would be expected to say yes and how many would be expected to say no.
Well, let's say an instructor, in assigning homework, assumes that only half the students work and the other half don't, so there's no reason to go any lighter on homework because, you know, at least half the students aren't working. That might be the null hypothesis. Of course, the students in the class might be saying, "Hey, a lot of us work," and if that's the case, that would be the research hypothesis. So the null hypothesis would be the instructor's assumption that only half the students work, and the research hypothesis might be the students saying, "No, more than half of us work; it's certainly not only half of us."
With the research hypothesis we look at the observed frequency. For example, we hand out a survey and then observe what the responses are. Right now you see question marks under the observed frequency for yes and no because the survey hasn't been handed out yet.
Okay, here you're taking a look at the chi-square output. The SPSS basics book shows you the step-by-step process to get this output, but for now we're just going to focus on the chi-square output itself.
Looking at it, you'll see that there are two tables. The top table shows our descriptive statistics; that is, it helps us describe the results for our particular sample. You can see that 16 people said that they work and only 4 people said that they do not work. Our null hypothesis was that it would be 50/50, 10 saying yes and 10 saying no, and what we observed certainly did not match up with what was expected based on the null hypothesis.
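SPSS does the tallying behind the Observed N column for you. To make the link between raw survey answers and those counts concrete, here is a minimal Python sketch; the list of responses is hypothetical, built only so that it reproduces the 16 yes and 4 no counts shown in the output.

```python
from collections import Counter

# Hypothetical raw answers to "Do you work?" that reproduce the output:
# 16 "yes" and 4 "no" out of 20 students.
responses = ["yes"] * 16 + ["no"] * 4

counts = Counter(responses)
observed = [counts["yes"], counts["no"]]   # order: yes, no
print(observed)   # [16, 4]
```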
The bottom table shows the inferential statistics. We found a difference between the observed and the expected frequencies; should we generalize that difference from our sample to the entire population? That's what inferential statistics is all about, because perhaps what we observed was just due to chance. So we'll also take a look at this bottom table.
Okay, focusing on the top table, the descriptive statistics, you'll see that there are three columns, labeled Observed N, Expected N, and Residual.
The Observed N, as mentioned, tells us what we observed in our data collection: 16 people saying yes and 4 people saying no to the question "Do you work?"
The Expected N is what we expected based on the null hypothesis. As you recall, the null hypothesis was that only half the people would be working, so out of 20, half, or 10, should say yes, and the other half, the other 10, would say no.
Finally, the last column is the Residual, which is the difference between the observed and the expected. For the people who said yes, we observed 16 but only 10 were expected to say yes, so 6 more said yes than expected. For the no response, we observed 4 people saying no when 10 were expected to, so 6 fewer people than expected said no, which SPSS shows as a residual of -6.
So the residual tells us how big a difference there was between what we observed and what was expected based on the null hypothesis. The larger the residual, the more confident we can be that this is a real difference and not just a chance fluctuation, where we happened to pick a sample in which more people, or fewer people, worked than normal. So we want a big residual if we want to reject our null hypothesis.
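The Residual column is plain subtraction, observed minus expected, for each category. A minimal Python sketch using the counts from the top table:

```python
observed = [16, 4]    # Observed N for "yes" and "no"
expected = [10, 10]   # Expected N under the 50/50 null hypothesis (20 * 0.5)

# The Residual column: observed minus expected, category by category
residuals = [o - e for o, e in zip(observed, expected)]
print(residuals)   # [6, -6]
```

Because the observed and expected counts both add up to 20, the residuals in a goodness-of-fit table always sum to zero, which is why the yes and no residuals here are mirror images of each other.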
Well, we have to stop for a moment and consider that third requirement for a chi-square test. As you recall, the first requirement was random sampling, to achieve a representative sample that allows us to generalize to the population. The second requirement was that the values for our variable be mutually exclusive; our variable is "Do you work?", and its values, yes and no, are mutually exclusive. The third requirement was that the expected value for each cell, that is, each category, has to be at least 5; again, if it isn't at least 5, our results can be of less value in making sense of the data.
Taking a look at our top table, we can see that the expected number of people saying yes and the expected number saying no is 10 in both cases. Looking at the bottom table, footnote a says that 0 cells have expected frequencies less than 5. That's good; it means we're meeting that third requirement. In fact, the minimum expected cell frequency is 10, so we're clear to move forward.
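The check reported in footnote a can be reproduced from the expected counts alone. The Python sketch below is one assumed way to mimic that footnote; its wording is modeled on the SPSS footnote, and the function name `expected_frequency_note` is hypothetical.

```python
def expected_frequency_note(expected, minimum=5):
    """Summarize expected frequencies in the style of the SPSS footnote:
    how many cells fall below `minimum`, and the smallest expected value."""
    below = sum(1 for e in expected if e < minimum)
    pct = 100.0 * below / len(expected)
    return (f"{below} cells ({pct:.1f}%) have expected frequencies less than {minimum}. "
            f"The minimum expected cell frequency is {min(expected):.1f}.")


print(expected_frequency_note([10.0, 10.0]))
# 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 10.0.
```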
Okay, looking at the bottom table more closely, you'll see that there are three rows: the Chi-Square row, which is the value of your chi-square statistic; df, for degrees of freedom; and Asymp. Sig., the significance, which is our p value.
In that top row, Chi-Square, notice the fancy-looking χ² symbol; the value for the chi-square is 7.2. The larger that value, the more likely we'll be able to reject the null hypothesis.
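The output does not show how the 7.2 is computed. The standard Pearson goodness-of-fit statistic sums (O - E)^2 / E over the categories, where O is the observed and E the expected frequency; the Python sketch below applies that formula to the counts from the top table (the function name is hypothetical).

```python
def chi_square_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# yes: (16 - 10)^2 / 10 = 3.6, no: (4 - 10)^2 / 10 = 3.6, total 7.2
chi2 = chi_square_statistic([16, 4], [10, 10])
print(round(chi2, 2))   # 7.2
```

Squaring the residuals is why a -6 and a +6 contribute equally, and dividing by the expected count means a residual of 6 matters more when only 10 were expected than it would if 100 had been expected.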
For degrees of freedom, the value is 1. Degrees of freedom are calculated as the number of categories minus 1. How many categories did we have? People could respond either yes or no, so we had two categories. Two minus 1 is 1, so our degrees of freedom is 1, and SPSS tells us that.
Finally, Asymp. Sig. is our p value. We observed a difference between the expected and the actual observed frequencies, but that difference could possibly be due to chance. If, in the population of all students, half worked and half did not, and we randomly sampled 20 students from that population, the probability of getting a difference at least as large as 16 saying "yes, I work" and 4 saying "no, I don't" just by chance is .007, less than 1 percent: a very small probability of this occurring due to chance. So if the null hypothesis were right, and in the whole population 50 percent of people work and 50 percent don't, you could get a sample with this large a difference, but it would happen by chance only .007 of the time.
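The Asymp. Sig. value can be approximated outside SPSS as well. This Python sketch assumes exactly one degree of freedom (two categories): in that case a chi-square variable is the square of a standard normal variable, so the upper-tail probability reduces to `math.erfc(math.sqrt(chi2 / 2))`. For more categories you would need a general chi-square distribution function instead. The function name is hypothetical.

```python
import math


def chi_square_p_value_df1(chi2):
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom.

    With df = 1, X = Z**2 for a standard normal Z, so
    P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


categories = ["yes", "no"]
df = len(categories) - 1          # number of categories minus 1
p = chi_square_p_value_df1(7.2)
print(df, round(p, 3))            # 1 0.007
```

If SciPy is installed, `scipy.stats.chisquare([16, 4])` tests against equal expected counts by default and should report the same statistic (7.2) and a p value of about .0073.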
Importantly, the probability of this happening due to chance, .007, is less than or equal to our alpha level of .05, so we're going to be able to reject the null hypothesis. So Asymp. Sig. is our p value, and we want it to be less than or equal to .05.
At the bottom of the screen you'll see χ²(1) = 7.20, p ≤ .05. This information is written in its exact format, the APA format for how we would write up the result, and it is taken from the bottom table giving us our inferential statistics. We write chi-square, then in parentheses we put the degrees of freedom, which is 1, and then we say what it's equal to; that's where we put in the actual chi-square value, which was 7.20.
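The APA line on the slide can be assembled from the values in the bottom table. This Python sketch follows the slide's format, comparing p with alpha rather than reporting p exactly; the function name `apa_chi_square` is hypothetical.

```python
def apa_chi_square(df, chi2, p, alpha=0.05):
    """Format a chi-square result like the slide: chi-square(df) = value, p vs. alpha."""
    alpha_text = f"{alpha:.2f}".lstrip("0")   # APA style drops the leading zero: .05
    relation = "≤" if p <= alpha else ">"
    return f"χ²({df}) = {chi2:.2f}, p {relation} {alpha_text}"


print(apa_chi_square(1, 7.2, 0.0073))   # χ²(1) = 7.20, p ≤ .05
```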
In terms of probability, we're either going to say p ≤ .05, because .05 is our alpha level, and if p is less than or equal to .05 we get to reject the null, which is a wonderful thing, because that could mean you get to publish your results. In this particular scenario, it means that the students could tell their instructor, "Hey, way more than half of us work; please be kinder on the homework." The instructor might say, "Well, this homework is very necessary for you to learn the material," but some negotiation could perhaps take place.
If the probability is greater than .05, then the probability of this happening due to chance is greater than .05, so we're not willing to say that it's a real effect, and in that case we would retain the null. So if the probability is less than or equal to .05, we reject: we say the probability of this happening due to chance is so small that I'm not going to go along with that possibility; instead, I'll say something real is taking place. If the probability is greater than .05, then we'll retain the null hypothesis.
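The reject-or-retain rule comes down to a single comparison between p and alpha. A minimal Python sketch, assuming alpha = .05; the second p value is made up purely for contrast and does not come from the output.

```python
def decide(p, alpha=0.05):
    """Reject the null hypothesis when p <= alpha; otherwise retain it."""
    return "reject" if p <= alpha else "retain"


print(decide(0.007))   # reject  (the p value from our output)
print(decide(0.20))    # retain  (hypothetical p value, for contrast)
```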
Okay, and here you can see how someone might write up the results of what we've been looking at in the SPSS output. We could say, "We sampled 20 students." Here we're letting our reader know how many people were in the survey, and that's going to be important because we're dealing with frequencies: the frequency of people who said yes and the frequency of people who said no.
So we sampled 20 students and evaluated whether the number of students who work, and there we're letting our reader know how many people work, f = 16 (f stands for frequency, and notice that it's italicized), was equal to the number of students who do not work, again with an italicized f, f = 4. So our first sentence says, "We sampled 20 students and evaluated whether the number of students who work (f = 16) was equal to the number of students who do not work (f = 4)." That tells us what we were evaluating: whether the same number of people work as do not work. We also told our reader what we actually observed: 16 work and 4 do not.
"The data were analyzed using a chi-square goodness-of-fit test." Here we're letting our reader know which inferential statistical test we used to analyze our data.
"The null hypothesis was rejected," and here we give our reader the inferential statistical test results: chi-square, with the 1 degree of freedom in parentheses, equals the chi-square value, 7.20, then a comma, and then p ≤ .05, saying that the probability of this was less than or equal to alpha, so again we get to reject the null.
Finally, our last sentence just puts it out there in everyday, understandable English: "More than half the students at the college also work."
Okay, I hope this was helpful. Again, this presentation focused on how to interpret the SPSS output for the chi-square goodness-of-fit test and then write it up in APA style.