Okay, let's take a look at some of the ideas coming out of chapter 8. This is hypothesis testing.
So the idea is that we start with an assumption. We start with this assumption: if we had a particular population, and we take a sample, a subgroup of this population, that equally represents the whole group, in other words a random sample, and we apply a treatment to this sample, so we manipulate it in some way, we start with the assumption that the treatment has no effect. Okay, so we start with an assumption: the treatment has no effect.
What we want to do is find enough evidence that says that that assumption is not likely. In other words, we want to find enough evidence that we can reject that first assumption, and the evidence we would come up with would be the observation of an extremely unlikely event. Okay, so we want to find enough evidence to suggest our assumption is incorrect.
Okay, and like I said, that "enough evidence" idea works like this: when you actually have a sample and you've applied the treatment to it, you get a value, usually the sample mean, sometimes some other statistic. You find that value and you ask, okay, what's the likelihood of observing this value or something more extreme? If the likelihood of observing that value is very small, in other words it's a very unlikely event, then that is enough evidence (well, we set a standard beforehand, but typically we say okay, this is enough evidence) to say that our initial assumption is not correct.
Okay, so just to give some background on this: the idea in mathematics is to do a proof by contradiction, to say okay, let's assume this thing is true, and then show that it can't be true, and that's proof that something else is true. That works when the assumption is strictly yes or no. You're either inside or you're outside, so if I can establish that you are not outside, that means we're definitely inside. Okay, that's the basic premise, but with statistics we don't necessarily have such a clean, clearly defined yes/no boundary for a treatment working or not working, or for measuring the effect. So I want to go through an example in a moment and talk about those ideas and give you a good sense of what the statistics are telling us, but before then I want to outline the hypothesis test start to finish and give you a sense of some of the mechanics of it, okay.
So there are two ways to go about doing a hypothesis test. The first way would be to compare critical values with test values, so let me just write it and then I'll explain it: compare critical values with test values. The critical values are going to be based on a set significance level. In other words, we set how unlikely we're going to allow something to be before we call it really unlikely. Okay, so this significance level is the alpha,
and maybe it's five percent, let's say, as an example. If it's .05, we're saying that if we could show that the probability of observing this test value, the sample mean from the test I did, is less than five percent, then that is what we're going to call unlikely. In other words, it's significant. What we could do then is translate that alpha level .05 into a z-score, asking, in other words, what z-score corresponds to that tail area of 5 percent? Then we would take the x-bar that we get, our test value, compute a z-score for it, and compare that with the z-score of the critical value. If we find that our z-score from the test is less than or equal to the critical z, then we do nothing, but if we could show that our z-score from the test was more extreme than the critical value, then what we would say is we reject the initial assumption.
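As a quick sketch of this first method, here is how the comparison could look in Python; this isn't from the book, and the function name and defaults are my own:

```python
from statistics import NormalDist

def reject_by_critical_value(z_test, alpha=0.05, two_tailed=True):
    """Compare a test z-score against the critical z for a set alpha.

    Returns True when the test statistic is more extreme than the
    critical value, i.e. when we reject the null hypothesis.
    """
    nd = NormalDist()  # standard normal distribution
    if two_tailed:
        z_crit = nd.inv_cdf(1 - alpha / 2)  # e.g. 1.96 when alpha = .05
        return abs(z_test) > z_crit
    z_crit = nd.inv_cdf(1 - alpha)          # e.g. 1.645 when alpha = .05
    return z_test > z_crit
```

So a test z of 2.311 against a two-tailed critical value of 1.96 would lead to rejection, while a test z of 1.5 would not.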
I'm going to change up the language in just a moment after I highlight some ideas, okay, but that's the idea. Now, what I'm suggesting is that maybe this isn't so practical in terms of computation, maybe it's a little bit too much finagling, but I just want to highlight it because this is a viable way for a lot of fields to compare, if you rely on a significance level. I'm going to pitch the idea that maybe we don't worry so much about the significance level and just compute the probability of observing this test score, in which case we would do the second method.
Okay, so if we had an observation from a sample, the second way we could test it is to compare what we call the p-value and the alpha, the significance level. In other words, and I'll write this in a different color, if the p-value that we calculate is less than the alpha that we've set, in other words, if we can show that the probability of observing our test score or something more extreme (that's what the p-value is) is less than the significance level, then that is enough information for us to go ahead and reject our initial assumption.
Alright, what we're trying to say is that the probability of observing our test score is less than the minimum probability for something being, quote unquote, unlikely, alright. So let me back up a little bit. I've already mentioned the significance level, but the p-value is just the probability of observing our test score or something more extreme.
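Here is a minimal sketch of this second method in Python (again, my own function name, not from the book): convert the test z-score into a tail area, then compare that area with alpha.

```python
from statistics import NormalDist

def p_value(z_test, two_tailed=True):
    """Probability under H0 of a z-score at least as extreme as z_test."""
    upper_tail = 1 - NormalDist().cdf(abs(z_test))  # area beyond |z|
    return 2 * upper_tail if two_tailed else upper_tail

# Decision rule: reject H0 whenever p_value(z) < alpha.
```

Notice that a two-tailed p-value is just twice the one-tailed area, since the normal curve is symmetric.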
Okay, let me back up even further. When we have a hypothesis test, we always start with an initial assumption, and to give this a consistent reading, I'm going to call it the null hypothesis, "null" referencing zero, or the ground level. So the null hypothesis, and I'm going to write the words and then show the symbol, is expressed as H sub-zero, an H with a subscript 0. The null hypothesis is, in a nutshell, saying that we assume the treatment had no effect. Okay, so this is our assumption: the treatment had no effect. Typically we write this as the mean being equal to whatever that mean should be, and I'll highlight that with an example, okay.
We would have a different hypothesis if we weren't looking at means. Maybe we were looking at proportions, in which case this part of the hypothesis would change, but it's still the basic statement that whatever we've manipulated, that manipulation had no effect on our statistics. In other words, the treatment, the drug, did not influence anything in our population, okay. We always compare this against an alternative.
Okay, we always have to have a pre-planned alternative hypothesis. In other words, before we go and collect data, we have to have a sense of what we want to see, or what we expect the data to suggest we should see. For every hypothesis test, in order to be objective, you must collect new data or reprocess the data from a new standpoint, and so the alternative hypothesis is our pre-planned expectation.
We write this with an H with the subscript one. Some references, some books, use a subscript A to reference "alternative." I put a one here to be consistent, but it's also nice to recognize that you can have more than one alternative. In other words, the drug might actually influence different people in different ways, and we need to be able to run those kinds of analyses, but at this point we're just going to have two hypotheses: that the treatment had no effect, or that the treatment had an effect in a specified direction. Here's what I mean by pre-planned. The most generic form of the alternative hypothesis is that the mean value is not actually equal to the initially assumed value. Maybe I should highlight that: this is the assumed value for our mean,
okay. This is called a two-tail test, and I'll emphasize that again in just a moment. The two-tail test is the basic test; it's the most conservative test, meaning that the likelihood of rejecting something falsely is much, much lower, but what happens is that we fail to reject the null hypothesis more frequently with the two-tail test, and that's referencing the power. If we do a one-tail, or directional, test, we increase the power of our test and probably, I should say hopefully, get better results.
Okay, the alternative hypothesis is usually a two-tail test unless you have definite evidence to suggest you should do a directional, or one-tail, test. Sometimes we do: if our null hypothesis is that the mean value is equal to some specified assumed value (okay, this has to be known or given), we might actually say that we think the mean should be greater than the hypothesized value, or maybe we think the mean should be less than the hypothesized value. These two are one-tail, or sometimes we call them directional, tests. Okay, and they allow us to do a stronger comparison.
Okay, and I want to highlight example number fifteen coming off page 243, and I want to look at that in a moment, but the idea here is to say that we have choices, and we need to make the choice before we collect the data. Otherwise, and I'm going to use the word cheating, it's like having a dataset and analyzing it in a way that gives you the result that you think you can get from it, instead of saying, I need to establish this result, now let's go collect data and see if we can. I don't know if that's a distinction that everybody follows, but it's like knowing the end result and saying, yeah, yeah, that's what I meant, instead of saying, this is what I'm going to claim and this is what I want to show, and then actually establishing it. There's a lot more credibility if you have research suggesting one idea and then you find data that, as a second source, supports your idea, alright. So this is the alternative hypothesis, this pre-planned hypothesis. It is imperative that we remember that a one-tail test gives us different results than a two-tail test, okay, and that comes down to the computation.
Now I want to make a comment about error. There's always the potential for making a mistake in statistics, and in a hypothesis test directly, and the idea is that we have two types of error. I just want to focus on Type I error. This is the error that we want to minimize the most, and it goes with the idea of falsely rejecting the null hypothesis when it's true. So we want to minimize this false rejection when the null is really true, and the way we do that is by setting a significance level. We don't always have to, but the significance level alpha is precisely the amount of Type I error we're willing to accept at the end of the day, alright. So the idea is that if our significance level is 5 percent, that means we're allowing ourselves a 5 percent chance of falsely rejecting the null hypothesis.
Okay, so when we establish a p-value, let's say our value is 2 percent, what we're saying is that a value like the one we did see occurs 2 percent of the time or less, and that is a less risky thing than the set significance level of five percent. So what we're saying is that, based on our data, we're actually making a Type I error less than five percent of the time; therefore, we can say with more assurance that the null hypothesis is not true. Okay, in other words, we reject the null hypothesis. We don't know for a fact exactly what mu should be, but we know that it's not whatever it was given as.
Okay, so just to clarify a bit: with the significance level, we have z-scores that correspond to each significance level. For the two-tail test, if we have an alpha of 5 percent, we would have a z of plus or minus 1.96. In other words, the picture is: here's our center at 0, here's negative 1.96, here's positive 1.96. In the two tails beyond those boundaries, we have five percent of the observations. In other words, out here we have two and a half percent, and out here we have two and a half percent of the area. Okay, so the two tails add up to five percent. Now, if we had an alpha of .01, our z-scores would be plus or minus 2.58. If we had an alpha that was really small, .001, our z-scores would be plus or minus 3.30. By the way, I'm pulling this from figure 8.5, and I didn't write down the page number, sorry. Figure 8.5 has all this information listed.
So we're saying that these z-scores create the boundary lines such that the two tails collectively add up to the right significance level. Now, if we had a single tail, so for a one-tail test, the picture is going to be that single tail. If we had an alpha level of .05, we're asking, what z-score corresponds to that? I'm just going to do the upper tail, by the way; you could do the lower tail. We're saying all of that probability, five percent, is secured in that one tail, so the z-score that corresponds with that would be, depending on whether it's the upper or lower tail, plus or minus 1.645. By the way, I pulled these from the table in the back of the book, Appendix B, statistical tables, table B1, okay. Now, if we had an alpha of .01, the z-score corresponding to that would be plus or minus 2.33.
So the new picture, depending on the tail that you want, is: here's 0, here's my upper bound at 2.33, or the lower bound at negative 2.33. For a one-tail test, if, for example, my null hypothesis is that the mean is less than or equal to some hypothesized value, and my alternative is that the mean is actually greater than the hypothesized value, I'm looking at the upper one-tail test here, and I think it's nice to note that the corner of the inequality points at the tail. What we're saying is that in that tail we have five percent. Now, if we had a different alternative hypothesis, that the mean should be less than some value, then we're looking at the one tail that's below, and again, in there we should see 5 percent, okay.
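All of these table values can be reproduced from the standard normal inverse CDF. Here's a quick check in Python (not from the book; note that the raw values round slightly differently than some table entries):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal distribution

# Two-tailed: split alpha evenly between the two tails.
two_tail = {a: round(nd.inv_cdf(1 - a / 2), 2) for a in (0.05, 0.01, 0.001)}
# 0.05 -> 1.96, 0.01 -> 2.58, 0.001 -> 3.29 (tables often list 3.30)

# One-tailed: put all of alpha in a single tail.
one_tail = {a: round(nd.inv_cdf(1 - a), 3) for a in (0.05, 0.01)}
# 0.05 -> 1.645, 0.01 -> 2.326 (the book rounds to 2.33)
```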
The idea is that, depending on the test, our p-value is either going to be the single localized tail or the sum of the two tails. So let's look at an example. I think the p-value is very straightforward to compute, so I'm going to do the two-tail test for number 15, and then I want to show the one-tail test.
Okay, so let's look on page 243, and this is number 15, alright. The idea here, and you can read through this problem, is that researchers have noted that as people age, their cognitive functioning ability decreases, so if we can introduce some ingredient into their food, or just give them a medicine, we can potentially stop that decline in their cognitive functioning. So what they did is they took a sample of sixteen elderly people and administered some antioxidant concoction, and they found that when they gave them a test of cognitive ability, the average score was 50.2. Okay, so this was after the treatment was given: a sample of sixteen was given this test, and after the treatment, the test average was 50.2.
Now, they knew from previous information that in the population, the average score on the same exact test was 45, with a standard deviation of 9. Okay, so this information was known. So what they said is, okay, well, if in our sample of size 16 we observed a mean of 50.2, how likely is that? In other words, they want to come up with the p-value, which tells you the likelihood of observing a mean score greater than or equal to 50.2 in a sample of 16 from a population that's distributed normally with a mean of 45 and a standard deviation of 9, okay.
So we want to find a probability: how likely is it that we would see this as our score? So let's just compute. They do give us a significance level, and I'm going to ignore that for a second. Let's look at the two-tail test. The reason I'm looking at the two-tail test is that I want to go in without an assumption that says the people should do better or worse, mainly because I want to know either way.
In other words, maybe our intervention made things worse. Maybe the people thought, oh wow, I'm going to take this drug and then take a test, I'd better do well, and then the stress made them do worse. So just to block that out and not have any sort of lurking variable influencing us, we're going to do a two-tail test. In other words, our assumption is that the mean is equal to 45, and this is the known value. We're going to run it against the alternative, and this is the two-tail test, that the mean is actually not 45 after the treatment, okay.
In other words, I'm not saying that it's greater than 45, I'm saying that it's just not 45. So it could be greater than, or it could be less than, okay. Of course our data suggests greater than, and so the temptation is to put in the directional tail and use mu greater than 45, but that's using the data after the fact. Before we went into this, if we wanted to, we could have said, you know what, we really think, based on some other evidence, that the antioxidant is going to increase the scores, and we could go in with that assumption already established and then test it with the data we get after we've already created the assumption. Okay, so for the two-tail test,
we would come up with this probability. We would ask: what is the probability of observing a mean greater than or equal to 50.2? Okay, we want to come up with that; that's going to be our p-value for the one tail. What about the other direction? I'm not going to list it out just yet; let me clear this. To find the two-tail p-value, we're going to double whatever we get here. So for the two-tail test, the p-value is going to be twice the p-value from the one tail, okay.
It's kind of an interesting thing, and it's not helpful to write it all out, but I just want you to know. In other words, the idea of "extreme" here is: how far away is the 50.2 from the mean, and then we also go below the mean by that same amount. What's the likelihood of being that far away in general? So let's compute the z-score, and then it'll be easier for me to convey the two-tail test. The z-score would be 50.2 minus the mean of 45, divided by the standard error; right, this is a sample mean drawn from a sample of size 16, so the standard error is the standard deviation divided by the square root of 16. You do the computation and you come up with a z-score of 2.311. What does that tell you? It tells you that 50.2 is 2.311 standard errors above 45.
So, to do the two-tail p-value, we want to look above and below by that distance. What's the probability that we have a z that is less than or equal to negative 2.311, plus the probability that z is greater than or equal to positive 2.311? That would be the p-value for the two-tail test, okay.
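The whole calculation for number 15 fits in a few lines; this is just my own sketch of the arithmetic above, using Python's standard library:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 45, 9     # known population mean and standard deviation
n, x_bar = 16, 50.2   # sample size and sample mean after treatment

se = sigma / sqrt(n)        # standard error: 9 / 4 = 2.25
z = (x_bar - mu) / se       # (50.2 - 45) / 2.25, about 2.311

one_tail_p = 1 - NormalDist().cdf(z)  # P(Z >= 2.311), about .0104
two_tail_p = 2 * one_tail_p           # about .0208
```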
It's nice that these two probabilities are actually the same, and that's what I'm referencing up here when I say we double the one-tail p-value. So what is that p-value, the probability of observing a 50.2 when the mean is 45? The probability of Z being greater than or equal to 2.311, if you look it up in Table B1, is .0104. Okay, notice that would be the p-value for the one-tail test; let's look at it for the two-tail. The p-value we actually want, the p-value for the two-tail test, would be twice that, so p is equal to .0208. In other words, just barely over 2 percent of the time we would observe a value as extreme as 50.2.
That is very unlikely; two percent of the time is a very low percent of the time. But in application, you need to establish what "unlikely" means for your field. In other words, sometimes in business 5 percent is fine, and sometimes in business 10 percent is fine; if our p-value is less than 10 percent, we're totally okay with that. But in manufacturing, think really detailed manufacturing, a 2 percent error rate is really a lot. You have to be very precise, so the margin of error we'd be willing to accept would be much, much lower, and our significance level might be .0001, one ten-thousandth. In other words, if it's a clean room or something, that's saying you would allow one particle of dust for every 10,000 particles of air, something like that. Okay, so that's a very low value. Depending on the context, you might just stick with the standard five percent, but I think it's better to just report the p-value and let somebody interpret your results.
Okay, so our interpretation, just based on the p-value: we would say, switching the page here, that having a p-value of .0208 is enough evidence to reject the assumption that H0 is true. So in other words, we are rejecting H0 in favor of the alternative: the mean is not 45. What we would need to do now, the next step in research, would be to repeat this. I don't mean repeat the test; I should say repeat the experiment. Repeat the experiment, and use the specified alternative, that the mean should be greater than 45.
Okay, so real quick, there's another idea coming out of chapter 8: Cohen's d, which measures effect size in absolute magnitude. In other words, we're not going to scale by the standard error; we're going to scale the difference between the means just using the standard deviation, okay, so we're disregarding sample size. It's a very straightforward computation: d is equal to the difference in the means divided by the standard deviation.
Okay, so for our example, number fifteen, that we're just finishing: the d value, the effect size, would be the difference in the means, 50.2 minus 45, divided by the standard deviation, 9. Okay, I didn't do the computation ahead of time, I should have, sorry: 50.2 minus 45 is 5.2, divided by 9, so the effect size is .57 with a bunch of sevens; let's just round it to .58, okay, alright.
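A tiny sketch of that computation (my own variable names, not from the book):

```python
mu, sigma = 45, 9  # known population mean and standard deviation
x_bar = 50.2       # sample mean after treatment

# Cohen's d scales by the standard deviation, not the standard error,
# so the sample size n = 16 never enters the formula.
d = (x_bar - mu) / sigma  # 5.2 / 9, about 0.58, a medium effect by Cohen's guidelines
```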
What I wanna do now is look at chapter 9.