Hypothesis testing
Hypothesis testing is a key procedure in inferential statistics.
It is based on the idea that we can infer things about a population
from a sample taken from it.
Hypothesis testing can be explained in five steps:
1. Hypotheses
2. Significance
3. Sample
4. P-value
5. Decide
Step 1: Hypotheses
Decide on your hypotheses.
You need a null hypothesis, $H_0$ (read "H-nought"),
and an alternative hypothesis, $H_1$ or $H_a$.
Inferential statistics is based on the premise that you cannot prove something
to be true, but you can disprove something by finding an exception.
First decide what you are trying to find evidence for: this is the alternative hypothesis.
Then set up the opposite as the null hypothesis, and look for
evidence against it.
In other words, the alternative hypothesis is usually the thing we are
trying to prove or find out about,
while the null hypothesis is the opposite, or the status quo.
Note:
1. The hypotheses are always about the population parameters, not the
sample values or statistics.
2. The null hypothesis usually refers to the status quo, the thing we are
trying to find evidence against. It generally represents no effect.
3. The null hypothesis should include a statement of equality, and the
alternative should not.
Step 2: Significance
Decide on the level of significance, also known as the alpha (α) value.
Unless there is a good reason not to, people generally use 0.05 as the
significance level.
The significance level is the probability that you will reject the
null hypothesis when it is actually true.
This is known as a Type I error.
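The meaning of the significance level can be checked by simulation. The sketch below is a minimal Python illustration under stated assumptions, not part of the original lesson: both groups are drawn from the same made-up normal population (so the null hypothesis is true), and a two-sample z-test with a known standard deviation stands in for the t-test used later. Over many repetitions, the proportion of samples in which the null hypothesis is wrongly rejected should come out close to α = 0.05.

```python
import random
from statistics import NormalDist, fmean

# Assumed, made-up population shared by both groups, so H0 is true.
MU, SIGMA = 100.0, 15.0  # population mean and (known) standard deviation
N = 20                   # sample size per group (assumption)
ALPHA = 0.05             # significance level chosen in Step 2

def two_sample_z_p_value(a, b, sigma):
    """Two-tailed p-value for a difference in means when sigma is known."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_one_error_rate(trials=5000, seed=1):
    """Fraction of trials in which a true H0 is (wrongly) rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(MU, SIGMA) for _ in range(N)]
        b = [rng.gauss(MU, SIGMA) for _ in range(N)]
        if two_sample_z_p_value(a, b, SIGMA) < ALPHA:
            rejections += 1  # a Type I error: H0 is true but was rejected
    return rejections / trials

rate = type_one_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")  # should be near ALPHA
```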
Step 3: Sample
Take a sample from the population to provide the statistics you need.
Step 4: P-value
Calculate the p-value.
This is almost always done with a computer package.
Step 5: Decide
Use the p-value to decide whether to reject the null hypothesis.
If the p-value is less than the significance level you chose in Step 2,
you reject the null hypothesis: the sample has given you evidence that the null
hypothesis is wrong. If it is not, you do not have enough evidence to reject the null hypothesis.
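Step 5 is a simple comparison. Here is a minimal Python sketch, assuming the p-value has already been calculated; the function name `decide` is hypothetical.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare the p-value with the significance level chosen in Step 2."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0"       # the sample is evidence against H0
    return "do not reject H0"    # not enough evidence; this does not prove H0

print(decide(0.02))  # reject H0
print(decide(0.30))  # do not reject H0
```

The wording "do not reject" rather than "accept" follows the premise from Step 1: the test can provide evidence against the null hypothesis, but it cannot prove it true.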
We will now go through the steps with an example.
Helen sells Choco-nutties.
Her brother, Luke,
is taking a marketing class and tells her that people will buy more Choco-nutties
from her if
she gives away a free gift with each packet.
Helen is skeptical, so she decides to gather data to see whether it's true.
She decides that for the next month she will try out some days on which she offers a free
sticker with each packet of Choco-nutties
and some days on which she doesn't.
From the month of trials she will have a sample that can be used to draw
conclusions about days in the future.
The population in this instance is all days of selling Choco-nutties.
The sample is the days in the next month.
This is not a random sample, but it is the best we can do in this instance.
To make the assignment of sticker or no sticker random, Helen will toss a coin each morning,
and if it is heads she will offer a free sticker.
She will keep track of her sales for each day.
She asks you to help her do the analysis.
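Helen's coin-toss assignment can be simulated. This sketch is illustrative only: the helper `assign_treatments` and the 30-day month are assumptions, and a seed is fixed so the same schedule can be reproduced.

```python
import random

def assign_treatments(n_days, seed=None):
    """Hypothetical helper: one fair coin toss per day, heads -> free sticker."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_days):
        toss = rng.choice(["heads", "tails"])
        schedule.append("free sticker" if toss == "heads" else "no sticker")
    return schedule

month = assign_treatments(30, seed=7)  # a 30-day month is an assumption
print(month.count("free sticker"), "sticker days,",
      month.count("no sticker"), "no-sticker days")
```

Because each day is decided by an independent toss, the two groups are not guaranteed to be the same size, which the two-means test can handle.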
The first step is to decide on the hypotheses.
There are two different circumstances, sometimes known as treatments:
offering a sticker and not offering one.
The null hypothesis is that there is no difference in sales between the two treatments.
The statistic of interest is the mean, or average, value of daily sales.
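The null hypothesis can be written:
$H_0$: There is no difference in mean daily sales between days with a free sticker and days without one.
The alternative hypothesis is written:
$H_a$: There is a difference in mean daily sales between days with a free sticker and days without one.
Helen thinks the sales could go up or down as a result of offering a free sticker, so the alternative hypothesis does not specify a direction.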
To write the hypotheses mathematically, we use μ (mu), a Greek letter, to represent the
population mean.
The population means are the mean sales for all days on which Choco-nutties are sold.
They are different from the sample means, which are the values we calculate from
our data.
The subscript "free sticker" or "no sticker" indicates whether we are talking about
the population mean for the days when the free sticker is offered
or for the days when it is not.
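The difference between sample means and population means can be made concrete. The sketch below groups daily records by treatment and computes each group's sample mean; the records are hypothetical numbers invented for illustration, not Helen's data. The population means μ for each treatment stay unknown; the sample means are only estimates of them.

```python
from statistics import fmean

# Hypothetical daily records (treatment, sales in $), invented for illustration.
records = [
    ("free sticker", 300.0), ("no sticker", 270.0),
    ("free sticker", 320.0), ("no sticker", 260.0),
    ("free sticker", 310.0), ("no sticker", 280.0),
]

def sample_means(records):
    """Group daily sales by treatment and return each group's sample mean."""
    groups = {}
    for treatment, sales in records:
        groups.setdefault(treatment, []).append(sales)
    return {treatment: fmean(values) for treatment, values in groups.items()}

means = sample_means(records)
difference = means["free sticker"] - means["no sticker"]
print(means, difference)  # {'free sticker': 310.0, 'no sticker': 270.0} 40.0
```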
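Written mathematically, the null hypothesis looks like this:
$$H_0: \mu_{\text{free sticker}} = \mu_{\text{no sticker}}$$
which rearranges to
$$H_0: \mu_{\text{free sticker}} - \mu_{\text{no sticker}} = 0$$
Note that there is an equals sign in the null hypothesis.
Similarly, the alternative hypothesis looks like this:
$$H_a: \mu_{\text{free sticker}} - \mu_{\text{no sticker}} \neq 0$$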
We do not know the values of the population means, but we will use
information from the sample to get sample means,
which will then help us make inferences about the population means and the
difference between them.
There is a not-equal-to sign in the alternative hypothesis.
This means we are interested in differences from zero in both directions.
This is called a two-tailed test, or non-directional hypothesis.
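If Helen were sure that the sales would not go down,
and was only interested in whether they went up or stayed the same,
the hypotheses would look like this:
$H_0$: Offering a free sticker does not change mean daily sales.
$H_a$: Offering a free sticker increases mean daily sales.
Written in mathematical terminology, the hypotheses look like this:
$$H_0: \mu_{\text{free sticker}} = \mu_{\text{no sticker}} \qquad H_a: \mu_{\text{free sticker}} > \mu_{\text{no sticker}}$$
If Helen is looking for evidence that sales will increase due to the free sticker, this can be rearranged to:
$$H_0: \mu_{\text{free sticker}} - \mu_{\text{no sticker}} = 0 \qquad H_a: \mu_{\text{free sticker}} - \mu_{\text{no sticker}} > 0$$
This is called a one-tailed test, or directional hypothesis.
Its alternative hypothesis has a greater-than sign.
We will stick to the two-tailed test for now.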
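The difference between the two kinds of test shows up in how the p-value is calculated. As a sketch, assume a test statistic z (the difference "free sticker minus no sticker" divided by its standard error) that follows a standard normal distribution when $H_0$ is true. This is a simplification: the two-means t-test in this example uses the t distribution, whose tails are slightly heavier. The two-tailed p-value counts both tails, while the one-tailed p-value for a greater-than alternative counts only the upper tail.

```python
from statistics import NormalDist

def p_values_from_z(z):
    """Two- and one-tailed p-values for a standard-normal test statistic z."""
    std_normal = NormalDist()
    return {
        "two-tailed": 2 * (1 - std_normal.cdf(abs(z))),  # Ha: difference != 0
        "one-tailed": 1 - std_normal.cdf(z),             # Ha: difference > 0
    }

print(p_values_from_z(2.1))
```

When the observed difference is in the direction the alternative predicts (z > 0), the one-tailed p-value is exactly half the two-tailed one; when it goes the other way, the one-tailed p-value is above 0.5.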
Step 2: Significance
Decide on the level of significance.
We choose α = 0.05.
Step 3: Sample
Helen goes ahead with her plan and provides 23 days of sales
figures: 13 for days on which she offered free stickers
and 10 for days on which she did not.
We put the figures into a spreadsheet and use Excel to draw histograms of
the data
and calculate the appropriate p-value.
The process for calculating the p-value
is shown in a separate video called "Two means t-test in Excel".
The results look like this:
Mean daily sales on the free-sticker days were $301.92,
while mean daily sales on the no-sticker days were $265.83.
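For readers who prefer code to a spreadsheet, here is a minimal, self-contained Python sketch of a two-sample t-test. It assumes Welch's unequal-variance version; Excel's Analysis ToolPak offers both "t-Test: Two-Sample Assuming Equal Variances" and "t-Test: Two-Sample Assuming Unequal Variances", and this section does not say which was used, so results may differ slightly. The sales figures are hypothetical, since Helen's individual daily figures are not listed, and the t-distribution tail area is computed by numerical integration to avoid third-party libraries. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test.

```python
import math
from statistics import fmean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density from 0 to |t|."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3           # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * area)  # symmetric density: both tails together

def welch_t_test(a, b):
    """Welch's t statistic, degrees of freedom and two-tailed p-value."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (fmean(a) - fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, t_two_tailed_p(t, df)

# Hypothetical daily sales in $, invented for illustration (not Helen's data).
free_sticker = [310, 295, 330, 280, 305, 290, 320]
no_sticker = [270, 260, 285, 250, 275]

t, df, p = welch_t_test(free_sticker, no_sticker)
print(f"t = {t:.3f}, df = {df:.1f}, two-tailed p = {p:.4f}")
```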
Step 4: P-value
You see that the p-value for the two-tailed test
is given as 0.01998, which rounds to 0.02.
Step 5: Decide
The calculated p-value of 0.02 is less than the significance level
of 0.05,
so you reject the null hypothesis.
Helen would like to know what that means, so you explain that the data she has collected
provide evidence of a difference in mean sales depending on whether she
offered a free sticker or not.
If she had chosen a one-tailed test,
the p-value would have been 0.00999, or 0.01 when rounded.
This is half the two-tailed p-value, because only one tail of the distribution is counted.
The hypotheses are made about the population.
We collect sample data to draw an inference about the population.
We can see directly whether there is a difference between the sample means.
We then use information from the samples, in the form of the p-value,
to decide whether there is evidence of a difference between the
population means.
You might like to watch "Understanding the p-value" to help your understanding,
or "Two-means test in Excel" to learn how to carry out this test
using the Excel Analysis ToolPak.