Hypothesis tests, p-value - Statistics Help

Hypothesis testing

Hypothesis testing is a key procedure in inferential statistics. It is based on the idea that we can tell things about the population from a sample taken from it. Hypothesis testing can be explained in five steps: hypotheses, significance, sample, p-value, decide.

Step 1: Hypotheses
Decide on your hypotheses. You need a null hypothesis, H0 (H nought), and an alternative hypothesis, H1 or Ha. Inferential statistics is based on the premise that you cannot prove something to be true, but you can disprove something by finding an exception. You decide what you're trying to provide evidence for, which is the alternative hypothesis. Then you set up the opposite as the null hypothesis and find evidence to disprove that. The alternative hypothesis is usually the thing we're trying to prove or find out about, while the null hypothesis is the opposite, or status quo.

Note:
1. The hypotheses are always about the population parameters, not the sample values or statistics.
2. The null hypothesis usually refers to the status quo, the thing we're trying to find evidence against. It generally represents no effect.
3. The null hypothesis should include a statement of equality and the alternative should not.

Step 2: Significance
Decide on the level of significance. Unless there is a good reason not to, people generally use 0.05 as the significance level, also known as the alpha value. The significance level is the probability that you will say that the null hypothesis is wrong when really it is correct. This is known as a type 1 error.

Step 3: Sample
Take a sample from the population to provide the statistics you need.

Step 4: P-value
Calculate the p-value. This is almost always done by a computer package.

Step 5: Decide
Use the p-value to decide whether to reject the null hypothesis. If the p-value is less than the significance level you chose earlier, you reject the null hypothesis: the sample has given you evidence that the null hypothesis is wrong. If the p-value is not less than the significance level, you do not reject the null hypothesis.
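To see what the significance level means in practice, here is a minimal simulation sketch. It is not part of the original explanation. It assumes a simple two-tailed z-test with a known standard deviation (a stand-in chosen because it needs only the Python standard library), and the names `z_test_p_value` and `type_1_error_rate`, along with the mean of 100 and standard deviation of 15, are made up for illustration. Every sample is drawn from a population where the null hypothesis really is true, so every rejection is a type 1 error.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0, sigma):
    """Two-tailed p-value for H0: population mean == mu0, with sigma known."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_1_error_rate(alpha=0.05, trials=4000, n=30, seed=1):
    """Fraction of samples that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Assumed population: mean 100, standard deviation 15 (H0 is true).
        sample = [rng.gauss(100, 15) for _ in range(n)]
        if z_test_p_value(sample, mu0=100, sigma=15) < alpha:
            rejections += 1
    return rejections / trials


rate = type_1_error_rate()
print(rate)  # should land close to alpha = 0.05
```

Under these assumptions, the proportion of wrongly rejected null hypotheses should settle near the chosen alpha, which is the sense in which alpha is the probability of a type 1 error.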
We will now go through the steps with an example. Helen sells Choco-nutties. Her brother, Luke, is taking a marketing class and tells her that people will buy more Choco-nutties from her if she gives away a free gift with each packet. Helen is skeptical, so she decides to gather data to see if it's true. She decides that for the next month she will try out some days where she offers a free sticker with each packet of Choco-nutties and some when she doesn't. From the month of trials she will have a sample that can be used to draw conclusions about days in the future.

The population in this instance is all days of selling Choco-nutties. The sample is the days that occur in the next month. This is not a random sample, but it is all we can do in this instance. To make the assignment of sticker or no sticker random, Helen will toss a coin each morning, and if it is heads she will offer a free sticker. She will keep track of her sales for each day. She asks you to help her do the analysis.

The first step is to decide on the hypotheses. There are two different circumstances, sometimes known as treatments: offering a sticker and not offering one. The null hypothesis is that there is no difference in the sales for the two treatments. The statistic of interest is the mean, or average value, of daily sales. The null hypothesis can be written H0: mean daily sales are the same for the two treatments. The alternative hypothesis is written H1: mean daily sales are different for the two treatments. Helen thinks the sales could go up or down as a result of offering a free sticker.

Written in mathematical terminology, we use mu (μ), a Greek letter, to represent the population mean. The population means are the mean sales for all days that Choco-nutties are sold. These are different from the sample means, which are the values we calculate from our data. The subscript, "free sticker" or "no sticker", indicates whether we're talking about the population mean for the days when the free sticker is offered or for the days when it is not.
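The coin-toss assignment and the sample means described above can be sketched in a few lines of Python. This is only an illustration: the function names `toss_coin_assignments` and `sample_means` are invented here, and the five sales figures are hypothetical, not Helen's data. The point is that the sample means are computed from recorded days, while the population means (μ) remain unknown.

```python
import random
from statistics import mean


def toss_coin_assignments(num_days, seed=None):
    """Toss a fair coin each morning: heads means offer a free sticker."""
    rng = random.Random(seed)
    return ["free sticker" if rng.random() < 0.5 else "no sticker"
            for _ in range(num_days)]


def sample_means(treatments, sales):
    """Group daily sales by treatment and return the sample mean of each group."""
    groups = {}
    for treatment, amount in zip(treatments, sales):
        groups.setdefault(treatment, []).append(amount)
    return {treatment: mean(amounts) for treatment, amounts in groups.items()}


# A possible plan for 23 days (the split between treatments varies by seed).
plan = toss_coin_assignments(23, seed=42)

# Hypothetical records for five days, NOT Helen's actual figures.
treatments = ["free sticker", "no sticker", "free sticker", "no sticker", "free sticker"]
sales = [310.0, 250.0, 290.0, 270.0, 300.0]
means = sample_means(treatments, sales)
print(means)  # {'free sticker': 300.0, 'no sticker': 260.0}
```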
Written mathematically it looks like this:

\[ H_0: \mu_{\text{free sticker}} = \mu_{\text{no sticker}} \]

which rearranges to

\[ H_0: \mu_{\text{free sticker}} - \mu_{\text{no sticker}} = 0 \]

Note that there is an equals sign in the null hypothesis. Similarly, the alternative hypothesis looks like this:

\[ H_1: \mu_{\text{free sticker}} - \mu_{\text{no sticker}} \neq 0 \]

We do not know what the values of the population means are, but we will use information from the sample to get sample means, which will then help us make inferences about the population means and the difference between them. There is a not-equal-to sign in the alternative hypothesis. This means we are interested in differences from zero in both directions. This is called a two-tailed test or exploratory hypothesis.

If Helen was sure that the sales would not go down, and was only interested in whether they went up or stayed the same, the hypotheses would be different: the null hypothesis would be that mean sales are no higher with the free sticker, and the alternative would be that mean sales are higher with the free sticker. Written in mathematical terminology the hypotheses look like this:

\[ H_0: \mu_{\text{free sticker}} \leq \mu_{\text{no sticker}} \qquad H_1: \mu_{\text{free sticker}} > \mu_{\text{no sticker}} \]

The alternative can be rearranged to

\[ H_1: \mu_{\text{free sticker}} - \mu_{\text{no sticker}} > 0 \]

This is the form to use if Helen is looking for evidence that sales will increase due to the free sticker. This is called a one-tailed test or directional hypothesis. It has a greater-than sign. We will stick to the two-tailed test for now.

Step 2: Significance
Decide on the level of significance. We choose alpha = 0.05.

Step 3: Sample
Helen goes ahead with her plan and provides 23 days of sales figures, 13 of which were for days on which she offered free stickers and ten for days on which she did not. We put the figures in a spreadsheet and use Excel to draw histograms of the data and calculate the appropriate p-value. The process for calculating the p-value is shown in a separate video called Two means t-test in Excel. The results look like this: the mean sales for the free-sticker days is $301.92, while the mean sales for the no-free-sticker days is $265.83.

Step 4: P-value
You see that the p-value for the two-tailed test is given as 0.01998, which rounds to 0.02.

Step 5: Decide
The calculated p-value of 0.02 is less than the level of significance, which is 0.05, so you reject the null hypothesis.
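For readers who want to see roughly what the computer package is doing in steps 4 and 5, here is a sketch of a two-sample t-test in plain Python. Several things here are assumptions rather than facts from the example: it uses the pooled (equal-variance) form of the test, which may or may not be the option chosen in Excel; it gets the p-value by numerically integrating the t density with Simpson's rule instead of calling a statistics library; the function names are made up; and the sales figures are invented, because Helen's 23 daily values are not given here.

```python
import math
from statistics import mean, variance


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Area in both tails beyond |t|, using Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_zero_to_t = total * h / 3
    return max(0.0, 1 - 2 * area_zero_to_t)


def pooled_t_test(a, b):
    """Return (t statistic, degrees of freedom, two-tailed p-value)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (mean(a) - mean(b)) / std_error
    return t, df, two_tailed_p(t, df)


# Invented daily sales, for illustration only (NOT Helen's data).
sticker_days = [310.0, 295.0, 320.0, 280.0, 305.0, 300.0]
no_sticker_days = [260.0, 275.0, 250.0, 285.0, 265.0]

alpha = 0.05
t_stat, df, p_value = pooled_t_test(sticker_days, no_sticker_days)
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.5f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The last two lines mirror step 5: compare the p-value with the alpha chosen in step 2 and reject the null hypothesis only if the p-value is smaller.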
Helen would like to know what that means, so you explain that the data she has collected indicates that there is a difference in mean sales depending on whether she offered a free sticker or not. If she had chosen a one-tailed test, then the p-value for that would be 0.00999, or 0.01 when rounded.

The hypotheses are made about the population. We collect sample data to draw an inference about the population. We know whether there is a difference between the sample means. We use information about the samples to decide, using the p-value, whether there is evidence to say that there's a difference between the population means.

You might like to watch Understanding the p-value to help your understanding, or Two means t-test in Excel to learn how to go about doing this test using the Excel Analysis ToolPak.
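The one-tailed p-value of 0.00999 quoted in this example is half of the two-tailed value of 0.01998. A short sketch of that relationship follows. It assumes a test statistic with a symmetric distribution, as in a t-test, and the helper name `one_tailed_p` is invented for illustration. Halving only applies when the observed difference is in the direction the alternative hypothesis predicts.

```python
def one_tailed_p(two_tailed, observed_diff, direction="greater"):
    """One-tailed p-value from a two-tailed one, for a symmetric test statistic."""
    half = two_tailed / 2
    if direction == "greater":
        in_predicted_direction = observed_diff > 0
    else:
        in_predicted_direction = observed_diff < 0
    # If the data point the other way, the one-tailed p-value is large.
    return half if in_predicted_direction else 1 - half


# Sample means and two-tailed p-value from Helen's results.
diff = 301.92 - 265.83
p_one = one_tailed_p(0.01998, diff)
print(round(p_one, 5))  # 0.00999
```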