Mean And Standard Deviation

Minitab is the leading global provider of software and services for quality improvement and statistics education. Quality. Analysis. Results. For more information visit Minitab.com.

This podcast is available at KeithBower.com.

Hello, today I’m going to talk about the mean and the standard deviation: what they are and why we use them in practice.

Well, the population mean, “mu” (the 12th letter of the Greek alphabet), gives us an idea of the central tendency. We can estimate mu by using the sample mean, xbar. So in essence we take a smaller, representative sample from this population, add up the individual values and then divide that sum by the sample size.

To get an idea of how spread out the data are, the true population standard deviation, sigma (the 18th letter of the Greek alphabet), can be estimated by the sample standard deviation, which we quite frequently call “S”. So to compute S we take the sum of the squared deviations from the sample mean, divide it by n-1 and then take the square root of all that (without the square root, it would be the sample variance that you’d have computed). It gives us an idea of roughly how far the data are spread out from the mean, on average.

Of course, when I say sigma here, the true population standard deviation... for those of you who might be going through the Six Sigma® quality improvement initiative, don’t confuse the two terms. Sigma in your world is related to the capability of a process. Sigma, in this language, is dealing with how spread out the data are for the true population itself. I’ll refer you to my website for more information on calculating capability indices.

Now, if you’ve got the mean and the standard deviation, and you know that the population that you’re sampling from is [well modeled by] a Normal distribution, then the great thing is that the mean and the standard deviation are what we call the joint sufficient statistics: I only need to know the mean and standard deviation and I can completely characterize this Normal distribution.
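
The sample mean and sample standard deviation described above can be sketched in Python as follows. This is only an illustration: the data values and the variable names are made up for the example and do not come from the podcast.

```python
import math
import statistics

# Hypothetical sample (illustrative values only, not from the podcast)
sample = [52.0, 61.0, 47.0, 70.0, 65.0, 58.0]

n = len(sample)

# Sample mean (xbar): add up the values, divide by the sample size
xbar = sum(sample) / n

# Sum of squared deviations from the sample mean
sum_sq_dev = sum((x - xbar) ** 2 for x in sample)

# Dividing by n - 1 gives the sample variance
sample_variance = sum_sq_dev / (n - 1)

# Taking the square root gives the sample standard deviation, S
s = math.sqrt(sample_variance)

# Cross-check against the standard library, which also divides by n - 1
assert math.isclose(xbar, statistics.mean(sample))
assert math.isclose(s, statistics.stdev(sample))

print(f"xbar = {xbar:.3f}, S = {s:.3f}")
```

The hand calculation and the `statistics` module should agree, since `statistics.stdev` also uses the n-1 divisor.
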
So, let’s say you tell me that you’ve got a mean of 62 and a standard deviation of 26. Then I could look at a particular point on the X-axis and work out the probability of a value falling above it, or below it, or whatever we may be interested in. So regardless of the amount of data that you’ve collected, you would only need to know the mean and the standard deviation.

With that in mind, if you do have a Normal distribution, and you go plus/minus 1 standard deviation from the mean, then approximately 68% of all the values would fall inside there. Go plus/minus 2 standard deviations, it’s about 95% of all the data; plus/minus 3 standard deviations, it’s about 99.7%. So you can remember that for a Normal distribution: 68 / 95 / 99.7.

Frequently I find that when people want to get an idea of where the data should be falling, and the assumption of Normality is OK, they might use the mean and then go plus/minus “x” standard deviations, like 2 standard deviations, 3 standard deviations, whatever it may be. I tell you, I don’t like that approach because it doesn’t use the amount of data that were collected in the computation. It’s just using the sample mean and standard deviation as if they were the true values, however small the sample was.

What I think is a better method would be the use of a statistical tolerance interval. So for example you could say you are 95% confident that 99% of the data would fall inside this particular region. For more information on that I refer you to the podcasts on tolerance intervals, and also the information in my blog as well.

So I hope this has been useful: the mean looking at central location, the standard deviation looking at how spread out the data are. Of course, if you’ve got any questions on this or anything else, please feel free to email them to me through my website, KeithBower.com.
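
The Normal-distribution calculations mentioned above (a mean of 62, a standard deviation of 26, and the 68 / 95 / 99.7 rule) can be sketched with Python’s standard library. This is a minimal illustration that assumes the population really is Normal and treats 62 and 26 as the true mean and standard deviation; the helper function names and the cut-off of 88 are hypothetical.

```python
from statistics import NormalDist

# Values from the example in the talk, treated here as the true
# population mean and standard deviation (an assumption)
MU = 62.0
SIGMA = 26.0

dist = NormalDist(mu=MU, sigma=SIGMA)


def prob_below(x: float) -> float:
    """Probability that a value falls at or below x."""
    return dist.cdf(x)


def prob_above(x: float) -> float:
    """Probability that a value falls above x."""
    return 1.0 - dist.cdf(x)


def prob_within(k: float) -> float:
    """Probability of falling within plus/minus k standard deviations."""
    return dist.cdf(MU + k * SIGMA) - dist.cdf(MU - k * SIGMA)


# Probability of a value above 88, which is one standard deviation
# above the mean (the cut-off is chosen only for illustration)
print(f"P(X > 88) = {prob_above(88.0):.4f}")

# The 68 / 95 / 99.7 rule
for k in (1, 2, 3):
    print(f"within +/- {k} SD: {prob_within(k):.4f}")
```

The three printed proportions come out at roughly 0.68, 0.95 and 0.997 for any mean and standard deviation, because the rule depends only on how many standard deviations you go out. Note that this sketch does not account for sample size, which is the concern that tolerance intervals address.
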