Minitab is the leading global provider of
software and services for quality improvement
and statistics education.
Quality. Analysis. Results.
For more information visit Minitab.com
This podcast is available at KeithBower.com
Hello, today I’m going to talk about the mean and the standard deviation:
What they are and why we use them in practice.
Well, the population mean, “mu” – the 12th letter of the Greek alphabet –
gives us an idea of the central tendency.
We can estimate mu by using the sample mean, xbar.
So in essence we take a smaller, representative sample from the population,
add up the individual values and then divide that total by the sample size, n.
To get an idea of how spread out the data are,
the true population standard deviation, sigma
– the 18th letter of the Greek alphabet –
can be estimated by the sample standard deviation,
which we quite frequently call “S”.
So to compute S we take the sum of the squared deviations from the sample mean,
divide it by n-1 and then take the square root of all that
(without the square root, what you’d have computed is the sample variance).
It gives us an idea of roughly how far, on average, the values lie from the mean.
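As a minimal sketch of those two calculations, here they are in Python using only the standard library. The six data values are hypothetical, made up purely for illustration, and the variable names are my own.

```python
import math
import statistics

# Hypothetical sample values, invented for illustration only.
sample = [52.0, 61.0, 58.0, 70.0, 64.0, 67.0]

n = len(sample)

# Sample mean (xbar): add up the values, divide by the sample size.
xbar = sum(sample) / n

# Sample variance: sum of squared deviations from xbar, divided by n - 1.
squared_deviations = [(x - xbar) ** 2 for x in sample]
sample_variance = sum(squared_deviations) / (n - 1)

# Sample standard deviation (S): square root of the sample variance.
s = math.sqrt(sample_variance)

# Cross-check against the standard library, which also divides by n - 1.
assert math.isclose(xbar, statistics.mean(sample))
assert math.isclose(s, statistics.stdev(sample))

print(f"xbar = {xbar:.2f}, variance = {sample_variance:.2f}, S = {s:.2f}")
```

Note that `statistics.stdev` uses the n-1 divisor described above, whereas `statistics.pstdev` divides by n and is meant for a complete population.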
Of course, when I say sigma here, I mean the true population standard deviation.
For those of you who might be going through the Six Sigma® quality improvement initiative,
don’t confuse the two terms – sigma in your world is related to the capability of a process.
Sigma, in this language, describes how spread out
the data are for the true population itself.
I’ll refer you to my website for more information on calculating capability indices.
Now, if you’ve got the mean and the standard deviation, and you know that
the population that you’re sampling from is [well modeled by]
a Normal distribution, then the great thing is that the mean and the standard deviation
are what we call jointly sufficient statistics. In other words, I only need to know the
mean and standard deviation and I can completely characterize this Normal distribution.
So, let’s say you tell me that you’ve got a mean of 62 and a standard deviation
of 26. Then I could pick a particular point on the X-axis and work out the
probability of a value falling above it, or below it, or whatever
we may be interested in.
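To make that concrete, here is a small sketch using `statistics.NormalDist` from the Python standard library. The mean of 62 and standard deviation of 26 come from the example above; the point of 100 on the X-axis is a hypothetical value chosen only for illustration.

```python
from statistics import NormalDist

# Normal distribution with the mean and standard deviation from the example.
dist = NormalDist(mu=62, sigma=26)

# Hypothetical point of interest on the X-axis (an assumption, not from the talk).
point = 100

# Probability of a value falling below the point, and above it.
p_below = dist.cdf(point)
p_above = 1.0 - p_below

print(f"P(X < {point}) = {p_below:.3f}")
print(f"P(X > {point}) = {p_above:.3f}")
```

Nothing other than the two parameters goes into the calculation, which is the practical meaning of the sufficiency property.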
So regardless of the amount of data that you’ve collected, you would only need
to know the mean and the standard deviation.
With that in mind, if you do have a Normal distribution, and you go plus/minus
1 standard deviation, then approximately 68% of all the values would fall inside there.
Go plus/minus 2 standard deviations, it’s about 95% of all the data,
plus/minus 3 standard deviations, it’s about 99.7%.
So you can remember that for a Normal distribution: 68 / 95 / 99.7
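The 68 / 95 / 99.7 figures can be checked directly from the Normal distribution function. This is a quick sketch with Python's standard library; it reuses the mean of 62 and standard deviation of 26 from the earlier example, although any Normal distribution gives the same percentages.

```python
from statistics import NormalDist

dist = NormalDist(mu=62, sigma=26)

# Proportion of a Normal population within plus/minus k standard deviations.
coverage = {}
for k in (1, 2, 3):
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    coverage[k] = dist.cdf(upper) - dist.cdf(lower)
    print(f"within +/- {k} SD: {coverage[k]:.2%}")
```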
Frequently I find that when people want to get an idea of where the data
should be falling, and the assumption of Normality is OK, they
take the mean and then go plus/minus “x” standard deviations, like 2 standard deviations,
3 standard deviations, whatever it may be.
I tell you, I don’t like that approach because it doesn’t use the amount of data that
were collected in the computation – it’s just using the sample mean and standard deviation
as if they were the true values.
A better method, I think, is a statistical tolerance interval.
So for example you could say you are 95% confident that 99% of the data would
fall inside this particular region.
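As a rough sketch of how the sample size enters the calculation, the code below computes an approximate two-sided tolerance factor k, so that the interval is xbar plus/minus k times S. It is not taken from the talk: it uses Howe's approximation for k and the Wilson-Hilferty approximation for the chi-square quantile (to stay within Python's standard library), and the function names are my own. Treat the results as approximate and use statistical software for real work.

```python
import math
from statistics import NormalDist


def chi2_quantile_wh(q, dof):
    """Approximate chi-square quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(q)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def tolerance_factor(n, coverage=0.99, confidence=0.95):
    """Approximate two-sided Normal tolerance factor k (Howe's approximation)."""
    dof = n - 1
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    # Lower-tail chi-square quantile at 1 - confidence.
    chi2 = chi2_quantile_wh(1.0 - confidence, dof)
    return z * math.sqrt(dof * (1.0 + 1.0 / n) / chi2)


# The multiplier shrinks as more data are collected.
for n in (10, 30, 100, 1000):
    k = tolerance_factor(n)
    print(f"n = {n:4d}: xbar +/- {k:.2f} * S")
```

With small samples the multiplier is noticeably larger than the roughly 2.58 standard deviations that would cover 99% of a Normal population with known mean and sigma, and it moves toward that value as n grows.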
For more information on that I refer you to the podcasts on tolerance intervals,
and also the information in my blog as well.
So I hope this has been useful… the mean looking at central location,
the standard deviation looking at how spread out the data are.
Of course, if you’ve got any questions on this or anything else,
please feel free to email them to me through my website, KeithBower.com.
For more information on statistical methods for quality improvement,
visit KeithBower.com