The normal distribution is a theoretical concept of how large samples of ratio or interval
level data will look once plotted. Since many variables tend to have approximately normal
distributions, it is one of the most important concepts in statistics. The normal curve allows
for probabilities to be calculated. In addition, many inferential statistics require that data
are distributed normally. If your data are not normal, be careful which statistical tests
you use with them. In a normal distribution, measures of central
tendency including the mean, median and mode all fall at the same midline point. The mean,
median and mode are all equal. The calculation of these measures of central tendency is
covered in another video. Normal distributions share several key features.
They are unimodal, meaning that there is only one peak in the distribution.
When divided at the mean, a normal distribution forms two mirror-image halves, giving a symmetrical
bell-shaped curve. Standard deviations are used to measure how
much variation exists in a distribution. Low standard deviations mean values are close
to the mean, whereas high standard deviations mean that values are spread out over a large
range. In a normal distribution, approximately 34%
of scores fall between the mean and 1 standard deviation above the mean. Therefore, based
on its symmetry, approximately 68% of scores fall between 1 standard deviation above and
1 standard deviation below the mean; approximately 95% of scores fall between 2
standard deviations above and 2 standard deviations below the mean; and
approximately 99.7% of scores fall between 3 standard deviations above and below the
mean. Z scores are used to measure how many standard
deviations above or below the mean a particular score is. These scores allow for comparison
and probability calculations. Not all samples approximate a normal curve.
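To make the 68%, 95% and 99.7% figures and the idea of a z score concrete, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. The helper names `share_within` and `z_score` are made up for this illustration, and the test with a mean of 100 and a standard deviation of 15 is a hypothetical example, not data from the video.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Proportion of scores within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def z_score(score: float, mean: float, sd: float) -> float:
    """How many standard deviations a score is above (+) or below (-) the mean."""
    return (score - mean) / sd


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")

# Hypothetical test: mean 100, standard deviation 15, one score of 130.
z = z_score(130, mean=100, sd=15)
print(f"z = {z:.1f}; share of scores below it: {standard_normal.cdf(z):.1%}")
```

Because a z score puts any normally distributed score on the same scale, the same `cdf` call can be used to compare scores from different tests.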
To understand more about distributions it is important to understand modality, symmetry
and peakedness. A distribution can have more than one peak.
The number of peaks contained in a distribution determines the modality of the distribution.
Normal distributions, like many others, have only one main peak, meaning they
are unimodal. However, it is possible to have distributions with two or more peaks. Distributions
with two peaks are bimodal. Distributions with more than two peaks are multimodal.
Symmetry and modality are independent concepts. If two halves of a distribution can be superimposed
on each other where each half is a mirror image of the other, the distribution is said
to be symmetrical. Sometimes data are not symmetrical. If the
peak is off centre, one tail of the distribution will be longer than the other, meaning it
is skewed. Skewness is a measure of the asymmetry of distributions.
Pearson's skewness coefficient provides a quick estimate of symmetry by comparing the mean with the median.
Recall that normal distributions are symmetrical and bell shaped. In a perfectly normal distribution
the skewness coefficient will equal 0 because the mean equals the median.
Positive skewness means there is a pileup of data to the left leaving the tail pointing
to the right side of the distribution. The tail has been pulled in the positive direction.
The data are skewed to the right. In this case the mean is to the right of the median. Interestingly,
positive skews are more common than negative ones.
Negative skewness means there is a pileup of data to the right with a long tail on the
left side. The tail has been pulled in the negative direction. In this case the mean is to the
left of the median. To remember the meaning of a positive and
negative skew think of pulling on tails. Remember that the tail points towards the direction
of the skew. The mean is also pulled in the direction of the long tail of the skew.
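As a rough illustration of how the sign of the skew follows the tail, here is a short Python sketch. It assumes the Pearson coefficient meant here is the median-based version, 3 × (mean − median) / standard deviation, and it uses the sample standard deviation. The function name and the three small data sets are made up for this example.

```python
from statistics import mean, median, stdev


def pearson_median_skewness(values):
    """Pearson's median skewness: 3 * (mean - median) / sample standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)


# Made-up samples for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]          # pileup on the left, tail to the right
left_tail = [-20, -4, -3, -3, -3, -2, -2, -1]   # mirror image of right_tail

samples = [("symmetric", symmetric), ("right tail", right_tail), ("left tail", left_tail)]
for name, sample in samples:
    print(f"{name}: {pearson_median_skewness(sample):+.2f}")
```

The symmetric sample gives 0 because its mean and median are equal. The right-tailed sample gives a positive value because the mean is pulled above the median, and its mirror image gives the same value with a negative sign.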
Kurtosis is a measure of the shape of the curve. It measures whether the bell of the curve
is normal, flat, or peaked. Since its calculation is tedious, it is typically done by a computer.
Using Fisher's measure of kurtosis, also called excess kurtosis, a normal distribution would receive a coefficient of
0 and be called mesokurtic. If the calculation of excess kurtosis results
in a large positive number, the distribution is too peaked to be considered normal. This
type of data is called leptokurtic. The curve is taller and skinnier than a normal distribution.
The beginning of the word kind of sounds like leapt, so think of a skinny guy who leapt high
in the air. If the calculation of excess kurtosis results
in a negative number, the distribution is too flat to be normal. It would be called platykurtic. The
curve is shorter and fatter than a normal distribution. One way to remember this is
that the beginning of the word kind of sounds like a flat plateau.
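Since this calculation is usually left to a computer, here is a minimal Python sketch of it. It assumes the simple population form of excess kurtosis: the fourth central moment divided by the squared variance, minus 3, with no small-sample correction. Statistical packages may apply a correction and give slightly different numbers. The function name and the three samples are made up for illustration.

```python
from statistics import NormalDist, fmean


def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment / variance squared, minus 3."""
    centre = fmean(values)
    m2 = fmean([(x - centre) ** 2 for x in values])  # variance (second central moment)
    m4 = fmean([(x - centre) ** 4 for x in values])  # fourth central moment
    return m4 / m2**2 - 3


# Made-up samples for illustration only.
# 1000 evenly spaced quantiles of a standard normal curve (approximately mesokurtic).
near_normal = [NormalDist().inv_cdf((i + 0.5) / 1000) for i in range(1000)]
# Evenly spread values with no peak at all (platykurtic).
flat = list(range(1, 101))
# Most values at the centre plus two extreme scores (leptokurtic).
peaked = [-10, -1, 0, 0, 0, 0, 0, 0, 1, 10]

for name, sample in [("near normal", near_normal), ("flat", flat), ("peaked", peaked)]:
    print(f"{name}: {excess_kurtosis(sample):+.2f}")
```

The near-normal sample comes out close to 0, the flat sample is negative, and the peaked sample is positive, matching the mesokurtic, platykurtic and leptokurtic labels.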
If a distribution is skewed, there is no need to calculate kurtosis since the distribution
is already not normal. Thank you for watching. Please subscribe and
explore more of my videos. Let me know what you found helpful and what other information
you may need. I look forward to reading your comments.