I want to build on what we did in the last
video a little bit.
Let's say we have two random variables.
So I have random variable x.
And let me draw its probability distribution.
And actually, it doesn't have to be normal.
But I'll just draw it as a normal distribution.
So this is the distribution of random variable x.
This is the mean.
The population mean of random variable x.
And then it has some type of standard deviation.
Actually, let me just focus on the variance.
So it has some variance right here for random variable x.
This is x, the distribution for x.
Let's say we have another random variable.
Random variable y.
Let's do the same thing for it.
Let's draw its distribution.
And let me draw the parameters for that distribution.
So it has some true mean, some population mean for the random
variable y.
And it has some variance right over here.
And I've drawn it roughly normal.
Once again, we don't have to assume that it's normal.
Because we're going to assume, when we go to the next level,
that when we take the samples, we're taking enough samples
that the central limit theorem will actually apply.
But with that said, let's think about the sampling
distributions of each of these random variables.
So let's think about the sampling distribution of the
sample mean of x.
Let's say the sample size over here is going
to be equal to n.
So what is that going to look like?
Well it's going to be some distribution.
And we're assuming that n is a fairly large number.
So this is going to be a normal distribution.
Or it can be approximated with a normal distribution.
Let me shift it over a little bit.
I'm going to draw it a little bit narrow.
Let me draw the mean.
So the mean of the sampling distribution is going
to be denoted with this mu sub x bar, which tells us it's the mean
of the distribution of the sample means when the sample size is n.
And we know that this is going to be the same thing as the
population mean for that random variable.
And we know that the
variance of the sampling distribution, the variance of the
sample mean, is going to be equal to
the population variance divided by this
n right over here.
And if you wanted the standard deviation of this, you just
take the square root of both sides.
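To make that concrete, here is a minimal simulation sketch. The population (uniform on 0 to 12), the sample size of 30, and the helper name simulate_sample_means are all assumptions made up for illustration, not anything from the video. It only checks that the sample means center on the population mean and that their variance comes out near the population variance divided by n.

```python
import random
import statistics

def simulate_sample_means(draw, n, trials, rng):
    """Return `trials` sample means, each computed from n independent draws."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]

rng = random.Random(0)  # fixed seed, only so the run is repeatable

# Hypothetical population for x: uniform on [0, 12], so it is clearly not normal.
# Its mean is 6 and its variance is 12**2 / 12 = 12.
def draw_x(rng):
    return rng.uniform(0.0, 12.0)

n = 30  # assumed sample size
x_bars = simulate_sample_means(draw_x, n, trials=20_000, rng=rng)

mean_of_x_bar = statistics.fmean(x_bars)
var_of_x_bar = statistics.pvariance(x_bars)

print("mean of x bar:", mean_of_x_bar, "vs population mean 6")
print("variance of x bar:", var_of_x_bar, "vs sigma^2 / n =", 12 / n)
```

The two printed pairs should land close to each other, and the square root of the second one is the standard deviation of the sample mean.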
Let's do the same thing for random variable y.
Let's take the sampling distribution
of the sample mean.
But here, we're talking about y, random variable y.
And let's just say it has a different sample size.
It doesn't have to be a different one.
But it just shows you that it doesn't have to be the same.
So it has a sample size of m.
Let me draw its distribution right over here.
Once again, it'll be a narrower distribution than the
population distribution.
And it will be approximately normal, assuming that we have
a large enough sample size.
And the mean of the sampling distribution of the sample
mean is going to be the same thing as the population mean.
We've seen that multiple times.
And its variance for the sample means, or the standard
error of the mean.
Actually, this isn't the standard error.
Standard error would be the square root of this.
So if I called this standard error of the
mean, that's wrong.
The standard error of the mean is the square root of this.
It's the standard deviation.
This is the variance of the mean.
Don't want to confuse you.
So the variance of the mean here is going to be the exact
same thing.
It's going to be the variance of the population divided by
our sample size.
And everything we've done so far is complete review.
It's a little different, because I'm actually doing it
with two different random variables.
And I'm doing it with two different random
variables for a reason.
Because now I'm going to define a new random variable.
We could just call it z.
But z is equal to the difference
of our sample means.
It's equal to the x sample mean minus the y sample mean.
So what does that really mean?
Well, to get a sample mean, or at least for this
distribution, you're taking n samples from this
population over here.
Maybe n is 10.
You're taking 10 samples and finding their mean.
That sample mean is a random variable.
Let's say you take 10 samples from here and you get 9.2 when
you find their mean.
That 9.2 can be viewed as a sample from this distribution
right over here.
Same thing if this right here is m.
Or if m right here is 12.
You're taking 12 samples, taking their mean.
And that sample mean, maybe it's 15.2, could be viewed as
a sample from this distribution.
As a sample from the sampling distribution.
So what z is, z is a random variable where you're taking n
samples from this distribution up here, this population
distribution, taking its mean.
Then you're taking m samples from this population
distribution up here, taking its mean.
And then finding the difference between that mean
and that mean.
So it's another random variable.
But what is the distribution of the z?
So let's draw it.
Well there's a couple of things we
immediately know about z.
And we kind of came up with this in the last video.
Instead of writing z, I'm just going to write x bar
minus y bar: the sample mean of x, which is a sample from the
sampling distribution of x bar, minus the sample mean of y.
We saw this in the last video.
In fact, I think I still have the work up here.
Yeah, I still have the work right up here.
The mean of the difference is going to be the
difference of the means.
The mean of the difference is the same thing as the
difference of the means.
So the mean of this new distribution right over here
is going to be the same thing as the mean of our sample mean
of x minus the mean of our sample mean of y.
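Here is a rough sketch of what building z looks like in code. The population means and standard deviations are hypothetical values picked for illustration, and both populations are assumed normal just to keep the code short. The sample sizes 10 and 12 are the ones mentioned earlier.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed, only so the run is repeatable

# Hypothetical population parameters (assumed for this sketch).
MU_X, SIGMA_X, N = 10.0, 3.0, 10   # n = 10 samples from x
MU_Y, SIGMA_Y, M = 15.0, 4.0, 12   # m = 12 samples from y

def sample_mean(mu, sigma, size):
    """Mean of `size` independent draws from a normal population."""
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(size))

# Each entry is one draw of z = x bar - y bar.
z_values = [
    sample_mean(MU_X, SIGMA_X, N) - sample_mean(MU_Y, SIGMA_Y, M)
    for _ in range(20_000)
]

mean_of_z = statistics.fmean(z_values)
print("mean of z:", mean_of_z, "vs mu_x - mu_y =", MU_X - MU_Y)
```

The average of all those z values should sit close to the difference of the two population means.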
And this might seem a little abstract in this video.
In the next video, we're actually going to do this with
concrete numbers.
And hopefully it'll make a little bit more sense.
And just so you know where we're going with this, the
whole point of this is so that we can eventually do some
inferential statistics about differences of means.
How likely is it that a difference of means of two samples
is just random chance?
Or what is a confidence interval of the
difference of means?
That's what this is all building up to.
So anyway, we know the mean of this
distribution right over here.
And what's the variance of this distribution?
We came up with that result in the last video.
If we're taking essentially the difference of two independent random
variables, the variance is going to be the sum of the variances
of those two random variables.
And the whole point of that video is to show that it's not
the difference of the variances, it's
the sum of the variances.
The variance of this new distribution-- and I haven't
drawn the distribution yet-- The variance of this new
distribution, I'll just write x bar minus y bar, is going to
be equal to the sum of the variances of each of these
distributions.
The variance of x bar plus the variance of y bar.
Actually, let me just draw this here.
Just so we can visualize another distribution.
Although, all I'm going to draw is another normal
distribution.
Let me scroll down a little bit.
So the mean over here, the mean of x bar minus y bar, is
going to be equal to the difference of
these means over here.
I don't have to rewrite it.
Let me draw the curve.
And notice, I'm drawing a fatter curve than either one.
And why am I doing that?
Because the variance here is the sum of the variances here.
So we're going to have a fatter curve.
It's going to have a bigger variance, or a bigger standard
deviation than either of these.
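One way to see the fatter curve is to simulate it. This is only a sketch under assumed numbers: the standard deviations 3 and 4 and the means 10 and 15 are made up, the populations are taken to be normal, and the two samples are drawn independently, which is what the sum-of-variances result relies on.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed, only so the run is repeatable

# Hypothetical values, chosen only for illustration.
SIGMA_X, N = 3.0, 10   # population std dev of x, sample size n
SIGMA_Y, M = 4.0, 12   # population std dev of y, sample size m
TRIALS = 20_000

def sample_mean(mu, sigma, size):
    """Mean of `size` independent normal draws."""
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(size))

x_bars = [sample_mean(10.0, SIGMA_X, N) for _ in range(TRIALS)]
y_bars = [sample_mean(15.0, SIGMA_Y, M) for _ in range(TRIALS)]
diffs = [xb - yb for xb, yb in zip(x_bars, y_bars)]

var_x_bar = statistics.pvariance(x_bars)
var_y_bar = statistics.pvariance(y_bars)
var_diff = statistics.pvariance(diffs)

print("variance of x bar:", var_x_bar)
print("variance of y bar:", var_y_bar)
print("variance of x bar - y bar:", var_diff)
print("sum of the two variances:", var_x_bar + var_y_bar)
```

The variance of the differences should come out larger than either individual variance and close to their sum.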
So then we have some variance here, variance of
x bar minus y bar.
Now what are these, in terms of the original population
distribution?
We came up with those results right over here.
We know what these variances are.
We know that this thing is the same thing as the variance of
the population distribution divided by n.
We've done this multiple, multiple times.
What's this going to be equal to?
This right here is the same thing as the variance of our
population distribution.
And the x just means this is for random variable x.
But there's no bar on top.
This is the actual population distribution, not the sampling
distribution of the sample mean.
So that divided by n.
And then if we want the variance of the sampling
distribution for y, let me do that in a different color.
I'll use blue, because that was what we were using for the
y random variable.
That's going to be equal to this thing over here.
And we've done this multiple times.
Same exact logic as this.
The variance of the population distribution for y divided by m.
And so once again, I'll just write this out front.
This is the variance of the differences
of the sample means.
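Written out in symbols, the results so far look like this. The notation is an assumption about what is on the board: mu and sigma squared with a plain x or y subscript are the population parameters, and a bar in the subscript means the sampling distribution of the sample mean. The variance line assumes the two samples are independent.

```latex
\begin{align*}
\mu_{\bar{x} - \bar{y}} &= \mu_{\bar{x}} - \mu_{\bar{y}} = \mu_x - \mu_y \\
\sigma^2_{\bar{x} - \bar{y}} &= \sigma^2_{\bar{x}} + \sigma^2_{\bar{y}}
  = \frac{\sigma^2_x}{n} + \frac{\sigma^2_y}{m}
\end{align*}
```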
And now if you wanted the standard deviation of the
differences of the sample means, you just have to take
the square root of both sides of this.
You take the square root of this, you get the standard
deviation of the difference of the sample means is equal to
the square root of the variance of the
population distribution of x
divided by n plus the variance of the population distribution
of y divided by m.
And this is just neat.
Because it kind of looks a little bit
like a distance formula.
I'll throw that out there as we get more sophisticated with
our statistics and try to visualize what all of this
kind of stuff means in more advanced topics.
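As a small sketch, that square-root formula fits in one function. The function name std_dev_of_difference and the example numbers are hypothetical, and the formula assumes the two samples are independent.

```python
import math

def std_dev_of_difference(var_x, n, var_y, m):
    """Standard deviation of (x bar - y bar) for independent samples.

    var_x, var_y: population variances of x and y
    n, m: sample sizes used for x bar and y bar
    """
    if n <= 0 or m <= 0:
        raise ValueError("sample sizes must be positive")
    return math.sqrt(var_x / n + var_y / m)

# Hypothetical example: variances 9 and 16 with sample sizes 10 and 12.
result = std_dev_of_difference(9.0, 10, 16.0, 12)
print(result)

# With variance terms 9 and 16 and sample sizes of 1, this is the same
# arithmetic as the hypotenuse of a 3-4-5 right triangle.
print(std_dev_of_difference(9.0, 1, 16.0, 1))
```

The second call is where the distance-formula resemblance shows up: the two standard deviations combine the way the legs of a right triangle do.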
But the whole point of this is, now we can make inferences
about a difference of means.
If we have two samples, and we take the means of both of
those samples and we find some difference, we can make some
conclusions about how likely that
difference was just by chance.
And we're going to do that in the next video.