>> Hello. Welcome
to this tutorial.
One of the simplest ways
to organize your data is
to sort it.
You can sort it from high
to low or low to high
in either direction
but in the end,
you have a
ranked distribution.
One of the benefits
to a ranked distribution is
that you can calculate the
range of the data
so if the oldest car is 16 years
and the newest car is 1 year,
our range would be 16 minus 1
or a range of 15 years.
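As a rough sketch of that calculation in Python, assuming a small made-up list of car ages (only the oldest value of 16 and the newest value of 1 come from this tutorial; the other values and the variable names are hypothetical):

```python
# Hypothetical car ages in years; only the 16 and the 1 come from the tutorial.
ages = [9, 16, 3, 1, 12, 9, 2]

# Sorting from high to low gives a ranked distribution.
ranked = sorted(ages, reverse=True)

# The range is the highest value minus the lowest value.
data_range = ranked[0] - ranked[-1]

print(ranked)
print(data_range)
```

Sorting low to high would work just as well; you would simply subtract the first value from the last instead.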
When you have several cars,
it may still not be helpful
to list all of the ages
of those cars.
In some cases,
it might be better to
instead create a simple
frequency distribution.
With the simple frequency
distribution,
you have two columns.
The column
on the left represents each
unique response.
In this case,
each unique response is age
of car.
So we can see
that people have cars
that are 1 year old,
2 years old, 3 years old,
all the way
up to 16 years old.
It's possible
that someone might not have
had a car that was 15 years
old and if so,
we wouldn't have seen 15
listed in the left most column
as a value that occurred.
However, in this case,
all the possible ages were
covered from 1 to 16.
In the right most column,
you're going
to see the frequency for each
of those responses.
So if I look
at the bottom row,
there's a car that's 1 year old
and there's only 1 of them.
In terms of cars
that are 2 years old,
there are two people,
and for cars
that are 3 years old,
there are again two people.
So the column
to the left tells you,
in this case,
the age of the car
and the column
on the right
tells you how many students
have a car
of that particular age.
So a simple frequency
distribution lists each
of the responses that occurred
and that's the left
most column.
And then the right most column
lists its frequency.
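One possible way to build a simple frequency distribution like this in Python is sketched below. The list of ages is made up for illustration and is not the data set shown in the video; `simple_table` is just a name chosen here.

```python
from collections import Counter

# Hypothetical car ages in years (not the tutorial's actual data set).
ages = [9, 16, 3, 1, 12, 9, 2, 2, 3, 9]

# Count how many times each unique response occurred.
counts = Counter(ages)

# List each response that occurred, from high to low, with its frequency.
simple_table = [(age, counts[age]) for age in sorted(counts, reverse=True)]

print("Age of car | Number of cars")
for age, frequency in simple_table:
    print(f"{age:>10} | {frequency}")
```

Ages that nobody reported, such as 15 in this made-up list, simply do not appear as rows.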
With a simple frequency
distribution, it's fairly easy
to calculate the range as well
as to identify the mode.
To calculate the range,
since a simple frequency
distribution lists all the
values that occurred from high
to low, we can take a look
at the top
and see the high value
which is 16 and at the bottom
of the table,
the low value, 1 year.
And calculate that range
as 16 minus 1 giving us 15.
Also, we can identify the mode
that is the value
that occurred most frequently.
So the most frequent response
was people reporting
that their car was 9
years old.
There were 12 people
who reported their car was 9
years old.
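Here is a minimal sketch of reading the range and the mode off a simple frequency distribution. The rows are hypothetical except for the top value of 16, the bottom value of 1, and the 12 cars that were 9 years old, which come from this tutorial.

```python
# Hypothetical simple frequency distribution as (age, frequency) rows,
# listed from high to low. Only the 16, the 1, and the 12 cars aged 9
# come from the tutorial; the other rows are made up.
simple_table = [(16, 1), (12, 4), (9, 12), (5, 6), (2, 2), (1, 1)]

# Range: the value in the top row minus the value in the bottom row.
data_range = simple_table[0][0] - simple_table[-1][0]

# Mode: the response whose frequency is the largest.
mode_age, mode_frequency = max(simple_table, key=lambda row: row[1])

print(data_range)
print(mode_age, mode_frequency)
```

If two ages tied for the highest frequency, `max` would return only the first one it met, so a tie would need separate handling.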
So a simple frequency
distribution further organizes
the data making it easy to see
for each response,
how many times it occurred.
In this particular case,
we're recording the age
of cars.
If we had gone to the mall
and recorded ages of people,
we might've gotten unique
responses anywhere from 1,
for a 1 year old,
all the way up to 99,
for a 99 year old.
And with so many possible
responses from 1 to 99,
our simple frequency table
could be extremely long,
so long as to not really
be of much benefit.
In fact, the current simple
frequency distribution
that we have here talking
about car age,
it has 16 rows in it.
That's pretty long for people
to take a look at.
In that case,
you might instead want
to group these values
and this is called a grouped
frequency distribution.
In the table
that you see here,
the ages of the cars are
grouped using an interval size
of 3.
The bottom class interval
shows us that for cars
that are 0 to 2 years of age,
there are 3 of them.
And for cars that are 3
to 5 years in age,
there are 10 of those.
And for cars that are 6
to 8 years old, there are 20
and so on.
Each one of these rows is
referred to as a
class interval.
So we have, in this case,
6 class intervals,
0 to 2 is one class interval,
3 to 5 is another class
interval, and so on.
When you look
at the grouped frequency
distribution,
that left most column is going
to be talking
about, in this case, the age
of cars.
That is, we're talking
about what responses
were made.
So some responses were 1 year
old and some responses were
2 years old and there are 3
of those cars.
What you see boxed are the 0,
the 3, the 6 and 9, the 12
and the 15;
those represent the lower
apparent limits
for each class interval.
The 2, the 5, the 8, the 11,
14, 17, those are the upper
apparent limits
for each class interval
and finally,
the right most column
lists the frequency
for each of those
class intervals.
When we look
at a class interval
such as cars that are 9
to 11 years in age,
there were 30 cars that fell
into that class interval.
That means
when you add up the
frequency of cars
that were 9 years old plus
those that were 10 years old
plus those
that were 11 years old,
you get the 30 cars that fell
into that class interval.
So 9 is the lower apparent
limit for the class interval,
11 is the upper apparent limit
for the class interval
and 30 is the number
of cars in it.
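The adding-up step for a single class interval can be sketched like this. The 12 cars aged 9 and the total of 30 come from this tutorial; how the remaining 18 cars split between ages 10 and 11 is an assumption made only for illustration.

```python
# Frequencies for individual ages. The 12 cars aged 9 and the total of 30
# come from the tutorial; the split between ages 10 and 11 is hypothetical.
frequency_by_age = {9: 12, 10: 10, 11: 8}

# Lower and upper apparent limits of the class interval.
lower_limit, upper_limit = 9, 11

# Add the frequency of every age from the lower limit through the upper limit.
interval_frequency = sum(
    frequency_by_age.get(age, 0) for age in range(lower_limit, upper_limit + 1)
)

print(interval_frequency)
```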
You hear this term
apparent limit.
Apparent limit means
that the limits are
in the same unit
as the measurement.
That is when we asked people
how old their cars were,
we asked how many years old
and so our limits
that we're providing, the 9
as the lower limit
and the 11 as the upper limit,
those are the apparent limits.
They're in the same unit
as the values
that were reported here.
Well when it's time
to create your table,
you're going to want
to make sure
that you label your columns.
Notice that my left most
column is labeled age of car
and this is where I'm going
to be putting the survey
responses, one year, two years
and so on.
The right most column is
labeled number of cars
and that's the number of cars
that are in each class
interval and your table should
also have a title
and this one is "How old are
the cars?"
When you create a grouped
frequency distribution,
you want to select an interval
size that'll provide between 5
and 11 class intervals.
This particular table,
if you count each of the rows,
has 6 class intervals.
With anything less than 5 it's hard
to identify any real patterns.
With anything more than 11,
it starts to become unwieldy
so somewhere between 5
and 11 is ideal
so that you can identify any
possible patterns
in the data itself.
Okay what we're looking
at right now is all the data
as a ranked distribution.
And you can see it's still
fairly overwhelming
to see all the [inaudible]
lists out there as opposed
to the simple frequency
distribution we looked
at earlier
and especially the grouped
frequency distribution, which helped
us to identify patterns much
more easily.
But we're looking at the ranked
distribution here
because we want to find
out what is the range
of our data.
And our ranked distribution
gives us the high value
and the low value
and so we can calculate the
range as 15.
And there's a span
of 15 years.
Now we want
to choose an interval size
that will give us somewhere
between 5 and 11 class
intervals and there's no easy
solution to this.
You literally say okay,
if my range was 15,
what if I divided by 2?
How many class intervals would
I get?
7.5,
well that could work.
It's between 5 and 11.
What if I use an interval size
of 3?
Well 15 divided by 3,
that would give me
approximately 5
class intervals.
Well that's good.
Now what about 15 divided by 4
with an interval size of 4?
Well that would give me
3.75 class intervals
and
that would be too few
in this case
because it'd be less
than the 5
to 11 range we want.
We're going to go ahead
and work with the class
interval size of 3
and that groups up the years
and it's within that 5
to 11 range.
On an exam or test,
most likely I will be giving
you the interval
size directly.
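The trial-and-error just described can be sketched as follows. The range of 15 and the candidate sizes 2, 3 and 4 come from this tutorial; the variable names are made up here, and the division is only a rough estimate of how many class intervals you will end up with.

```python
data_range = 15
candidate_sizes = [2, 3, 4]

# Rough estimate of the number of class intervals: range divided by size.
estimates = {size: data_range / size for size in candidate_sizes}

# Keep the sizes whose estimate falls between 5 and 11 class intervals.
acceptable = [size for size, n in estimates.items() if 5 <= n <= 11]

for size, n in estimates.items():
    print(f"interval size {size}: about {n} class intervals")
print(acceptable)
```

The estimate is approximate: with an interval size of 3, the table in this tutorial actually has 6 class intervals, because it starts at 0 and runs up to 17.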
Okay, so we know we're going
to work with the interval size
of 3.
Well, what will be the limits
for our bottom class interval?
What will be our lower limit?
What will be our upper limit?
Well here's an important thing
to remember anytime
that you create a grouped
frequency distribution.
You want it to be readable.
You want it
to be easily understandable
and an important way
to achieve this is
to make sure
that your lower apparent
limits, for every class
interval, are
multiples of the
interval size.
Okay, so our interval size
is 3.
So that means our lower
apparent limit needs
to be something like a 0
or a 3 or a 6 or a 9.
Some value that's a multiple
of 3.
That's where we need to start.
Okay so we're back
at our ranked distribution.
I'm looking at all the data
and I'm going to focus
on the smallest value.
Well the smallest value is 1.
That is there was a person
who reported their car was
only 1 year old.
Now my bottom class interval
needs to include that value
of 1 within the range.
I can start with 1
as my lower apparent limit
if it's a multiple of 3
but 1 is not a multiple of 3.
Zero is. Three times 0 is 0.
Three is, 3 times 1 is 3.
Six is, 3 times 2 is 6 but 1,
one is not a multiple of 3.
So literally I'd say okay my
lowest value is a 1.
Is it a multiple of 3?
Nope. And so I try the next
smaller number.
What about 0?
Is 0 a multiple of 3?
Yes and then I can stop.
If 0 hadn't been a multiple
of 3, I would try the
next smaller value,
what about negative 1
or negative 2?
I would've just worked all the
way down till I finally found
a multiple of 3.
So that's a key thing
to remember.
You take your lowest value
and you say is it a multiple
of my interval size?
And if it is, you're good
to go and if it's not,
say well what about 1 less
than my smallest value?
And you just work your way
down until you find something
that is a multiple.
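That step-down rule might be written as a small Python function. This is only a sketch that assumes whole-number values; the function name `bottom_lower_limit` is made up for illustration.

```python
def bottom_lower_limit(smallest_value, interval_size):
    """Step down from the smallest value until reaching a multiple
    of the interval size. Assumes whole-number values."""
    limit = smallest_value
    while limit % interval_size != 0:
        limit -= 1  # try 1 less than the current candidate
    return limit


# Smallest car age is 1 and the interval size is 3, as in the tutorial.
print(bottom_lower_limit(1, 3))
```

If the smallest value is already a multiple of the interval size, the loop never runs and that value is returned unchanged.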
All right.
So back to the grouped
frequency distribution.
My lower apparent limit
for that bottom class interval
is 0.
Right? The youngest car was 1
year old and that wasn't a
multiple of 3.
Zero was, so we started with 0.
And that's important
to remember
because often the easiest
mistake for anyone to make is
to say well 1's my lowest
value so I'll start with 1
but the problem is you end
up creating a table that's not
as easy to read and yes I know
that if you start with 0
and there are no cars
that are 0 years old,
you might say well why am I
doing that?
And the benefit, again,
is that you're going to end
up creating a table that's
much more readable
by doing so.
Okay so bottom class interval,
0 to 2.
Notice it's 0 to 2
because there's 0, there's 1,
and there's 2.
The class interval size is 3,
right?
Zero, 1, 2,
that's 3 values there
and how many cars are
between 0 and 2?
There's 3 of them.
Well at this point I know
that for all of my class
intervals,
the lower apparent limits will
be multiples of 3.
So if for my bottom class
interval the lower apparent
limit is a 0, for the next class
interval up it's going to be 3
and the next one
up is going to be 6.
Next one up will be a 9.
Then a 12 and a 15.
And that's one of the things
that you can just kind
of check after you've created
a grouped
frequency distribution.
Just look at those lower
apparent limits and say,
are they all multiples
of my interval size?
And if they are you're good
to go, keep going on
and if they're not,
stop and figure
out what you might do instead.
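That check is easy to sketch in Python. The lower apparent limits 0 through 15 and the interval size of 3 come from this tutorial; the second list shows what you would get from the common mistake of starting at 1 instead.

```python
interval_size = 3

# Lower apparent limits from the tutorial's table.
lower_limits = [0, 3, 6, 9, 12, 15]
all_multiples = all(limit % interval_size == 0 for limit in lower_limits)

# What the limits would look like if the bottom interval started at 1.
started_at_one = [1, 4, 7, 10, 13, 16]
mistake_passes = all(limit % interval_size == 0 for limit in started_at_one)

print(all_multiples)
print(mistake_passes)
```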
Then you can go ahead
and finish creating your grouped
frequency distribution.
Again, as you look
at this you can see how the
lower apparent limits are all
multiples of 3
and because you know your
multiples of 3,
this table is much easier
to read.
You can see that it starts
with 0 to 2 and goes all the
way up to 15 to 17.
The oldest car was actually
only 16 but again, we're going
to go to 17
because our interval size
is 3.
You list all the class
intervals beginning
with the smallest,
0 to 2 all the way
up to the 15 to 17.
If a class interval in the middle had
a frequency of 0,
you would still put it
in there.
You would just say its
frequency was 0.
Would I ever want a class
interval that was 18 to 20?
No, since that's
above the eldest car
so to speak.
It would be of no benefit
to add a bunch
of class intervals
above that were all zeroes.
So you'd begin
with the class interval
containing your minimum value
and you stop when you get
to the class interval
containing your maximum value
making sure
that your lower apparent
limits are all multiples
of the interval size.
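Putting the whole procedure together, one possible sketch in Python is shown below. It assumes whole-number values, and the list of ages is made up for illustration rather than taken from the video; `grouped_frequency` is a hypothetical helper name.

```python
def grouped_frequency(values, interval_size):
    """Return (lower limit, upper limit, frequency) rows, bottom interval first.
    Assumes whole-number values."""
    # Bottom lower apparent limit: the largest multiple of the interval
    # size that is not above the minimum value.
    lower = (min(values) // interval_size) * interval_size
    rows = []
    # Stop after the class interval that contains the maximum value.
    while lower <= max(values):
        upper = lower + interval_size - 1
        frequency = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, frequency))  # kept even if frequency is 0
        lower += interval_size
    return rows


# Hypothetical car ages in years (not the tutorial's actual data set).
ages = [1, 2, 2, 3, 4, 9, 9, 10, 16]
rows = grouped_frequency(ages, 3)

print("Age of car | Number of cars")
for lower, upper, frequency in reversed(rows):  # list from high to low
    print(f"{lower:>2} to {upper:<2}   | {frequency}")
```

With this made-up data, the 6 to 8 and 12 to 14 intervals have a frequency of 0 but are still listed, and no 18 to 20 interval is added above the oldest car.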