All right, now we're gonna go back to the
problem we did in recommender systems with
k-Nearest Neighbors, looking at the 1000 points,
and we'll just browse through some clusterings of that.
Before I bring up PlotViz, let's just get some
screen dumps. I should say that
we're gonna look at, actually, three things for the 1000 points:
the original labeling of Harrington,
one particular clustering, and that was into 3 clusters,
and finally, just to show what happens,
we'll cluster it into 28 clusters.
And here, on the right, is Harrington's original
assignment of points to clusters.
And on the left is what happens when you
assign the points using K-means,
and as I pointed out, we have here a center,
there's a yellow center here, the spheres,
a brown center for the brown cluster,
and a purple center for the purple cluster.
I did not mark centers for the Harrington original labels.
And you can see they're actually pretty different, and
that's always true of this type of set of points.
These are a set of points where there are
actually no clearly identified clusters.
You can chop them up into three or more regions,
where the points in each region are near each other,
and that, for instance, in a lot of
recommendation systems, is all you need.
Cuz you're trying to find classifications,
and that's all you need for a classification,
especially when you have a lot of clusters, say 28,
so that the regions are relatively small for this problem.
If you have more points, you need more than 28 clusters.
But when you have a lot of clusters, those actually
divide the region up into relatively small sub-regions,
and if something is near a point in the
sub-region, you know it's pretty near.
And it doesn't really matter how you do it.
So that's a pretty important point about clustering.
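To make the point about many clusters giving small sub-regions concrete, here is a minimal sketch. It assumes uniform random points in a unit cube as a stand-in for the 1000 points (not Harrington's actual data), and the `kmeans` helper is a plain Lloyd's iteration written for illustration, not the code used in the lecture.

```python
import numpy as np

def kmeans(points, k, iters=30, seed=0):
    """Plain Lloyd's iteration; illustrative helper, not a library API."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points (fancy indexing returns a copy).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every center, shape (n, k).
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's center in place
                centers[j] = members.mean(axis=0)
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, d.argmin(axis=1)

def mean_distance_to_center(points, centers, labels):
    """Average distance from a point to the center of its own cluster."""
    return float(np.linalg.norm(points - centers[labels], axis=1).mean())

# Assumed stand-in data: 1000 points with no natural cluster structure.
rng = np.random.default_rng(42)
points = rng.uniform(0.0, 1.0, size=(1000, 3))

c3, l3 = kmeans(points, 3)
c28, l28 = kmeans(points, 28)
spread_3 = mean_distance_to_center(points, c3, l3)
spread_28 = mean_distance_to_center(points, c28, l28)
print(f"3 clusters:  mean distance to center = {spread_3:.3f}")
print(f"28 clusters: mean distance to center = {spread_28:.3f}")
```

With 28 clusters each point sits much closer to its center than with 3, which is why being in the same small sub-region is already a useful notion of "near", however the boundaries happen to be drawn.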
There are some cases, as we've seen drawn,
where the clusters are very clear, and some cases
like this where you can divide it into clusters,
and that clustering can actually be done
in several different ways because
there are no clear separations.
So if we go to the next slide,
in this case I've actually got it
the other way around, with the
original labels now on the left and the optimal on the
right, with the optimal having the centers.
And actually, this is just the same data rotated.
And for this rotation we find that
the original labels are actually pretty clear.
This is effectively the view where the z-axis
is out of the board, so you're projecting onto
x and y. For the original labeling,
that's a very clear clustering. For the new so-called
optimal labeling, which technically has
a smaller distortion and is a better clustering, for this
particular orientation that's not true.
So that's no longer so clear.
But that shouldn't be held against it. If you
cluster points in N dimensions, where in this case
N is three, and you project them into two dimensions,
those clusters do not have to be clearly
separated in two dimensions, cuz the separation
that makes the clustering can be in the other dimension.
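A small sketch of that projection effect, under made-up assumptions: three synthetic Gaussian blobs that differ only in z (not the lecture's data), and a hypothetical `separation` score defined here as the smallest gap between cluster centers divided by the average distance of a point to its own center.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three synthetic clusters that are identical in x and y
# and separated only along z (offsets are arbitrary choices).
z_offsets = np.array([0.0, 5.0, 10.0])
labels = np.repeat(np.arange(3), 300)
points = rng.normal(0.0, 1.0, size=(900, 3))
points[:, 2] += z_offsets[labels]

def separation(coords, labels):
    """Smallest center-to-center gap divided by mean within-cluster spread."""
    ids = np.unique(labels)
    centers = np.array([coords[labels == j].mean(axis=0) for j in ids])
    spread = np.linalg.norm(coords - centers[labels], axis=1).mean()
    gaps = [
        np.linalg.norm(centers[a] - centers[b])
        for a in range(len(ids))
        for b in range(a + 1, len(ids))
    ]
    return float(min(gaps) / spread)

sep_xy = separation(points[:, [0, 1]], labels)  # looking down the z-axis
sep_xz = separation(points[:, [0, 2]], labels)  # looking along the y-axis
print(f"x-y projection: separation = {sep_xy:.2f}")
print(f"x-z projection: separation = {sep_xz:.2f}")
```

Looking down the z-axis the three clusters land on top of each other, while a view that keeps z shows them clearly apart. The clustering is the same in both cases; only the viewing direction changed.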
So let's just look at that in the PlotViz, cuz PlotViz,
being three-dimensional, will help us do that.
[pause]
So here we have PlotViz.
[pause]
Actually, I need to tile these windows.
And now I have a nice PlotViz with
two clusterings here. I now have the
reclustering, which is the so-called optimal case,
on the left, and the original on the right.
[pause]
This is the case where the z-axis is...
essentially perpendicular to the board, and so
the original labeling is pretty clearly clustered.
I should point out, if you want to learn
how to use PlotViz, when you
take the files that are given to you,
you need to do the following things.
You go to PlotViz and you load each file,
and you go through and click 'axes'.
You don't really want a picture of some axes
in the middle of the graph, so you turn that off.
You go into the section that says 'glyphs'.
These use glyphs, the little shapes that represent
the points, cuz then the points show up more clearly,
and you want to set auto-orientation to 'true'...
[pause]
and that's what I've done there. It was already
set to true, so it didn't actually do anything
when I turned it to false and back to true again.
If I rotated it, you'd see a difference if I turned it to false.
And the other key thing is we're gonna
try to make these two 3-D plots rotate together,
so for each of the windows we need to set
'Use SyncCamera' to 'True'; that's what I've set there.
All right, now let's go for it. We'll go up to the view,
set the view to full-screen, maybe we'll get rid of
this thing here. And now we have... the two cases,
the two sets of clusterings, and we can try to
examine how they look by just rotating them,
and you can see they're... actually rotating together.
[pause]
And now we've found the case where the optimal clusters,
the one on the right, are actually relatively well-separated,
but the... one on the left, the original labeling,
is not well separated. We can see here the centers,
which are now spheres, and... by the way,
those original slides we looked at were just
screen dumps of this. As we rotate, you can see
they go in and out of... now we got back to a
pretty good separation for the original... labels,
but not so good for the... optimal, and we can...
as we examine it, we can sort of see what's going on.
By looking at this, we can find out the characteristics
of this dataset, and whichever way we do it,
it's relatively clear that this dataset is not
separated naturally. It's just a set of 1000 points
which fill a region, well, not the total region,
but a significant part of it, and if we
cluster it you do see these nice compact regions.
Each cluster's much smaller than the original dataset,
but we can do that in many ways because
there are no clear separations to guide us.
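As a rough illustration of "we can do that in many ways", the sketch below cuts the same structureless points into three slabs along x, then along y, then along z. The data (uniform points in a unit cube) and the slab labelings are assumptions made for the example, not the lecture's dataset or either of its two labelings. Distortion here means the mean squared distance of a point to the mean of its cluster.

```python
import numpy as np

def distortion(points, labels):
    """Mean squared distance from each point to its cluster's mean."""
    total = 0.0
    for j in np.unique(labels):
        members = points[labels == j]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return float(total / len(points))

def slab_labels(points, axis):
    """Cut the unit cube into three equal slabs along one axis."""
    return np.minimum((points[:, axis] * 3).astype(int), 2)

# Assumed stand-in data: 1000 points filling a region with no gaps.
rng = np.random.default_rng(7)
points = rng.uniform(0.0, 1.0, size=(1000, 3))

labelings = [slab_labels(points, axis) for axis in range(3)]
distortions = [distortion(points, lab) for lab in labelings]
unclustered = distortion(points, np.zeros(len(points), dtype=int))

# Fraction of points that get the same label from the x cut and the y cut.
agreement = float((labelings[0] == labelings[1]).mean())

print("distortion, no clustering:", round(unclustered, 3))
print("distortion, slabs along x, y, z:", [round(d, 3) for d in distortions])
print("label agreement, x cut vs y cut:", round(agreement, 3))
```

All three cuts give compact regions with nearly the same distortion, well below the unclustered value, yet they disagree on most points. With no gaps in the data, nothing singles out one of them as the right answer.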
All right, so that's the end of that
particular set of PlotVizes.
[pause]
We go out of full-screen, and we'll get rid of that
particular picture and we can actually go back to the...
[pause]
powerpoint. Let me get myself a... pointer again.
[pause]
Here's my laser pointer, this little red thing.
This is just the third view, which is
what we just did interactively with PlotViz,
showing the optimal
on the left and the original on the right.