ELLEN SPERTUS: One of the main problems in computer science
is that there's too much information out there.
The amount of information grows exponentially, at least,
but our ability to handle information doesn't.
And the specific problem I looked at is how we help
users of a social website find information
that interests them.
Specifically, how can users of Orkut find communities--
that's the term used for discussion groups, like
Usenet groups or Yahoo groups--and how can people find other
users they might want to meet?
So I was able to start working with Orkut shortly after it
launched, and this chart shows that in the first ten
months, there was exponential growth in members,
and a lower, but still exponential, growth in
these communities.
Anybody could create them.
They weren't very organized.
So the question is, how do you help people find communities
of interest to them?
I wanted to have related-community
recommendations, where if you were viewing a certain
community, you would be told what other communities you
might be interested in.
And here's an example of some recommendations.
Any guess of what community this is for?
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: I see some nods.
Anyone want to guess?
AUDIENCE: Unix programmers.
ELLEN SPERTUS: OK, not UNIX programmers.
This is for the geeks community.
And the way these similarities were figured was with implicit
collaborative filtering.
What collaborative filtering means is that we take a collective
approach, where we look at lots of people's behavior.
And it was implicit, because we weren't asking people to
rate anything explicitly.
We were just observing their behavior, specifically which
communities they joined.
And we worked from the premise that two communities would be
similar if they had lots of members in common.
So here's some terminology I'll be using.
We consider each community to be a set of members.
We might talk about a base community B, like Wine, for wine
lovers, and a related community R, like Linux.
You'll see later why I use those examples.
And what we want is a measure of similarity.
How similar?
How good a recommendation is R for B? And one thing you want
to look at is how the membership of the two
groups overlaps.
Are there a lot of people in the wine community who are
also in the Linux community?
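To put that in notation (my notation, not from the slides), every measure below is some function of the overlap and the two community sizes:

$$\mathrm{sim}(B, R) = f\big(|B \cap R|,\ |B|,\ |R|\big)$$

where $|B \cap R|$ counts the users who belong to both communities.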
So, here's an example. We have the pizza community,
and two other communities.
I'm not showing the names, just
these different overlaps.
Which one's a better recommendation?
AUDIENCE: I would recommend pizza because it's [INAUDIBLE]
ELLEN SPERTUS: Ah, OK, so someone in the audience said
she'd recommend pizza to the people in the upper left.
AUDIENCE: Upper Right.
ELLEN SPERTUS: Oh, upper right.
Excuse me.
I was actually asking the reverse question.
So if the base community is pizza, should we recommend the
one on the right or the one on the left?
Yeah?
AUDIENCE: I'd say the one on the right, because even though
it's a smaller group, you have a larger portion of the
population.
ELLEN SPERTUS: OK, so--
AUDIENCE: With me, it's a circular group.
ELLEN SPERTUS: Yes.
Someone very clever in the audience said he'd recommend
the group on the right, because even though it's a
smaller overlap, it's a bigger percentage.
So the obvious answer, which this audience didn't give, would
be the one with the bigger overlap, which would have been Linux.
At the time I did this research, Linux was the most
popular community, so it had the biggest overlaps with just
about everything.
I told this to a friend of mine who worked at Amazon, and
he said, oh, yes, the Harry Potter problem.
So whatever book someone looks at, the book most people
purchase along with it is Harry Potter, but it's not very
useful to always be recommending that.
So group size is also a factor.
I've shown that here as the base and related community.
Also, these relations are asymmetric.
There are some similarity measures in computer science
that work both ways--
A is as similar to B as B is to A--but that's
not the case here.
So, if you look at the Stanford community, the
Stanford class of 2006 community isn't a very good
recommendation.
Let me show you my terminology.
The numbers in parentheses are the size of the groups.
And what you see here is that there's an overlap of 47
members who belong to both of the communities.
So for the Stanford class of 2006, if someone belongs to
that group, but not the Stanford one, that might be a
good recommendation.
Or you could argue, that you could say no,
that's just too obvious.
So that starts the question of how do we tell which
recommendations are best?
So the relationship is possibly asymmetric.
I didn't have a very theoretical computer science
education, so I walked over to Google Labs and Search and
asked--actually, I asked UW grads--given the overlap
and the sizes of two groups, what formula do I use to
figure out the similarity?
And everybody gave me a different function.
We got into some arguments about which ones would be best,
but then decided, let's implement them all and see.
So the simplest one is L1 normalization.
You can think of this in terms of vectors, where each member
is a dimension, and there's a 1 in a user's dimension if
that user belongs to that community, or, I find it
easier to think of it in set notation.
Just take the overlap between the two communities, and
divide it by the product of their sizes.
And something to notice is that this heavily penalizes
large communities.
If the community is twice as big, you'd better have twice as
big an overlap.
Another measure is L2 normalization.
In the vector space, this would be the same as the
cosine distance.
And this is the same as L1 except for this square root
sign in the denominator.
So you take the overlap and you divide it by the square
root of the product.
So this penalizes large communities less heavily.
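Here's a minimal sketch of those two measures in Python. Representing each community as a set of user IDs is my simplification, and the example numbers echo the I Love Wine and Ice Wine figures that come up later in the talk.

```python
import math

def l1_similarity(base: set, related: set) -> float:
    """Overlap divided by the product of the sizes: heavily
    penalizes large communities."""
    return len(base & related) / (len(base) * len(related))

def l2_similarity(base: set, related: set) -> float:
    """Overlap divided by the square root of the product (the cosine
    measure, in vector terms): penalizes size less heavily."""
    return len(base & related) / math.sqrt(len(base) * len(related))

# Hypothetical data: a 2,400-member wine community overlapping
# 33 members with a 51-member ice wine community.
wine = set(range(2400))
ice_wine = set(range(33)) | {9000 + i for i in range(18)}
print(l1_similarity(wine, ice_wine))  # ~0.00027
print(l2_similarity(wine, ice_wine))  # ~0.094
```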
Another measure that is used is mutual information.
Specifically, I'm looking here at positive correlation, which
is the upper left corner of this matrix.
What this formula shows is how well membership in the base
community predicts membership in the related community.
And then there's another version, where we have
negative correlation, in the blue, which is how well
nonmembership in the base community predicts
nonmembership in the related community.
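The slide's formulas aren't reproduced in the transcript; a standard way to write the positive-correlation term of pointwise mutual information, with probabilities estimated from membership counts over $N$ total users, would be

$$\mathrm{MI}^{+}(B, R) = P(B, R)\,\log\frac{P(B, R)}{P(B)\,P(R)}, \qquad P(B, R) = \frac{|B \cap R|}{N},\quad P(B) = \frac{|B|}{N},\quad P(R) = \frac{|R|}{N},$$

and the negative-correlation version replaces each set with its complement.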
Some of you will be familiar with Salton's term frequency-
inverse document frequency measure, and that's used to
say that two documents are similar if they have the same
words in common.
And we can use that but instead say two communities
are similar if they have the same users in common.
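The talk doesn't spell out the adaptation, but one plausible reading (an assumption on my part, not the paper's exact formula) treats communities as documents and users as terms, down-weighting users who join many communities:

```python
import math

def idf_similarity(base: set, related: set,
                   num_communities_of: dict, total_communities: int) -> float:
    # Each shared user contributes log(total / #communities they belong to),
    # so a user who joins everything adds little evidence of similarity.
    # Hypothetical Salton-style weighting; the names are illustrative.
    return sum(math.log(total_communities / num_communities_of[user])
               for user in base & related)
```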
Another one--this one made the most sense to me
intuitively--was LogOdds, which looks at how much likelier a
member of the base community is to belong to R than a
nonmember of the base community.
So, switching back to Harry Potter: how much likelier is a
purchaser of The Lion, the Witch and the Wardrobe to buy
Harry Potter than people who haven't bought The Lion, the
Witch and the Wardrobe?
That tells you whether to recommend it.
This actually yielded the same rankings as L1.
So, just for fun, we decided to invert it
and see what we got.
So we used this version.
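As a sketch of the LogOdds idea in the same Python setting (the exact formulation, any smoothing, and the inverted variant she mentions are in the paper; the epsilon here is my assumption to keep the log defined):

```python
import math

def log_odds_similarity(base: set, related: set, all_users: set) -> float:
    # How much likelier is a member of the base community to belong
    # to R than a nonmember of the base community?
    eps = 1e-9
    p_given_member = len(base & related) / len(base)          # P(R | B)
    nonmembers = all_users - base
    p_given_nonmember = len(related & nonmembers) / len(nonmembers)  # P(R | not B)
    return math.log((p_given_member + eps) / (p_given_nonmember + eps))
```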
So going into this, we didn't know would there be
significant differences among the measures, or would they
all give similar rankings?
Which one would users prefer?
Which measures would be best?
And would there be a partial or total
ordering of the measures.
So here's some recommendations for the I Love Wine community.
You can see the I Love Wine community has 2,400 members.
And in the upper left corner, you can see, it overlaps 33
members with Ice Wine, out of 51 members of
the Ice Wine Community.
And something to notice.
Remember, I said that L1 heavily penalized large
communities?
And you can see that here, because L1 is recommending
small communities, and L2 is recommending larger ones.
That's the one with the square root in the denominator.
And then the other algorithms recommend bigger communities still.
So what's the best recommendation?
I saw some people taking wine before, so I know some of you
are wine lovers.
AUDIENCE: [INAUDIBLE]
It looks like Japanese food.
ELLEN SPERTUS: OK, so someone said Japanese food.
AUDIENCE: Red wine, if they don't already know about it.
ELLEN SPERTUS: OK someone said red wine if they don't already
know about it.
AUDIENCE: Must be Linux, or you wouldn't mention it.
ELLEN SPERTUS: Professor Lazowska thinks it's Linux.
OK, and there's a lot of different things you can say.
You could say the small groups, certainly, because you might
not have heard of them.
And, of course, the only way to find out is to do an
empirical test, to see which users prefer.
So for an experiment, we precomputed the top twelve
recommendations for each of the six similarity measures.
And then we set up an experiment--
the exact details are in the paper--
so that when a user views a community page, we do a hash on
the community ID and user ID, and select a pair of measures,
maybe L1 and LogOdds, to compare, and we interleave
those, and then we track what the person clicks on.
And we only look at new users, because we had other
recommendations in the past, and we didn't want those to
influence it.
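A sketch of how that bucketing and interleaving might look; the hash scheme, pair selection, and deduplication details here are my assumptions, and the exact protocol is in the paper.

```python
import hashlib
import itertools

MEASURES = ["L1", "L2", "MI+", "MI+-", "IDF", "LogOdds"]
PAIRS = list(itertools.combinations(MEASURES, 2))  # 15 possible pairs

def pick_pair(community_id: int, user_id: int) -> tuple:
    """Deterministically assign each (community, user) view to one
    pair of measures via a hash, so a user always sees the same pair
    on the same page."""
    h = int(hashlib.md5(f"{community_id}:{user_id}".encode()).hexdigest(), 16)
    return PAIRS[h % len(PAIRS)]

def interleave(ranked_a: list, ranked_b: list, k: int = 12) -> list:
    """Alternate between the two measures' precomputed top lists,
    skipping duplicates, to build the displayed recommendations."""
    out, seen = [], set()
    for a, b in zip(ranked_a, ranked_b):
        for c in (a, b):
            if c not in seen:
                seen.add(c)
                out.append(c)
    return out[:k]
```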
OK, so the next question is how do you
interpret the results?
There were six possible cases.
On the left I have somebody's status in the base community.
You can be viewing the Geeks community even if you're not a
member of it.
So I divided the member and nonmember cases.
That's the left column.
Across the top, I have their relation to the
community they click on.
So the M is for if they were already a
member of that community.
N is for if they didn't belong to that community and they
don't join it after clicking on it.
J is if they didn't belong to it but they click and join it.
So there are six possible cases.
I have the big N's and little n's to make them easier to
tell apart.
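Laid out as the grid she's describing, with rows for the viewer's status in the base community and columns for their relation to the clicked community:

```
                    M: already member   N: clicks, doesn't join   J: clicks and joins
Member of base      member-M            member-N                  member-J
Nonmember of base   nonmember-M         nonmember-N               nonmember-J
```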
So, which of these measures, which of these squares do we
care most about?
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: OK, I'm hearing J.
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: Member to J?
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: That's what we thought, too.
That if somebody is a member of the base community, and they
click on a community they didn't belong to and join it,
that's the sign of a good recommendation.
And it's really unclear how to interpret the other ones.
If somebody is a member of the community and they clicked on
something and don't join it, does it mean it was a useful
link because they clicked on it?
Or was it just a distraction--they clicked on it, and
it wasn't what they hoped it would be?
And you can ask similar questions about
each of these squares.
So we measured all of those, but we decided the primary
metric would be member-to-join conversion.
So we ran this experiment and we generated 4 million of
these recommendation pages and we got 900,000
clicks on those links.
And here I'm featuring the conversion rate.
You see if somebody belongs to a community, and clicks on a
recommendation, there's a 54% chance that they join the
community that they click on.
If they are viewing a community they don't belong to when they
click on a link, there's only a 17% chance
that they join it.
Overall it's a 34% chance.
So for the analysis, we want to take each click, and
remember, each click was choosing between two
different algorithms, and if L1 rated the item that
the user clicked on more highly, we give a point to L1.
If LogOdds rated it more highly, we
give a point to LogOdds.
And doing that, we actually got a total
order on our results.
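In the same sketch style, the per-click scoring might look like this; how ties were handled isn't stated in the talk, so crediting neither measure is my assumption.

```python
from collections import Counter

def score_click(clicked, ranking_a: list, ranking_b: list,
                name_a: str, name_b: str, points: Counter) -> None:
    """Credit whichever measure ranked the clicked community higher."""
    rank_a = ranking_a.index(clicked) if clicked in ranking_a else float("inf")
    rank_b = ranking_b.index(clicked) if clicked in ranking_b else float("inf")
    if rank_a < rank_b:
        points[name_a] += 1
    elif rank_b < rank_a:
        points[name_b] += 1  # ties credit neither measure

points = Counter()
score_click("IceWine", ["IceWine", "PinotNoir"], ["RedWine", "IceWine"],
            "L1", "L2", points)
print(points)  # Counter({'L1': 1})
```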
So for clicks leading to joins, L2 gave the best results.
L1 was next to worst. And all of these differences were
statistically significant, except for the one shown with a
single arrow, as opposed to a double arrow.
AUDIENCE: So these are member or overall?
ELLEN SPERTUS: These are just looking at a member of the base
community joining a community--the top row.
Excuse me, clicks leading to joins.
That's ambiguous.
I'll have to check my numbers. If you look at all
clicks, L2 wins.
The order's the same, except that L1
jumps forward.
So, after this we were wondering
about positional effects.
When we show things in a grid, are there positions that
people are more likely to click on?
Our first experiment couldn't tell us anything about that
because we were putting the best recommendations first.
So what we did is we generated new recommendations, just
using L1, and we showed them to different users in
different orders.
And then we tracked a million and a third clicks, to see
where people were more likely to click.
So we show one, two or three rows of recommendations.
When we showed a single row, anyone want to guess where
people were most likely to click?
Don't pay too much attention to these pictures because
someone else would see a different order.
So any guesses?
AUDIENCE: Center.
ELLEN SPERTUS: The center?
AUDIENCE: First.
ELLEN SPERTUS: First. As for left to right--we have a lot
of Iranian users, who read right to left.
Basically--
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: Let's see.
OK.
Yeah, the center got the most. And the leftmost got next,
although this was not statistically significant.
AUDIENCE: Did you ever test for a size of image?
ELLEN SPERTUS: I did not test for size of image.
Nor did I separate out the different language speakers.
There's all sorts of interesting
things that can be done.
But since we were doing random orders, the images would
cancel out.
OK, what about for two rows, and here's another question.
What are these the recommendations for?
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: Sorry.
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: Of UW students, no?
AUDIENCE: [INAUDIBLE]
You took computer science.
ELLEN SPERTUS: Actually Washington State.
These are the Washington State recommendations.
And what position do you think people are most
likely to click on?
Again don't pay too much attention to these pictures.
AUDIENCE: Bottom
ELLEN SPERTUS: I guess the bottom center.
AUDIENCE: Upper right.
ELLEN SPERTUS: Upper right.
Upper left.
Top center.
OK, the top row got more.
The top right actually got more than the top center.
And this was highly significant statistically.
Going to three rows.
This is for a fantasy and science fiction book club.
Any guesses what people did here, where they clicked the
most? OK, someone said middle.
AUDIENCE: Upper.
ELLEN SPERTUS: Upper.
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: Middle right.
OK, here's what we found, that upper left actually got the
most.
AUDIENCE: [INAUDIBLE]
ELLEN SPERTUS: Now again, it doesn't matter what those
pictures are.
Yeah, so someone theorized that when there's so much
information, people just stop at the first one.
Although I can't tell you why,
I think that's probably what happened.
We got a lot of reactions from users about these related
community recommendations.
We got hundreds of emails a day.
We had something that they could click on requesting that
we add certain recommendations.
And we got a lot of angry emails from people who created
communities.
AUDIENCE: [LAUGHTER].
ELLEN SPERTUS: So some were angry because here we were
adding something to their community page that they
didn't have any control over.
And then people were angry about some specific
recommendations--
recommendations for communities that they did not
think were similar or tasteful or appropriate.
And I sympathized with them.
We heard from them about amusing recommendations.
So for the C++ community, we recommend a community for
people who don't understand women.
AUDIENCE: [LAUGHTER]
ELLEN SPERTUS: And for chocolate, we recommended PMS.
AUDIENCE: [LAUGHTER]
ELLEN SPERTUS: Not all the recommendations were amusing.
I created one of these groups--see if you can guess which
link I didn't like.
AUDIENCE: LINUX chicks.
ELLEN SPERTUS: So, it was this one.
So I actually implemented the feature to allow community
owners to delete recommendations.
At Google, we do not do things by hand.
If a community owner's unhappy, she shouldn't need us to fix
it for everybody.
Of course, she can fix it for her own community.
So we had a feature where a community
owner could remove recommendations on their own.
And evidently, I wasn't the only person to want this.
This was the first one that was removed.
In just over a week after it was released, 60,000
recommendations were deleted and 260,000 were added.
So this was a very popular feature.
And an open question is how do these compare with automatic
recommendations?
Is it more useful to the users?
It makes the community owners much happier, but is this
benefiting the users?
In some cases, people deleted recommendations for communities
that they saw as competing with them, or that were very
similar to them.
So that would be interesting to find out.
There are many possible future research areas.
One idea would be just flipping things around, and
instead of determining similar communities based on common
users, figuring out similar users based on common
communities.
And one question would be, is it useful?
So, for example, there were a total of nine users who
belonged to these three communities that I belong to.
So how could that be used?
It can tell me, you might want to meet these people.
They share some of your interests.
Or for someone looking at my page--
Orkut is mostly used for dating--
interested in Ellen?
Sorry, she's married.
Try Jessica.
And even though it was based on a social network, I didn't
really make use of that information.
You can imagine, taking into account distances in the
social network.
We have demographic information.
Should we count people from the same country? Should we
use that more in recommendations?
Should we use it less?
What about different ages?
Brazil was, and continues to be, the most popular country.
So you'd often get recommendations for
communities in languages you might not speak.
So I'd like to acknowledge my co-authors, Mehran Sahami and
Orkut Buyukkokten, and the Orkut team, pictured there.
It's grown a little bit larger now, but they kept the site
running while I got to do this fun experiment.
And for more information, I have the paper up here and
I'll be making it available.
You can also get it and other papers from
labs.google.com/papers.
I wrote a kind of silly article about it for Orkut Media.
It's referenced in the paper.
The URL is there.
I wrote a number of silly columns on Orkut data mining,
because we had the data I wanted to test.
Are blondes really--
on Orkut, you can rate how sexy, trusty, and cool people are.
So we had the data.
I wanted to find out
scientifically, are blondes sexier?
Are gay people more cool?
So you can find out my results about that.
And, also Google is hiring, including at
this Kirkland office.
They wanted [INAUDIBLE] a whole lot.
So you can find out more about that by talking to any Googler
here, or at google.com/jobs.
And I just want to let you know about next week's talk,
which Alma Whitten will be giving, and
that's a week from today.
And now I'd be happy to answer any questions.
Yes.
AUDIENCE: So yes, when you were considering the problem
of common membership in two different sets or two
different groups, rather, basically, two users who had
common membership in two groups would contribute the
same no matter what.
Did you consider, by any chance, giving more
points to one who was more active in
both groups, for example?
ELLEN SPERTUS: That's a very good question.
The question was you treated two users belonging to the
same two groups the same.
Did you think about weighting it by which
users were more active?
That would count as a stronger link and a
stronger recommendation.
Something else you could consider is how many communities
someone belongs to. If there's someone who belongs
to a thousand communities, maybe you should weight each of
those connections less than for someone who just belongs to
four communities and participates in all of them.
We did not do that.
Our hypothesis was we'd be able to get good enough
recommendations doing things this way.
And I think that was the case.
But, I think that would be valuable.
Yes.
AUDIENCE: How about, going back a little bit.
As a user, the user experience-- one of the
difficulties of getting a recommendations is it takes up
a fair amount of real estate and time to review them, and
they're potentially useful, but perhaps if there was a
prefiltering ahead of time, so that the user could say, these
are areas I'd like to see recommendations in, instead of
arbitrarily selecting 2 or 9 out of 10,000. Have you ever
thought about doing it that way, where the user has a little
bit more [INAUDIBLE]?
ELLEN SPERTUS: OK, so the question was about giving the
user more ability to specify what they'd like
recommendations about.
And that would certainly be a good thing to do. Since this
was expensive to compute, we needed to precompute
everything.
And the total number of communities times the total
number of communities is already a pretty big number,
so we left it at that.
Something that would be very interesting to do would be per
user recommendations.
In fact, that's something that Amazon has.
They have these per item recommendations.
When you're looking at one book page, you might like this
other book.
But they also have recommendations
personalized for users.
And if one of my counterparts at Amazon wanted to give a
talk about that--
how to compute it efficiently--
I'd find that very interesting.
I'm not sure I've fully addressed your question.
AUDIENCE: No, I was thinking, instead of having
10,000, you divide those 10,000 user groups into maybe
30 or 40 general areas, and then you match the one user group
against 20 out of your total population.
ELLEN SPERTUS: OK so the question was about--
AUDIENCE: You could have technical.
You could have consumer.
You could have like popular media.
You can have like political views and discussion.
And then the user can say, well, I really don't care
about popular media or music but I would like to see what
other technical recommendations you have.
ELLEN SPERTUS: OK, so the question was to be able to
group the recommendations by category.
AUDIENCE: So the user can select which category.
ELLEN SPERTUS: So the user can select what categories
interest them, so they might not be interested in politics,
but maybe they're interested in food.
As you can see, some of those recommendations crossed
categories. You know, someone likes pizza and you recommend
Linux. That could be a bad recommendation.
One thing we considered was clustering, where we divide
things up into groups.
And that would be a perfectly good thing to do, valuable.
We just didn't do it.
AUDIENCE: I worry a little bit about when you recommend here,
you're going to be merging the membership of the groups.
And so you won't have the distinct difference of why you
want two different groups, when they then have all the
same set of members and their conversations are going to
tend to be the same, you've lost the advantage.
Did you think about a possible disadvantage to the community
from these recommendations?
ELLEN SPERTUS: OK, the question was, with these
recommendations, this will make communities have more
similar memberships with each other, so it
might reduce diversity.
That different communities would become more similar.
Well, there will also be a feedback effect with
the recommendations.
Because if you start by saying these two communities are kind
of similar because they have some members in common, then
they're both featured on each other's pages.
They're going to grow more and more alike.
So that is something that could happen.
And it's not something that we've measured.
In the back.
AUDIENCE: So if someone is in one of these groups, would
they find a very similar group on [INAUDIBLE].
Why bother generating it, [INAUDIBLE]?
ELLEN SPERTUS: OK, so the comment was maybe you don't
want to recommend something too similar, because people
should be recommended something different from what
they already have access to.
And I'd say--
and we were measuring empirically, so it could be
that similar is the wrong word.
What we measured was that we made a recommendation and
somebody clicked through and joined.
Maybe it's not because it's similar; maybe it's because it's
something different.
One example would be people who go to the Wine community
thinking it's about wines, not an
emulator, the Linux program.
And if they go there for the wrong reason, then they might
be happy to see a link to Linux or to Linux Wine, which
is a subject that isn't similar, but
it's what they wanted.
So that's hard to measure.
That's an interesting question.
Other question.
AUDIENCE: Like the Harry Potter problem.
It's like you might sell a lot of Harry Potter books, but
everybody already knows about Harry Potter.
There's no point in telling them.
So these smaller groups would be just more interesting to
receive information.
These other groups, you don't even know who's online so you
know [INAUDIBLE].
ELLEN SPERTUS: OK so the comment is that even though
you might sell books by recommending Harry Potter for
everything, it's something people know about.
It might be more valuable to recommend things they didn't
know about, so I'm going to go back to the recommendations.
So, with this I Love Wine community, where we have the
small, quirky communities that you might not know exist--Ice
Wine, Pinot Noir--those are the smaller communities, and
what we found empirically was that L1 produced some of the
least popular results.
But you could imagine a knob you'd be able to crank
to ask for small communities or big
communities.
There's a sort of trade-off between the likelihood that
somebody will like something, and novelty.
Are you going to present something that'll really
excite them and is different from things
that they knew about?
AUDIENCE: All this, also is--
so this is somewhat of a tangent.
But different communities behave differently at
different sizes, so I was just wondering, when you herd
people to these smaller communities, it's actually
going to change them.
And I was wondering if you saw any of that or had any way of
measuring that at all?
ELLEN SPERTUS: OK.
So the point was that a community's behavior
depends on its size.
If you start with a small community and you have a lot
of people joining it, then that
might change the community.
And we did have problems with that.
We had a user who was very unhappy because her feminine
sexuality group, Cliturgy, was listed as a recommendation for
a bunch of groups that she didn't think were appropriate,
that she thought were merely obscene.
So I don't know if Google will put this online.
So instead of having her feminist membership, she was
having these people from a bigger group, with different
values from hers, join the community, and she shut down
the community because of that.
And when I found out about that, it may
have been too late.
She may have deleted it.
But I talked with her, and that's when I had it manually
removed until we could fix it, because it was changing
the community--it was ruining the community,
from her point of view.
Of course, should we count the creator of the community as
being more important than the group's users,
voting with their clicks?
Yes.
AUDIENCE: So you said you added a control for community
owners to control
what was recommended to the people viewing their page.
Would there be an example of the opposite permission,
basically, to control which pages your group
is recommended on?
ELLEN SPERTUS: OK, so the comment was, we allowed
community owners to control the recommendations appearing
on their page; why not let them control which pages their
community was recommended on?
And we did not do that, although it's something that
the owner of Cliturgy would have liked.
But for that, you'd just have to rely on people emailing
community owners, and there would be link swaps.
So, my guess would be that if someone doesn't want to be
listed, there are plenty of people who want to be in that
position who'll be asking for it.
Something else we could have done would be let community
members make suggestions about what should be related, but we
decided to centralize it with the community owner, and users
could email them.
Yes?
AUDIENCE: So the fundamental motivator to include this
feature in the site, was that to--
what was the goal behind that?
Why include this feature at all?
ELLEN SPERTUS: OK, the question was why include this
feature at all?
What was our goal?
And I'd say the goal was to help the users find the
information of interest to them.
AUDIENCE: Why?
ELLEN SPERTUS: Why?
Because we want them to use the site and
to enjoy using it.
So right now, people might create a community identical
to one that already exists.
At first, we made it so you couldn't create communities with
the same name.
That was done to prevent a land grab.
But I think we decided it wasn't a good decision.
But we thought our users would be happier, get more benefit
out of the site, if they were able to find communities of
interest to them.
And, at that time, there wasn't
a good search mechanism.
When someone created a community, they put it into
one of 29 different categories.
But when you have hundreds of thousands, and then millions
of communities, that's not a good enough hierarchy.
AUDIENCE: I want to know, after you got back your multiple
answers regarding the algorithm to use, and once you made a
decision to cut and go,
how long did this whole process take?
ELLEN SPERTUS: OK, the question was how long did this
process take?
It was pretty fast. It was over a period of months.
The dates are in the paper.
This was done during Orkut's first year, 2004.
And I was working on other projects, and we had to wait
for people to click, so I was swapping
between different projects.
But Orkut, the team, was kind of in a start-up mode, so we
were able to push things quite quickly.
AUDIENCE: So on that chart, where would you see the effect
of this project?
Would it show up, at which point in that chart?
Would this have an effect--
this went out to everybody, right?
ELLEN SPERTUS: Right.
The recommendations went out to everyone.
The question is where does it show up in the chart.
We did the first experiment in July.
It's not clear to me that this had a significant effect. I
guess I mentioned the possibility that if people
could find related communities, they might not
create new ones.
And the data does not bear that out.
AUDIENCE: Size of community, though, might be affected.
ELLEN SPERTUS: OK.
The comment was that the size of
community might be affected.
Yes.
AUDIENCE: You mentioned you tweak the recommendations.
How often do you do that?
ELLEN SPERTUS: Well, for the experiment, we did it once.
It would have been too difficult to measure if things
were changing.
And actually I'm not responsible for that part of
the system anymore, so I can't tell you how
often it's done now.
But since community owners have been able to edit the
recommendations, that sort of fulfills that niche.
What we implemented was something where we actually
generated more recommendations than could fit, and we showed
them to the community owner, and they could get rid of
some, and get more suggestions.
In back.
AUDIENCE: You were talking about [INAUDIBLE]
making the changes, getting [INAUDIBLE]
how it acts now [INAUDIBLE] very different than when you
first started [INAUDIBLE].
ELLEN SPERTUS: OK, the point was that even if it didn't
affect community creation then, the site was very different
than it is years later, and it could be that there'd be a
different effect now.
That's true.
And it's just wonderful, the different
things that can be measured.
It was great for me to be able to--
all these different researchers were claiming that
this is the similarity measure that you should
use, or that one is.
So there's a whole bunch of questions I wasn't able to
answer, but I was able to do some comparisons.
But you're right.
There's all sorts more that can be done here.
And I'm leaving you with more questions than I'm answering.
But as I said, Google's hiring.
AUDIENCE: [LAUGHTER].
ELLEN SPERTUS: Yes.
AUDIENCE: So to test these similarity measures, you
exposed your customers to an early set of
the different results.
Did you give any thought as to whether or not there can be a
way to evaluate these offline, without actually doing a
lengthy user-facing experiment?
ELLEN SPERTUS: OK, the question was we used our users
to do this evaluation.
Is there a way we could have done it offline?
Do you mean like with hired evaluators?
AUDIENCE: Right, is there a way to evaluate the goodness
of a recommendation without [INAUDIBLE].
ELLEN SPERTUS: OK, is there a way to evaluate the goodness
of a recommendation without subjecting users to this.
I think no.
I'm not an expert in evaluation.
But I think the best measure of what's good for users is
seeing what users do, what they click on.
And maybe we could have done it for a
shorter period of time.
I don't know if we needed to run it as long, in order to get
statistically significant results.
But I think doing it as an actual user test with many
clicks was valuable here.
I think there was enough benefit to our users to
justify it.
All of them were getting recommendations.
And we worked at finding out which measure we think gives
the best recommendations for our users, and then implemented
that and published it.
AUDIENCE: I find it interesting, because there's a
lot of research on [INAUDIBLE] systems that have
various metrics of how good their results [INAUDIBLE]
popular with actual users.
So, I was curious, what's your opinion as to the actual
value of the end-user interaction versus more
academic approaches to evaluating recommender
systems?
ELLEN SPERTUS: The question was, for some evaluations of
recommender systems, they don't look at what the
actual users do.
They have some independent evaluation.
And I just reviewed a paper, and criticized it for having
separate evaluators.
The evaluators can't know the intent of the person
initiating the query.
So I think it skews the results to hire people to do
the evaluation.
And I've done that, and I had someone giving evaluations for
pages in a language they didn't speak.
But even when the people are competent and trying their
hardest, they just can't know what's most relevant.
You know, we looked at those, which was better,
ice wine or red wine.
The only person who can decide that is the actual user.
And that's my opinion.
AUDIENCE: What about an approach like sampling some of
the people that have actually joined the communities based
on the recommendations and asking them more sort of
qualitative questions about is this a community that you
thought was good?
Or are you actually participating in this
community that you joined?
Just to kind of get more of that.
In privileging just joining as the measure that says
the recommendation is successful--
and that's true in a certain sense, but in the long term, is
that really useful?
Is that person really just joining
lots and lots of groups?
ELLEN SPERTUS: OK, that was a very good question.
It was should we just be looking at
communities people join?
Wouldn't it be better if we could get some qualitative
information from them about why they joined it, or see
whether they keep using it.
And I agree, those would be valuable.
We didn't want to impose on the users, requiring
more work from them.
And while it would have been interesting to see whether those
people kept using the communities, the team was so busy
keeping up with the exponential growth, or trying
to, that we didn't measure it.
That would actually be interesting.
And when we look closer at the data, we do see some
interesting effects, mentioned in the paper.
There were some types of communities that people
clicked through to a lot, but didn't join.
And I'll give a hint.
If you join a community, its picture gets
listed on your page.
OK, so all of the communities with the lowest conversion
rates were what's called adult.
And none of the communities with the highest
conversion rates were.
So this is a case where people were clicking through.
They may have been clicking through to the same
community every day.
They weren't joining it.
So it was a good recommendation, but by just
measuring joins, we weren't capturing that.
Let's see, how're we doing on time?
MALE VOICE: Do you want to take a couple more questions?
ELLEN SPERTUS: Yes.
AUDIENCE: As she was asking about the motivation behind
this, and you stated it was to get people to use the site
more, did you find that this was accomplished?
In terms of usage per user,
did that increase after this?
ELLEN SPERTUS: OK.
The question was, our goal was to help users get more benefit
from the site, or use the site more, and did they, in fact,
use it more.
And we couldn't really measure it, because--well, I taught
about exponential growth, but I didn't understand it until I
worked at Orkut, and it really grew exponentially.
The site would work great with 1,000 users, and
then we'd get 10,000.
And we'd have to fix some bottleneck, and then it would
work great, and then we'd have 50,000 users. So there were
enough problems accessing the site that how many page views
users had wasn't under our control.
OK.
Yes.
AUDIENCE: Have you considered any factor other than position
on the page that affects the click-through rate, other than
the overlap of the two memberships?
What affects behavior, as far as the size of the icon or
the colors used or anything like that? Did you test
anything other than position?
ELLEN SPERTUS: The question was, did we look at any
factors other than position that affect click-through, like
the size of the icon or the colors that were used.
We did not do that.
We could have done that.
Some communities have no picture, and it
just says no picture.
And I didn't include any on the slides, because they're not
very interesting to look at.
We didn't break those out separately
and see how much likelier someone is to click if there's
a picture, if there's a good picture, if there's a big
picture, if it's nicely colored--
that's a good question.
Not only are there openings at Google, I think there's
actually openings on the Orkut team.
AUDIENCE: [LAUGHTER].
MALE VOICE: All right, well, let's thank Ellen.