Hello, we now get to the second lecture...
on recommender systems. This covers...
some more examples, mainly from Yahoo!...
and then goes into detail on collaborative filtering
as the best-known algorithm, and also covers clustering.
[pause]
Here we have a... multitude of Informatics fields.
Remember we're doing Commerce and Lifestyle Informatics.
This is a repeat of a slide from last lecture.
Just telling you the types of things we're matching;
people to products... people to people,
people to jobs or employers, and people to queries.
Or more precisely to... the results of queries on the Web.
[pause]
Here's a discussion, remember we go to Wikipedia
quite often... cuz it often has good discussions of things.
And it gives you four examples, three of which we've actually
mentioned already. Amazon,
of course, is incredibly well-known as a shopping site.
Pandora is notable cuz it uses the properties
of its items, its songs or artists, the so-called
Music Genome Project, which classifies...
[pause]
different pieces of music so you can relate the music
from the content, not just from the ratings.
Last.fm has a similar goal, but it uses
a technique which is nearer the collaborative filtering
idea of looking at the rankings of other users.
And we already discussed last lecture what Netflix does,
which also uses... the rankings of other users.
Although it uses quite a bit of the content to actually
present you multiple different... types of recommendations.
[pause]
One example, which we actually mentioned and which we can
go into in more detail, based on the slides
from this online site, is Google News...
which is a personalized system which gives you...
suggested things to read about... customized to your interest.
Now some of that interest you can go through
and set yourself. You can select, if you want,
different sites. I select a university I used to be at,
California Institute of Technology,
I select the field of Physics and so on.
There are many different... possible selections.
So that's user-selected personalization.
The other personalization comes... from...
intrinsically looking at the properties
of the different news items and relating them.
This is why things like Latent Dirichlet Allocation
and other types of technologies are used.
So Google News is not the only way;
it's just one way of presenting news.
Other news sites such as, obviously,
CNN and New York Times have a more
traditional view of news where the news is
lovingly selected, at least a lot of it, by editors.
And that gives you also important value,
but just different value.
[pause]
So... the recommendations in Google News
come from both the personalization of the current user,
what they clicked on and what they like,
and also the history of larger communities.
This is the basic collaborative filtering, the community side.
And remember, as we discussed, this is an example
where the actual generation of a news site must be very fast,
so probably a fraction of a second, and it has to
react immediately to the user's request to bring up a site.
And it has to cope with a constant stream of new items.
Initially Google actually tended to be a little behind,
but now, when I go to that website,
it appears to know the latest things that have happened,
and is not obviously behind the times.
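
The community side mentioned above can be illustrated with a minimal sketch of user-based collaborative filtering. This is not Google News's actual method; the click data, the user names, and the helper names `cosine` and `recommend` are all made up for illustration. The idea shown is only that stories clicked by users similar to you get suggested to you.

```python
from math import sqrt

# Hypothetical click data: 1 means the user clicked a story on that topic.
clicks = {
    "ann":  {"physics": 1, "caltech": 1, "election": 1},
    "bob":  {"physics": 1, "caltech": 1, "football": 1},
    "cara": {"football": 1, "election": 1},
}

def cosine(a, b):
    """Cosine similarity between two sparse click vectors (dicts)."""
    shared = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(user, data):
    """Rank items the user has not clicked, weighted by similarity to other users."""
    scores = {}
    for other, items in data.items():
        if other == user:
            continue
        sim = cosine(data[user], items)
        for item, value in items.items():
            if item not in data[user]:
                scores[item] = scores.get(item, 0.0) + sim * value
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("ann", clicks))
```

A real news system would also have to handle the constant stream of new items, which this toy version ignores.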
[pause]
And... we will come to this concept of
model-based and memory-based later on.
They classify different algorithms, and
the method I'm personally familiar with
comes from a fellow called Hofmann, who has done a lot of
very interesting work. When he was at universities,
first in Germany and then in Dartmouth,
he made a lot of improvements to a method called
Probabilistic Latent Semantic Indexing...
which is one of these methods that match items
based on their content. Typically for news
the items are websites, or web articles,
and you would match them based on the so-called
bag of words that they contain. And this allows you
to find latent similarities. You can group things
together which aren't obviously grouped together
based on similarity in their word distribution.
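
As a rough illustration of matching articles by their bag of words, here is a minimal sketch that compares plain word-count vectors with cosine similarity. It is deliberately simpler than Probabilistic Latent Semantic Indexing, which fits latent topics rather than comparing raw counts; the example headlines and the function names are assumptions made up for this sketch.

```python
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Turn a text into word counts, ignoring word order."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical article headlines.
physics_1 = bag_of_words("physicists measure the higgs boson mass")
physics_2 = bag_of_words("new higgs boson measurement from physicists")
sports = bag_of_words("football team wins the cup final")

print(cosine_similarity(physics_1, physics_2))  # two physics articles
print(cosine_similarity(physics_1, sports))     # physics versus sports
```

The two physics headlines share more words, so they score as more similar than the physics and sports pair. Latent methods go further and can relate articles that share few literal words.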
[pause]
And of course all of this runs on MapReduce to...
make it run in parallel and get good performance.
[pause]
Another example here, from another
talk, given by SAS, is optimizing pricing
in retail. Obviously, to
make good profits, you need to always offer
what the users want. Which means you'd better
get rid of the things they don't want. And so that requires
a careful optimization in price. If you reduce the price
too much, you'll lose too much money. If you reduce
the price too little, you will not clear your apparel.
[pause]
It points out here that in the past this was often done by intuition,
but now it can be done by actual analytics.
This slide here points out the magnitude of the problem.
100 million decisions... need to be made on pricing...
cuz we have so many stores and so many products.
And that gives you many terabytes of data each week...
[pause]
which corresponds to looking at the last two years.
And this gives you the number of units sold, the price,
and so on, and how much inventory you actually have,
and what you actually did
to promote the item. And this is,
as I pointed out, an optimization problem here;
we're trying to optimize
[pause]
two functions. One is the money that the store gets
and the other is... minimizing the amount of
unsold product, cuz you obviously will get nothing
if a product is absolutely totally unsold.
Well, unless you can return it to the manufacturer.
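
To make the two competing objectives concrete, here is a minimal sketch that scores a few candidate markdowns by revenue and by unsold units. The linear demand model and every number in it (full price, inventory, base demand, demand gained per percentage point) are assumptions invented for illustration, not figures from the SAS talk.

```python
def evaluate_markdown(markdown, full_price=50.0, inventory=100,
                      base_demand=40, demand_per_point=1.5):
    """Return (revenue, unsold units) for a markdown given in percent.

    Assumed toy demand model: each percentage point of markdown adds
    `demand_per_point` units of demand on top of `base_demand`.
    """
    price = full_price * (1 - markdown / 100)
    demand = base_demand + demand_per_point * markdown
    sold = min(inventory, demand)  # cannot sell more than is in stock
    return price * sold, inventory - sold

# Compare a small markdown, a moderate one, and a deep one.
for markdown in (0, 20, 40, 60, 80):
    revenue, unsold = evaluate_markdown(markdown)
    print(f"{markdown:>3}% off: revenue {revenue:8.2f}, unsold {unsold:5.1f}")
```

Under these assumptions, too small a markdown leaves stock on the shelf, while too deep a markdown clears everything but gives away revenue, which is the trade-off described above.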
[pause]
And there are lots of things these depend on. The time of year...
[pause]
how... exciting the product is, whether it's
just come out or whether it's rather mature.
And also what your history of promotion is.
People don't buy things if you always discount
and then you don't discount a particular item.
So that's an important issue.
And here are some rules of thumb,
such as 80% is the lowest possible markdown,
and also these psychological things like $1.99
is a lot less than $2 in people's minds.
And also you have to disentangle various competing effects.
And finally as we see on some data,
we present the following: within any one store,
the sales data are pretty sparse, and so you can't
make a very good prediction based on that.
Here's this example of noisy data at the store-product
level. This is one store and one product,
and you can see the units sold,
probably on a weekly basis rather than monthly,
are measured in the ones and twos.
And here you're trying to make predictions on
the amount sold as a function of the price.
[pause]
So... you have to
try to group this data together, based on groupings of
products or geographical groupings,
or maybe just other types of groupings.
If two stores are in different geographical locations
but have the same type of customer, then they probably
could be joined together.
So anyway, you need to aggregate data
in an intelligent way, and here's an example of data
aggregated at the region level, showing
the relationship between price and sales volume.
This can be used to then predict what price
you should set if you wish to increase the sales.
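
The aggregation step can be sketched as follows: sparse store-level sales are summed to the region level, and a straight line is fitted to price against total units. The records, the region and store names, and the choice of a simple least-squares line are all assumptions for illustration; the talk does not say what model was actually used.

```python
from collections import defaultdict

# Hypothetical weekly records: (region, store, price, units_sold).
# Each store alone sells only a handful of units, as in the noisy example.
records = [
    ("north", "s1", 20.0, 1), ("north", "s2", 20.0, 2), ("north", "s3", 20.0, 1),
    ("north", "s1", 15.0, 2), ("north", "s2", 15.0, 3), ("north", "s3", 15.0, 2),
    ("north", "s1", 10.0, 3), ("north", "s2", 10.0, 4), ("north", "s3", 10.0, 3),
]

def aggregate_by_region_price(rows):
    """Sum units sold over all stores for each (region, price) pair."""
    totals = defaultdict(int)
    for region, _store, price, units in rows:
        totals[(region, price)] += units
    return totals

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

totals = aggregate_by_region_price(records)
points = [(price, units) for (region, price), units in totals.items()
          if region == "north"]
slope, intercept = fit_line(points)
print(f"units = {slope:.2f} * price + {intercept:.2f}")
```

With the fitted line in hand, one can read off the expected regional volume at a candidate price, which is the kind of prediction the aggregated plot supports.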