SHIRLEY GAW: Hi.
My name is Shirley, and I'm a software engineer working in
the Paris office, and I'm on the YouTube Data API team.
PHILIPP PFEIFFENBERGER: And I'm Philipp Pfeiffenberger.
I'm also a software engineer working in the Paris office,
and I'm working on the semantic
annotation of YouTube videos.
SHIRLEY GAW: So in this session, I'd like to go over
how I taught YouTube that my baby is cute, but more
generally, how you as a content creator can improve
semantic annotations for your videos and channels.
And also, how you as application developers can
discover content on YouTube based on Freebase topics.
So Philipp's team's been working on automatically
annotating resources such as videos and channels with
Freebase topics.
Now, if you're not familiar with Freebase, it's an open,
crowd-sourced Knowledge Graph where nodes correspond to
real world things-- so for example, a person or a place
or a song--
and each of these nodes has a unique ID.
So in the latest version of the YouTube Data API, you can
actually look up, for example, a YouTube video and see what
Freebase topics have been automatically
associated with it.
Furthermore, you can supply a Freebase topic ID and see what
YouTube resources are related to that topic.
So in the session, I'm going to talk about why you should
care about the quality of these video annotations--
so how it's being used, and also how you can improve the
quality of the annotations for your content.
Philipp will go into my specific example, explaining how we arrive at those video annotations, and then the other signals that go into the annotation process. With that understanding, we'll go back to my first example and explain how I was able to improve the annotations.
And then, we'll walk through an integration between the
Freebase API for getting related topics--
getting topics in general, actually--
and the YouTube Data API to find content on YouTube.
And then finally, Philipp's going to talk about work that
his team's been doing that will give you more Freebase
annotations, and hopefully we'll see it in
the API real soon.
So why should you care about the quality of video
annotations?
Well actually, at YouTube, this is one of the signals we
use for surfacing content and organizing it.
So for example, if you're looking at the Home page, it's
one of the signals that we use for featuring content--
also when you do search and some special features.
Also, specifically, we're using it as a signal in our video and channel recommendations, and here's how some external partners have been using it through the Topics API.
So for example, Interesante is focused on Latino users and
culturally-relevant content for those users.
So they look in Freebase, find related topics for things that
people are interested in, find content on YouTube, and then
suggest that to their users as things that they can add in
their collections.
Showyou is more general, in that it's
showing internet videos.
And when you're watching something in the latest iPhone
app, it'll show you the topic it thinks the video is about.
It's using the Topics API as one of the signals.
And then from there, it can suggest other internet videos
related to that topic.
Seevl is actually specializing in music recommendation.
So say your friend likes an obscure band.
If you're using the YouTube data as your music source,
then you can actually find more YouTube videos
related to that band.
So now that you understand why video annotations and quality
video annotations are important for content, let's
go into my favorite example.
So this is my daughter.
And if you're a human being, what would you say
this video is about?
Be nice.
[LAUGHTER]
AUDIENCE: Self-discovery.
SHIRLEY GAW: Self-discovery as kind of a step in development.
AUDIENCE: I'd say it's about mirrors.
SHIRLEY GAW: Mirrors.
So you can talk about some things in the scene,
particular people in the scene, describing
what you see there.
What does YouTube think of this video?
So actually, we only expose this in the
Topics API for now.
So you don't see it on the site.
What you do is make an HTTP GET request to googleapis.com, to the latest version of the YouTube Data API. We look at the videos collection for a specific video ID. And now, we say that we're interested in the video annotations by asking for part="topicDetails" in the response.
So what did YouTube think this video was about?
There's no green, which means it doesn't think anything
about my video.
Of course, working on the Topics API, that's clearly not
good enough for the mother.
So I uploaded a second version of this video, and I made sure
that there were some high quality annotations.
I did a few tweaks and a little search engine
optimization, which I'll discuss later.
But now, when we query this new video, it's
the exact same content.
But now, we get two topics, which are shown in green.
We can go to Freebase, append the Freebase ID, and see what each topic is. The first topic is that it's about an infant. The second topic, cuteness.
I like to conclude that I taught
YouTube my baby is cute.
PHILIPP PFEIFFENBERGER: Thanks, Shirley.
So we've received a lot of questions from developers about exactly how videos are annotated and how the annotations that we export for videos should be interpreted.
In order to shed some light onto those questions, I want
to walk through the annotation process using
a few example cases.
Well, before we can start annotating anything, we want
to look at what kind of data we have available for the
annotation process.
We want to list these in the order of their availability.
So first and foremost, we have the text metadata.
At the time of upload, the uploader will insert some
text-- a title, description, and so on--
and we have this immediately available.
A few minutes after upload, we'll have extracted some
audiovisual features that we can use to classify the video.
And finally, if the video's popular enough, there may be some context, both on the open internet and on YouTube, that we can use to further guide annotation.
So I want to illustrate these with one example video for
each data type.
But as I do this, we should keep in mind that for your
average YouTube video, we try to make use of all
three types of data.
I want to start with Shirley's video, which was just uploaded
and only has text metadata to work with at this point.
So I'm going to walk through the annotation process using
Shirley's video, the text, and the entities as an example.
And as I do this, you should keep in mind that the process I walk through is the same for the other data sources as well.
So we know that the text metadata is
provided by the uploader.
It includes title, description, any tags that the
uploader included at that time.
And we also know that text and concepts have a many-to-many
association.
That is, text is ambiguous.
The concept of infant can be communicated by the word
"baby," by the word "infant," by the word "toddler," so on.
And likewise, the word "baby" without any sort of context
can refer to a Justin Bieber song, can refer to an infant,
can refer to a number of other things.
For that reason, we depend on the text to be consistent to
allow us to correctly dereference what concepts are
mentioned therein.
Now, even though we depend on the uploader to give us this
text metadata and we have to correlate this with other data
sources, it's really valuable, because it's available to us
immediately.
So now, to walk through the process.
I'm not sure why it's not showing up.
OK, we're back.
Sorry about that.
All right, so we have some text metadata.
And luckily, we have enough of it to correctly dereference
all the concepts within it.
From Shirley's video, her description, and her title, we
were able to extract the concepts of mother, daughter,
infant, mirror, and so on.
Now, you see that some of these are shown in bold.
That's because we assign a score to each concept based on
its prevalence in the metadata.
For this simplified example, I'm simply giving twice the
score for any entities that show up in the title.
So now, we have some entities.
We have some weighting.
But we can't quite go ahead and say infant and mirror are
what this video's about because they
showed up in the title.
In order to figure out what the central entities for the
video are, we use links between concepts.
We extract these from the open internet, where we learned
that mother and daughter, daughter and infant, infant
and cute tend to co-occur, and we can build a support graph
between these concepts.
So now, we have weighted entities.
We have a support graph.
How do we figure out which entities are
central to the video?
That's where we go to scoring and thresholding.
So first, we give each entity one point simply for existing,
two points for existing in the title in this example.
And then, we give one additional point for each
entity that links to that entity.
So infant gets three more points, because three other
entities link to it.
Mirror gets one more point.
And after applying some thresholding, we determine
that the central entities for this video
are infant and cute.
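As a rough illustration of that simplified scheme, here is a toy sketch in Python. The entities, links, weights, and threshold here are illustrative only, not the production values:

```python
# Toy sketch of the simplified scoring: 1 point for existing, 2 points if the
# entity appears in the title, plus 1 point per support-graph link touching it.
# Entities, links, and the threshold are illustrative, not production values.
entities = ["mother", "daughter", "infant", "mirror", "cute"]
in_title = {"infant", "mirror"}              # entities mentioned in the title
support_links = [                            # co-occurrences learned from the web
    ("mother", "daughter"),
    ("daughter", "infant"),
    ("infant", "cute"),
    ("mother", "infant"),
]

def score(entity):
    base = 2 if entity in in_title else 1
    links = sum(1 for a, b in support_links if entity in (a, b))
    return base + links

THRESHOLD = 4
scores = {e: score(e) for e in entities}
central = [e for e, s in scores.items() if s >= THRESHOLD]
print(scores)
print("central entities:", central)
```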
Now, that's all fair and well when we have
really good text metadata.
But we're not always that lucky, especially in
the world of gaming.
Oftentimes, we get gaming videos that mention
characters, that mention levels, but
don't mention the game.
If you look at the video on the right, you see "Book raid
on the nether!
Working Draft." If you're a human that plays games, you
know it's Minecraft.
But looking just at the metadata, you're kind of lost.
Luckily, this is a gaming video.
And luckily, for gaming videos, we're guaranteed
predictable lighting conditions.
And we're also guaranteed some static features--
status bar, fonts, and so on and so forth--
so we can train classifiers to help us identify these games.
Now, this is great for games, but its usefulness rapidly decays when you generalize to the broader content we have on YouTube.
If, for example, I had a really great set of classifiers, and they told you that some video featured a man with a beard in a top hat and a man in a t-shirt and flip flops in a mall, you wouldn't be able to deduce that this might be a modern-day parody involving Abraham Lincoln. Those types of nuances are lost when you use classifiers.
However, because this is available just minutes after
upload and applicable for some verticals, it remains in
active development for us.
But sometimes we don't even have audiovisual features to work with, or any classifiers that match. So in the case where we have a video that lacks both good metadata and distinct audiovisual features--
like the video on the right titled "Me at the zoo"--
if we had a good classifier, we might get "elephant" and we
might get "man." That's about it.
However, if we look at the context of the video--
that is, the discussions going on in the comments, the web
pages that it's embedded in and what those web pages are
about, and the overall user engagement--
then we can try to figure out what is
notable about this video.
And we can figure out that it's notable because Jawed
Karim is in it, and he's one of the co-founders of YouTube,
and this was the first video ever uploaded to YouTube.
Now, individually, all of these signals are hideously
noisy, as you can imagine.
But on the aggregate, once you have enough of them, they
become really powerful for really popular videos that we
otherwise don't know much about.
Now, you can't quite rely on this as an uploader, because you probably won't have every video reach 10 million views.
But if you're consumers of the API, you can deduce from this
that a video with a lot of views is probably going to
have more confident annotations than a video with
just a handful of views.
So now that we know how annotations work, I'm going to
give it back to Shirley, who's going to show us what she did
on her second upload to get better annotations.
SHIRLEY GAW: So let's talk about my
favorite YouTube not-star.
If you recall from the beginning of the session, I
had two versions of the exact same video.
The first version had no annotations, and the second
version had two annotations.
Now, what happened in between?
The first thing to recall is that, for some reason, this
video is not as popular as I would expect.
And we don't have any blogosphere love or comments, so that means we can't use the video context signal.
Likewise, we don't have any audiovisual feature matching,
so in this particular case, we're relying entirely on
video metadata.
So if we're doing that, then we need to have
the video be public.
Otherwise, it's not caught by our
video annotations pipeline.
Also, if we're relying on text metadata, that text metadata
should say something.
So the title should actually be a concise description of
the content of the video, and we should be adding supporting
text and tags.
So the second version of the video is the same content.
But it's a public video, and we're adding metadata to
support what we say that the video is about.
PHILIPP PFEIFFENBERGER: So I've made a few references to
centrality and central annotations without really
defining what it is, and that's because it's a very
narrow but powerful concept.
We consider a video's annotations to be central if the annotations are complete--
that is, given the annotations, you can figure
out what the video is about--
if they're specific--
that is, if you can't replace any of these entities with a
more specific entity that still
refers to the same concept--
and if they're compact--
that is, if you can't remove any of these entities and
still completely describe the video.
For example, if we had a video of this talk and we wanted to
annotate it, you would likely choose the entities "semantic
annotation," "Google I/O 2013," and "YouTube API,"
because these would be complete-- you know what the
video's about--
specific-- we can't find anything more
specific for any of them--
and compact.
You can't really remove any of them and still understand what
it's about.
However, if we had a single entity for this talk in
Freebase, you would gladly remove all three of these and
replace them with that one entity, because it would still
be complete, specific, and definitely compact.
You could use Freebase to still figure out that this was
a talk at Google I/O, this was about YouTube API, and so on.
A layman's way that we've put this into words: an annotation is likely to be part of the set of central annotations if you would include the name of the entity in a one-sentence description of the video. And separately, if you were curating a YouTube channel about this topic, would you choose this video as a canonical example of that entity? If you answer yes to both of those questions independently, then this is likely an entity that's part of the central set of annotations.
It follows then that relevant is not central.
A video of this talk annotated with an entity for Moscone
Center, that entity would not be part of a central set of
annotations, because we can remove it and still
describe the talk.
And also, of course, related is not central.
Android API would not be part of a central set of
annotations.
So what does that mean for developers?
Well, it means that for your average YouTube video, you can
expect one to three very specific, very narrow
annotations that should completely describe the video.
Even though these are very specific and narrow, you can
use the structured data in Freebase, which is available
as a downloadable data dump, to get more information about
these entities.
Also, you should know that more popular videos will probably have more confident annotations, because we have more signals to help us annotate them.
And if you're an uploader, you should use precise titles and cohesive descriptions that talk about what the video's about, to help us assign the correct entities from the start, so that the other signals simply dovetail with those correct entities we started from.
So rather than just talk about what you can do with these narrow entities, I actually kind of want to show you what is possible with the central annotations that we have today.
I'm not going to talk about what the demo does before I go
into it, but you should know that this is not
a rules-based demo.
That is, it's agnostic about what kind of entity I'm querying for, and it's about maybe 60 lines of Python.
So say, for example, I want to explore origami on YouTube.
I don't know what origami's about except it's about
folding paper, but I want to learn more.
So I look up the entity on Freebase.
And how are videos categorized if I use entities?
Well, it turns out that origami on YouTube tends to be
different shapes of organisms, and I can use this to learn
how to fold origami.
So I can start learning folding roses, cats.
And after I've spent enough time indoors folding origami,
I want to go outside and travel a little bit.
So I want to look up what do we have
about travel on YouTube.
Probably videos categorized into countries and cities.
And sure enough, we have travel videos about different
countries and different cities and so on.
After a lot of traveling and walking and sightseeing, I'm
probably hungry.
I want to check out some local cuisine.
So I look up what videos we have about cuisine on YouTube,
how we can structure those.
And probably, we're going to see different ingredients and
different types of food.
So this is actually relatively straightforward.
All I'm doing is I'm searching for 100 videos that were
centrally annotated with origami.
And then, I take all of the other entities that those
videos were annotated with, and I use Freebase to look up
their notable types.
I then create clusters of notable types.
Say, for example, I had a video annotated with origami
and cat, another one origami and bat.
Cat and bat would both fall under the notable type of organism.
And then, I choose the largest set of notable types, and then
I present the entities along with the videos that were
annotated with them.
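Here is a rough sketch of that flow in Python, under the assumption that you have an API key. The seed MID is a placeholder, the Freebase Topic API endpoint and the /common/topic/notable_types response shape are assumptions for illustration, and error handling, paging, and quota limits are glossed over:

```python
# Sketch of the topic-agnostic clustering demo: find videos annotated with a
# seed topic, collect the topics they are co-annotated with, group those by
# Freebase "notable type", and present the largest group.
# API_KEY and SEED_TOPIC are placeholders; the Freebase Topic API endpoint and
# the /common/topic/notable_types response shape are assumptions.
import json
import urllib.parse
import urllib.request
from collections import defaultdict

API_KEY = "YOUR_API_KEY"
SEED_TOPIC = "/m/0xxxxx"   # placeholder Freebase MID (e.g. the one for origami)

def get_json(base_url, **params):
    params["key"] = API_KEY
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# 1. Find videos associated with the seed topic (one page of 50 shown here).
search = get_json("https://www.googleapis.com/youtube/v3/search",
                  part="id", type="video", topicId=SEED_TOPIC, maxResults=50)
video_ids = [item["id"]["videoId"] for item in search.get("items", [])]

# 2. Fetch those videos' topic annotations and collect the co-annotated topics.
videos = get_json("https://www.googleapis.com/youtube/v3/videos",
                  part="topicDetails", id=",".join(video_ids))
videos_by_topic = defaultdict(list)
for item in videos.get("items", []):
    for mid in item.get("topicDetails", {}).get("topicIds", []):
        if mid != SEED_TOPIC:
            videos_by_topic[mid].append(item["id"])

# 3. Look up each co-topic's notable type in Freebase and cluster by it.
clusters = defaultdict(list)
for mid in videos_by_topic:
    topic = get_json("https://www.googleapis.com/freebase/v1/topic" + mid,
                     filter="/common/topic/notable_types")
    values = (topic.get("property", {})
                   .get("/common/topic/notable_types", {})
                   .get("values", []))
    if values:
        clusters[values[0]["text"]].append(mid)

# 4. Keep the largest cluster and show its entities with their videos.
if clusters:
    best = max(clusters, key=lambda t: len(clusters[t]))
    print("largest notable-type cluster:", best)
    for mid in clusters[best]:
        print(" ", mid, "->", videos_by_topic[mid])
```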
So I'm going to give it back to Shirley, who's going to
walk us through the API calls that were made behind the
scenes to make this happen.
SHIRLEY GAW: So just to summarize, we're fetching
content on YouTube based on a specific Freebase topic ID.
That's selected by the user using the Freebase suggest widget.
Once we have a topic ID, we can discover content on
YouTube using universal search.
So for YouTube, that means that we can find videos,
channels, and playlists.
In this particular case, we're interested only in the videos.
So suppose that the user selected origami, as shown earlier.
Then, we want to find content on YouTube related to origami,
and we get a bunch of videos--
100 in this case.
Now, we want to see, are these videos about
more than just origami?
We want to find what are the other central topics.
So we go to the videos list service, find those central topics, and then look up the notable types and cluster based on that, which is the coloring that Philipp showed in the demo.
So first of all, the Freebase suggest widget. This is just something that you can find by going to this website, and they'll give you instructions for embedding it in your web page.
Users type in text and it suggests entries. Then, you get a Freebase machine ID.
Let's switch over to API Explorer.
So this is a nice way of being able to play with Google APIs
for different products.
So developers.google.com/apis-explorer.
You select the YouTube Data API, so the latest public
version of our Data API.
You'll see these are different resources and operations on
those resources.
In this case, we want to discover content on YouTube
related to a specific topic.
And then, once we find those videos that we're interested
in, let's get more information about them
through videos list.
So let's start off with finding content on YouTube.
The nice thing about the API Explorer is that it explains
what all the different parameters are using this text
on the right.
And you can see what all of them possibly are and how
they're used.
And then, if it's in red, it means it's a required
parameter for the request.
Now, I've pre-filled out this form.
And recall that the user selected a specific suggested topic, and from that we get the Freebase machine ID. Oops, that's a bit overkill.
We say we want to find videos, type="video" on
this specific topic.
I'm using origami as the example.
And we want the IDs of the resources, and for this demo,
I'm going to show you the metadata as well.
Now, Philipp's demo uses 100 videos and fetches that.
In the case of the API, we can only get 50 at a time, so
we're going to have to do two calls to get 100 videos.
I'm going to execute this request again.
You'll see here this is the HTTP GET request that you need
to make, and it pretty-prints the format of the results--
so different origami videos.
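For reference, the same two-page fetch can be scripted. Here is a minimal sketch in Python, where API_KEY and TOPIC_ID are placeholders and the two calls are chained with the Data API's nextPageToken:

```python
# Sketch of fetching 100 videos for a topic: search.list returns at most 50
# results per call, so a second call chained with nextPageToken is needed.
# API_KEY and TOPIC_ID are placeholders.
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"
TOPIC_ID = "/m/0xxxxx"   # Freebase machine ID chosen via the suggest widget

def search_page(page_token=None):
    params = {"part": "id", "type": "video", "topicId": TOPIC_ID,
              "maxResults": 50, "key": API_KEY}
    if page_token:
        params["pageToken"] = page_token
    url = ("https://www.googleapis.com/youtube/v3/search?"
           + urllib.parse.urlencode(params))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

first = search_page()
pages = [first]
if "nextPageToken" in first:
    pages.append(search_page(first["nextPageToken"]))

video_ids = [item["id"]["videoId"]
             for page in pages
             for item in page.get("items", [])]
print(len(video_ids), "video IDs fetched")
```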
We can then select these video IDs.
If you recall, we have videos related to a specific topic.
Now, we're going to see what other central annotations there are.
So I have preselected three of these results, and we go to
the videos list service up here, GET just like in the
example I showed before with the baby.
In this case though, I'm just filling it out in a form in
the API Explorer.
So part="topicDetials."
Comma-separated video IDs here.
Execute the request.
And you see in the response that some videos have not just
origami as a central topic, but they have other topics
associated with them as well.
Now to see what these topics are, we can use the Freebase
Search API.
So Freebase has another service to be able to play
with filters.
So api-examples.freebaseapps.com.
I pre-filled in this particular
Freebase machine ID.
And you can see in the response that this video was actually also about an octopus, so origami octopus, and that its notable type is organism.
So we can group videos that are about organisms.
PHILIPP PFEIFFENBERGER: Cool.
So what does this tell us, except that it's possible to
build topic-agnostic applications?
Well, even though these entities are really specific
and really narrow, you can use Freebase to get more
information about each of the entities we
annotate a video with.
Also, if you're building an application
that's domain specific--
say, for example, a movie site--
you can use Freebase to look up the director of a movie, to
look up actors in a movie, and then to cluster videos based
on whether there are interviews, trailers, and so
on and so forth.
And of course, if you think this is really cool and you
want to play with it more, we've got a Codelab on Friday
where both Shirley and I will be TAing.
And we'll guarantee you that if you go there, you're going
to walk away with a working app that works with the Topics
API and the Freebase API.
And it's movies-based, so it's pretty cool.
OK, so we know we have some decent annotations that we can
do some cool stuff with, but our work is far from over.
We take quality really seriously, and we use human
evaluations to assess the centrality of entities.
We take raters, pair them by language with the language of the video, and ask them: given a video, is the entity off-topic, relevant, or central?
And doing this, we've been able to reduce the number of
off-topic annotations over the past year while maintaining
coverage across the YouTube corpus.
Now aside from raters, we depend on your feedback.
So if you see off-topic annotations, or you see
systematic patterns of off-topic annotations, please
file a ticket using gdata-issues.
If you have more general questions about annotations,
you can reach out via YouTubeDev, and we'll get them
answered for you.
So there are some classes of problems that are pretty
nefarious that we're battling that I want to highlight for
just a second.
Common knowledge is one of them.
We assume that everyone knows what a daughter is, what a
mother is, what a baby is.
And therefore, people don't put that stuff on Wikipedia to any great extent.
So for these really common concepts, the machine
repositories of knowledge that we have are actually pretty
barren, which makes it more difficult for us to annotate
these common concepts.
Similarly, new topics.
When "Harlem Shake" first got popular, we happily annotated
the video with "Harlem Shake, a dance introduced
in 1981." Not right.
After a week and a few days, we had an entity for Harlem
Shake the internet meme, but that's not good enough in a
world where memes tend to rise and fade within days.
Also, local facts.
If you upload a video simply entitled "Hiro's Sushi
Restaurant," we're going to have a really hard time
figuring out which Hiro's Sushi restaurant it is,
because there are thousands across the country.
However, in this case, it was titled "Hiro's Sushi
Restaurant--
Sedona, Arizona." So we can probably use this to figure
out which restaurant was actually mentioned, assuming
we have an entity for this restaurant in Freebase.
Lastly, overlapping names in the same concept space are
pretty tricky.
One of our partners, seevl, pointed out to us that there
was a little-known band called Nirvana that we annotated
wrong all the time.
And we were lost, because we know Nirvana really well.
We grew up with Nirvana.
And it turns out that there's a 1970s British psychedelic
band called Nirvana that we just didn't annotate when we
really should have.
Again, with more metadata, we can do a better job at
disambiguation.
If we have a video entitled "Nirvana--
In Bloom," which is one of their songs from the 1990s
band, we can get the right band without a problem.
But if we have "Nirvana--
Live in Bristol," and nothing else, and it's the 1970s band,
we're going to guess that it's the 1990s band, because it's a safer guess without any other information.
And we'll get it wrong in that case.
So aside from fixing bugs, which is always fun, there's some really exciting stuff that I'm working on right now that I want to get into.
One of them is relevant annotations.
So we've heard the cries for more entities per video, and
we're addressing it using relevant annotations.
So just like central annotations, relevant annotations are their own class of annotations.
They're entities that are relevant to the video and
would be of interest to someone watching the video.
So, for example, in Shirley's video, most likely "mirror"
would be relevant, at least.
Likewise, if we had a video of a live concert, the location
of the concert, band members that are featured in the
video, would also be relevant.
Now, relevant is not related.
A different band in the same genre would not be relevant.
Likewise, relevant is not low confidence
or low quality central.
It's its own distinct class of annotations that you can use
knowing that they're relevant.
Similarly, we'll be exposing a taxonomy of annotations that
we've established internally.
At this point, if we had a video of a tennis match, we'd
be happy annotating with the names of the tennis players,
the name of the tournament, and if we have any other
information, maybe the year of the tournament and so on.
However, even though you can get the information that these are tennis players and that it's about tennis from Freebase, we want to offload some of that by exploiting the taxonomy and telling you explicitly that this is a video about tennis, about racquet sports, and about sports.
So to kind of drive this home a little bit, I took a
screenshot of a video on the right.
"DVF (through Glass)." DVF stands for Diane Von
Furstenberg.
And I want to ask you, looking at this video, what do you
think would be the central entities, relevant entities,
and taxonomy entities?
Any brave takers?
AUDIENCE: Fashion Week.
PHILIPP PFEIFFENBERGER: Fashion Week.
For central, relevant, or taxonomy?
Relevant?
OK.
Anyone else for central or taxonomy?
AUDIENCE: Glass.
PHILIPP PFEIFFENBERGER: Glass.
Very good.
For central, I'm guessing?
Anyone else?
Going once, going twice.
SHIRLEY GAW: Heard something in the audience.
AUDIENCE: Events.
PHILIPP PFEIFFENBERGER: Events.
For relevant?
AUDIENCE: For taxonomy.
PHILIPP PFEIFFENBERGER: Yeah, that could be.
OK.
So for this example, which I didn't annotate myself, we did actually annotate it. For central, we had Google Glass and Diane Von Furstenberg.
Relevant would be New York Fashion Week, because this is
where this video was shot.
And then taxonomy, we'd probably put it into gadgets
and technology, because it's primarily about Google Glass.
But similarly, events probably could also fall into the
taxonomy classification.
So hopefully, this answered your questions and maybe even
raised some new ones that we would love to
hear at this point.
[APPLAUSE]
PHILIPP PFEIFFENBERGER: Thank you.
SHIRLEY GAW: Thank you.
AUDIENCE: So I was curious about the Minecraft example. So how do you seed your data for that, and sort of how expansive is that? So would you have every game ever made in there?
PHILIPP PFEIFFENBERGER: I wish.
AUDIENCE: Or are you specifically listing which
things you care about?
PHILIPP PFEIFFENBERGER: I wish we had every game
ever made in there.
That would be a dream of mine.
We've got some things that we're classifying for.
I can't disclose the list of things, because it's not very
well defined.
But it's something that we're expanding to increase coverage
on, because it was a big hit in the first iteration, and we
definitely want to go further on it.
AUDIENCE: Hi.
I'm Lek Lek Mai from Yale University.
And we have a lot of videos that have topics that are not
covered in Freebase, but we have our own semantic
repository.
So what would you suggest as the best way of connecting all that up?
PHILIPP PFEIFFENBERGER: That's a good question.
I would reach out maybe to someone on the Knowledge team.
SHIRLEY GAW: Freebase has a session here too.
PHILIPP PFEIFFENBERGER: Yeah.
Definitely reach out to the folks on Freebase.
And people working in Knowledge in general, I think,
would be really great contacts for that.
Regrettably, we're only consumers of these knowledge
repositories, and we don't get to fully administer
what goes onto them.
SHIRLEY GAW: But this does come up as sparseness in the Knowledge Graph for specific use cases, and one of the things you can do is contribute to that graph. But since you already have something more developed, you might want to just directly ask Freebase how you can contribute that information.
AUDIENCE: OK.
Could I ask one more quick question?
PHILIPP PFEIFFENBERGER: Of course.
AUDIENCE: So you talked about terms and vocabularies that
you're using.
And just wanted to ask, have you looked at other services
that have vocabularies in broader terms and narrower
terms, like the Getty Vocabulary, for example, and
utilizing those?
PHILIPP PFEIFFENBERGER: No, I have not.
That sounds really interesting though.
AUDIENCE: OK.
Maybe after the session.
PHILIPP PFEIFFENBERGER: Yeah, definitely.
AUDIENCE: Hi there.
I'm Jarom McDonald.
I'm from Brigham Young University.
One question that I had--
and you may have quickly glossed over it.
But if you have any further details, it'd be really
interesting.
Other types of YouTube annotations, the interactive clicks and the questions and so forth, are able to link temporally and spatially to your video, whereas it looks like a lot of what you can do with the Topics API is just for the video as a whole.
And do you see any ability, either now or eventually, to
be able to link topics to individual temporal moments in
the video or spatial areas of the video?
PHILIPP PFEIFFENBERGER: That's a really good idea.
To be honest, at this point, we really want to get it right
for the video as a whole.
But having that finer granularity would definitely
be an asset.
AUDIENCE: Thanks.
SHIRLEY GAW: Thanks for the suggestion.
PHILIPP PFEIFFENBERGER: Yeah.
AUDIENCE: Hey.
So you guys were talking about the text metadata for a little
while and how that's kind of like the first line of defense
since you have it earliest.
And maybe I just missed it, but do you guys also work with
comments and user data afterwards as that comes in?
PHILIPP PFEIFFENBERGER: Yes, exactly.
So that's part of the context of the video.
So we look at the comments of the video, and we look at all
the web pages where the video appears.
And then, we also extract concepts from the comments and
from the web pages and try to figure out, OK, well, what's the overlap?
And sometimes, there can be some pretty
funny stuff that happens.
When I was first playing with this, there was a video of
"Another One Bites the Dust," and one of the entities that
kept coming up was Kim Jong-il.
And I'm like, why is that?
And it turns out that when Kim Jong-il passed away, in the
forums, people kept embedding this video.
So individually, these sources you can kind of forget about.
But again, once you have enough of them, and once you
have enough data, you can see what kind of things emerge
from them after doing some filtering.
AUDIENCE: Thanks.
AUDIENCE: Hi.
I was wondering if you also did speech recognition on the
video themselves to get extra text from that.
PHILIPP PFEIFFENBERGER: That's a good question.
So we've looked at a number of things--
speech recognition, transcripts, and
so on and so forth.
And what tends to happen is if, for example, you have a
video of the State of the Union, and you do speech
recognition, you'll figure out that it's about the economy,
it's about jobs, it's about current events, and
so on and so forth.
But you miss that it's the State of the Union.
So it's something that we might look into again as a guiding signal.
But by itself, oftentimes what's mentioned in the video
isn't necessarily what the video is about,
except in a few cases.
AUDIENCE: Hi.
So actually, that's kind of a related question to what I
want to ask.
A lot of the stuff you're doing seems to be where you
have an explicit word or something like infant that
represents a concept, and then you go and
find that in Freebase.
Is there, I guess, potential for expansion using something
like WordNet, where if you don't recognize maybe one of
the words in it, going and finding something that might
be semantically related?
Or is that kind of too noisy, I guess, for your approach?
PHILIPP PFEIFFENBERGER: I'm not familiar with WordNet.
But if there are words that we don't recognize, we basically just don't use them.
Like if you have the name of someone who we don't recognize
or who doesn't match any sort of concept,
then that's just skipped.
SHIRLEY GAW: So WordNet, I'm more familiar with that.
So it would be like synonyms and antonyms of that concept.
I don't know if it's actually a source for
the Knowledge Graph.
I can't speak to that one.
But in the case of the baby video, it was really tough.
Because "baby" can be, as he said, a Justin Bieber song.
So I did actually have to look at a data dump from Freebase
and see what kinds of concepts would support
the topic of baby.
So that's actually where it would come in.
I don't know if WordNet would do that for you, but you could
definitely use Freebase and the related concepts to
support that.
PHILIPP PFEIFFENBERGER: And we do support synonyms.
So for example, DVF would probably dereference to Diane
Von Furstenberg, especially with other
things supporting it.
So as long as you mention the concepts in the video and you
help us dereference it to get to the right ones, even if
there are synonyms, we try to be smart enough
to allow for that.
AUDIENCE: Thank you.
AUDIENCE: Hey.
You said in your simple case that you were just weighting
title as double the description.
Obviously, the world is not a simple case.
What machine learning approaches are you taking to
work out what these weightings should be?
PHILIPP PFEIFFENBERGER: We try a lot of things.
I can't speak to the actual approaches that we use in
detail, but it's definitely not the simple case.
I'm sorry, I can't answer in more detail on that.
Anyone else?
OK.
Well, we'll also be hanging out in the Sandbox for a few
minutes after this talk if you have questions you want to ask
one on one.
And thank you for attending this talk.
SHIRLEY GAW: Thanks very much.
[APPLAUSE]