SHIRLEY GAW: Hi.
My name is Shirley, and I'm a software engineer working in
the Paris office, and I'm on the YouTube Data API team.
PHILIPP PFEIFFENBERGER: And I'm Philipp Pfeiffenberger.
I'm also a software engineer working in the Paris office,
and I'm working on the semantic
annotation of YouTube videos.
SHIRLEY GAW: So in this session, I'd like to go over
how I taught YouTube that my baby is cute, but more
generally, how you as a content creator can improve
semantic annotations for your videos and channels.
And also, how you as application developers can
discover content on YouTube based on Freebase topics.
So Philipp's team's been working on automatically
annotating resources such as videos and channels with
Freebase topics.
Now, if you're not familiar with Freebase, it's an open,
crowd-sourced Knowledge Graph where nodes correspond to
real world things-- so for example, a person or a place
or a song--
and each of these nodes has a unique ID.
So in the latest version of the YouTube Data API, you can
actually look up, for example, a YouTube video and see what
Freebase topics have been automatically
associated with it.
Furthermore, you can supply a Freebase topic ID and see what
YouTube resources are related to that topic.
So in the session, I'm going to talk about why you should
care about the quality of these video annotations--
so how it's being used, and also how you can improve the
quality of the annotations for your content.
Philipp will go into my specific example, explaining how we arrive at those video annotations, and then the other signals that go into the annotation process. With that understanding, we'll go back to my first example and explain how I was able to improve the annotations.
And then, we'll walk through an integration between the
Freebase API for getting related topics--
getting topics in general, actually--
and the YouTube Data API to find content on YouTube.
And then finally, Philipp's going to talk about work that
his team's been doing that will give you more Freebase
annotations, and hopefully we'll see it in
the API real soon.
So why should you care about the quality of video
annotations?
Well actually, at YouTube, this is one of the signals we
use for surfacing content and organizing it.
So for example, if you're looking at the Home page, it's
one of the signals that we use for featuring content--
also when you do search and some special features.
Also, specifically, we're using it as a signal in our video and channel recommendations, and here's how some external partners have been using it through the Topics API.
So for example, Interesante is focused on Latino users and
culturally-relevant content for those users.
So they look in Freebase, find related topics for things that
people are interested in, find content on YouTube, and then
suggest that to their users as things that they can add in
their collections.
Showyou is more general, in that it's
showing internet videos.
And when you're watching something in the latest iPhone
app, it'll show you the topic it thinks the video is about.
It's using the Topics API as one of the signals.
And then from there, it can suggest other internet videos
related to that topic.
Seevl is actually specializing in music recommendation.
So say your friend likes an obscure band.
If you're using the YouTube data as your music source,
then you can actually find more YouTube videos
related to that band.
So now that you understand why video annotations and quality
video annotations are important for content, let's
go into my favorite example.
So this is my daughter.
And if you're a human being, what would you say
this video is about?
Be nice.
[LAUGHTER]
AUDIENCE: Self-discovery.
SHIRLEY GAW: Self-discovery as kind of a step in development.
AUDIENCE: I'd say it's about mirrors.
SHIRLEY GAW: Mirrors.
So you can talk about some things in the scene,
particular people in the scene, describing
what you see there.
What does YouTube think of this video?
So actually, we only expose this in the
Topics API for now.
So you don't see it on the site.
What you do is make an HTTP GET request to googleapis.com, to the latest version of the YouTube Data API. We look at the videos collection for a specific video ID. And now, we say that we're interested in the video annotations by asking for part="topicDetails" in the response.
So what did YouTube think this video was about?
There's no green, which means it doesn't think anything
about my video.
Of course, working on the Topics API, that's clearly not
good enough for the mother.
So I uploaded a second version of this video, and I made sure
that there were some high quality annotations.
I did a few tweaks and a little search engine
optimization, which I'll discuss later.
But now, when we query this new video, it's
the exact same content.
But now, we get two topics, which are shown in green.
We can go to Freebase, append the Freebase ID, and see what each topic is. The first topic is that it's about an infant. The second topic, cuteness.
I like to conclude that I taught
YouTube my baby is cute.
PHILIPP PFEIFFENBERGER: Thanks, Shirley.
So we've received a lot of questions from developers about exactly how videos are annotated and how the annotations that we export for videos should be interpreted.
In order to shed some light onto those questions, I want
to walk through the annotation process using
a few example cases.
Well, before we can start annotating anything, we want
to look at what kind of data we have available for the
annotation process.
We want to list these in the order of their availability.
So first and foremost, we have the text metadata.
At the time of upload, the uploader will insert some
text-- a title, description, and so on--
and we have this immediately available.
A few minutes after upload, we'll have extracted some
audiovisual features that we can use to classify the video.
And finally, if the video's popular enough, there may be some context, both on the open internet and on YouTube, that we can use to further guide annotation.
So I want to illustrate these with one example video for
each data type.
But as I do this, we should keep in mind that for your
average YouTube video, we try to make use of all
three types of data.
I want to start with Shirley's video, which was just uploaded
and only has text metadata to work with at this point.
So I'm going to walk through the annotation process using
Shirley's video, the text, and the entities as an example.
And as I do this, you should keep in mind that the process I walk through is the same for the other data sources as well.
So we know that the text metadata is
provided by the uploader.
It includes title, description, any tags that the
uploader included at that time.
And we also know that text and concepts have a many-to-many
association.
That is, text is ambiguous.
The concept of infant can be communicated by the word
"baby," by the word "infant," by the word "toddler," so on.
And likewise, the word "baby" without any sort of context
can refer to a Justin Bieber song, can refer to an infant,
can refer to a number of other things.
For that reason, we depend on the text to be consistent to
allow us to correctly dereference what concepts are
mentioned therein.
Now, even though we depend on the uploader to give us this
text metadata and we have to correlate this with other data
sources, it's really valuable, because it's available to us
immediately.
So now, to walk through the process.
I'm not sure why it's not showing up.
OK, we're back.
Sorry about that.
All right, so we have some text metadata.
And luckily, we have enough of it to correctly dereference
all the concepts within it.
From Shirley's video, her description, and her title, we
were able to extract the concepts of mother, daughter,
infant, mirror, and so on.
Now, you see that some of these are shown in bold.
That's because we assign a score to each concept based on
its prevalence in the metadata.
For this simplified example, I'm simply giving twice the
score for any entities that show up in the title.
So now, we have some entities.
We have some weighting.
But we can't quite go ahead and say infant and mirror are
what this video's about because they
showed up in the title.
In order to figure out what the central entities for the
video are, we use links between concepts.
We extract these from the open internet, where we learned
that mother and daughter, daughter and infant, infant
and cute tend to co-occur, and we can build a support graph
between these concepts.
So now, we have weighted entities.
We have a support graph.
How do we figure out which entities are
central to the video?
That's where we go to scoring and thresholding.
So first, we give each entity one point simply for existing,
two points for existing in the title in this example.
And then, we give one additional point for each
entity that links to that entity.
So infant gets three more points, because three other
entities link to it.
Mirror gets one more point.
And after applying some thresholding, we determine
that the central entities for this video
are infant and cute.
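As a rough illustration of that simplified scheme, here is a toy sketch in Python. The entities, links, weights, and threshold here are illustrative only, not the production values:

```python
# Toy sketch of the simplified scoring: 1 point for existing, 2 points if the
# entity appears in the title, plus 1 point per support-graph link touching it.
# Entities, links, and the threshold are illustrative, not production values.
entities = ["mother", "daughter", "infant", "mirror", "cute"]
in_title = {"infant", "mirror"}              # entities mentioned in the title
support_links = [                            # co-occurrences learned from the web
    ("mother", "daughter"),
    ("daughter", "infant"),
    ("infant", "cute"),
    ("mother", "infant"),
]

def score(entity):
    base = 2 if entity in in_title else 1
    links = sum(1 for a, b in support_links if entity in (a, b))
    return base + links

THRESHOLD = 4
scores = {e: score(e) for e in entities}
central = [e for e, s in scores.items() if s >= THRESHOLD]
print(scores)
print("central entities:", central)
```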
Now, that's all fair and well when we have
really good text metadata.
But we're not always that lucky, especially in
the world of gaming.
Oftentimes, we get gaming videos that mention
characters, that mention levels, but
don't mention the game.
If you look at the video on the right, you see "Book raid
on the nether!
Working Draft." If you're a human that plays games, you
know it's Minecraft.
But looking just at the metadata, you're kind of lost.
Luckily, this is a gaming video.
And luckily, for gaming videos, we're guaranteed
predictable lighting conditions.
And we're also guaranteed some static features--
status bar, fonts, and so on and so forth--
so we can train classifiers to help us identify these games.
Now, this is great for games, but its usefulness rapidly decays when you generalize to the broader content we have on YouTube.
If, for example, I had a really great set of classifiers, and they told you that some video featured a man with a beard in a top hat and a man in a t-shirt and flip flops in a mall, you wouldn't be able to deduce that this might be a modern-day parody involving Abraham Lincoln. Those types of nuances are lost when you use classifiers.
However, because this is available just minutes after
upload and applicable for some verticals, it remains in
active development for us.
But sometimes we don't even have audiovisual features to work with, or any classifiers that match. So in the case where we have a video that lacks both good metadata and distinct audiovisual features--
like the video on the right titled "Me at the zoo"--
if we had a good classifier, we might get "elephant" and we
might get "man." That's about it.
However, if we look at the context of the video--
that is, the discussions going on in the comments, the web
pages that it's embedded in and what those web pages are
about, and the overall user engagement--
then we can try to figure out what is
notable about this video.
And we can figure out that it's notable because Jawed
Karim is in it, and he's one of the co-founders of YouTube,
and this was the first video ever uploaded to YouTube.
Now, individually, all of these signals are hideously
noisy, as you can imagine.
But on the aggregate, once you have enough of them, they
become really powerful for really popular videos that we
otherwise don't know much about.
Now, you can't quite rely on this as an uploader, because you probably won't have every video reach 10 million views.
But if you're consumers of the API, you can deduce from this
that a video with a lot of views is probably going to
have more confident annotations than a video with
just a handful of views.
So now that we know how annotations work, I'm going to
give it back to Shirley, who's going to show us what she did
on her second upload to get better annotations.
SHIRLEY GAW: So let's talk about my
favorite YouTube not-star.
If you recall from the beginning of the session, I
had two versions of the exact same video.
The first version had no annotations, and the second
version had two annotations.
Now, what happened in between?
The first thing to recall is that, for some reason, this
video is not as popular as I would expect.
And we don't have any blogosphere love or comments, so that means we can't use the video context signal.
Likewise, we don't have any audiovisual feature matching,
so in this particular case, we're relying entirely on
video metadata.
So if we're doing that, then we need to have
the video be public.
Otherwise, it's not caught by our
video annotations pipeline.
Also, if we're relying on text metadata, that text metadata
should say something.
So the title should actually be a concise description of
the content of the video, and we should be adding supporting
text and tags.
So the second version of the video is the same content.
But it's a public video, and we're adding metadata to
support what we say that the video is about.
PHILIPP PFEIFFENBERGER: So I've made a few references to
centrality and central annotations without really
defining what it is, and that's because it's a very
narrow but powerful concept.
We consider a video's annotations to be central if the annotations are complete--
that is, given the annotations, you can figure
out what the video is about--
if they're specific--
that is, if you can't replace any of these entities with a
more specific entity that still
refers to the same concept--
and if they're compact--
that is, if you can't remove any of these entities and
still completely describe the video.
For example, if we had a video of this talk and we wanted to
annotate it, you would likely choose the entities "semantic
annotation," "Google I/O 2013," and "YouTube API,"
because these would be complete-- you know what the
video's about--
specific-- we can't find anything more
specific for any of them--
and compact.
You can't really remove any of them and still understand what
it's about.
However, if we had a single entity for this talk in
Freebase, you would gladly remove all three of these and
replace them with that one entity, because it would still
be complete, specific, and definitely compact.
You could use Freebase to still figure out that this was
a talk at Google I/O, this was about YouTube API, and so on.
A layman's way that we've put this into words: an annotation is likely to be part of the set of central annotations if you would include the name of the entity in a one-sentence description of the video. And separately, if you were curating a YouTube channel about this topic, would you choose this video as a canonical example of that entity? If you answer yes to both of those questions independently, then this is likely an entity that's part of the central set of annotations.
It follows then that relevant is not central.
A video of this talk annotated with an entity for Moscone
Center, that entity would not be part of a central set of
annotations, because we can remove it and still
describe the talk.
And also, of course, related is not central.
Android API would not be part of a central set of
annotations.
So what does that mean for developers?
Well, it means that for your average YouTube video, you can
expect one to three very specific, very narrow
annotations that should completely describe the video.
Even though these are very specific and narrow, you can
use the structured data in Freebase, which is available
as a downloadable data dump, to get more information about
these entities.
Also, you should know that more popular videos will probably have more confident annotations, because we have more signals to help us annotate them.
And if you're an uploader, you should use precise titles and cohesive descriptions that talk about what the video's about, to help us assign the correct entities from the start, so that the other signals simply dovetail with those correct entities we started from.
So rather than just talk about what you can do with these narrow entities, I actually kind of want to show you what is possible with the central annotations that we have today.
I'm not going to talk about what the demo does before I go
into it, but you should know that this is not
a rules-based demo.
That is, it's agnostic about what kind of entity I'm querying for, and it's about maybe 60 lines of Python.
So say, for example, I want to explore origami on YouTube.
I don't know what origami's about except it's about
folding paper, but I want to learn more.
So I look up the entity on Freebase.
And how are videos categorized if I use entities?
Well, it turns out that origami on YouTube tends to be
different shapes of organisms, and I can use this to learn
how to fold origami.
So I can start learning folding roses, cats.
And after I've spent enough time indoors folding origami,
I want to go outside and travel a little bit.
So I want to look up what do we have
about travel on YouTube.
Probably videos categorized into countries and cities.
And sure enough, we have travel videos about different
countries and different cities and so on.
After a lot of traveling and walking and sightseeing, I'm
probably hungry.
I want to check out some local cuisine.
So I look up what videos we have about cuisine on YouTube,
how we can structure those.
And probably, we're going to see different ingredients and
different types of food.
So this is actually relatively straightforward.
All I'm doing is I'm searching for 100 videos that were
centrally annotated with origami.
And then, I take all of the other entities that those
videos were annotated with, and I use Freebase to look up
their notable types.
I then create clusters of notable types.
Say, for example, I had a video annotated with origami
and cat, another one origami and bat.
Cat and bat would both fall under the notable type of organism.
And then, I choose the largest set of notable types, and then
I present the entities along with the videos that were
annotated with them.
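Here is a rough sketch of that flow in Python, under the assumption that you have an API key. The seed MID is a placeholder, the Freebase Topic API endpoint and the /common/topic/notable_types response shape are assumptions for illustration, and error handling, paging, and quota limits are glossed over:

```python
# Sketch of the topic-agnostic clustering demo: find videos annotated with a
# seed topic, collect the topics they are co-annotated with, group those by
# Freebase "notable type", and present the largest group.
# API_KEY and SEED_TOPIC are placeholders; the Freebase Topic API endpoint and
# the /common/topic/notable_types response shape are assumptions.
import json
import urllib.parse
import urllib.request
from collections import defaultdict

API_KEY = "YOUR_API_KEY"
SEED_TOPIC = "/m/0xxxxx"   # placeholder Freebase MID (e.g. the one for origami)

def get_json(base_url, **params):
    params["key"] = API_KEY
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# 1. Find videos associated with the seed topic (one page of 50 shown here).
search = get_json("https://www.googleapis.com/youtube/v3/search",
                  part="id", type="video", topicId=SEED_TOPIC, maxResults=50)
video_ids = [item["id"]["videoId"] for item in search.get("items", [])]

# 2. Fetch those videos' topic annotations and collect the co-annotated topics.
videos = get_json("https://www.googleapis.com/youtube/v3/videos",
                  part="topicDetails", id=",".join(video_ids))
videos_by_topic = defaultdict(list)
for item in videos.get("items", []):
    for mid in item.get("topicDetails", {}).get("topicIds", []):
        if mid != SEED_TOPIC:
            videos_by_topic[mid].append(item["id"])

# 3. Look up each co-topic's notable type in Freebase and cluster by it.
clusters = defaultdict(list)
for mid in videos_by_topic:
    topic = get_json("https://www.googleapis.com/freebase/v1/topic" + mid,
                     filter="/common/topic/notable_types")
    values = (topic.get("property", {})
                   .get("/common/topic/notable_types", {})
                   .get("values", []))
    if values:
        clusters[values[0]["text"]].append(mid)

# 4. Keep the largest cluster and show its entities with their videos.
if clusters:
    best = max(clusters, key=lambda t: len(clusters[t]))
    print("largest notable-type cluster:", best)
    for mid in clusters[best]:
        print(" ", mid, "->", videos_by_topic[mid])
```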
So I'm going to give it back to Shirley, who's going to
walk us through the API calls that were made behind the
scenes to make this happen.
SHIRLEY GAW: So just to summarize, we're fetching
content on YouTube based on a specific Freebase topic ID.
That's selected by the user using the Freebase suggest widget.
Once we have a topic ID, we can discover content on
YouTube using universal search.
So for YouTube, that means that we can find videos,
channels, and playlists.
In this particular case, we're interested only in the videos.
So suppose that the user selected origami, as shown earlier.
Then, we want to find content on YouTube related to origami,
and we get a bunch of videos--
100 in this case.
Now, we want to see, are these videos about
more than just origami?
We want to find what are the other central topics.
So we go to the videos list service, find those central topics, and then look up the notable types and cluster based on that, which is the coloring that Philipp showed in the demo.
So first of all, the Freebase suggest widget. This is just something that you can find by going to this website, and they'll give you instructions for embedding it in your web page.
Users type in text and it suggests entries. Then, you get a Freebase machine ID.
Let's switch over to API Explorer.
So this is a nice way of being able to play with Google APIs
for different products.
So developers.google.com/apis-explorer.
You select the YouTube Data API, so the latest public
version of our Data API.
You'll see these are different resources and operations on
those resources.
In this case, we want to discover content on YouTube
related to a specific topic.
And then, once we find those videos that we're interested
in, let's get more information about them
through videos list.
So let's start off with finding content on YouTube.
The nice thing about the API Explorer is that it explains
what all the different parameters are using this text
on the right.
And you can see what all of them possibly are and how
they're used.
And then, if it's in red, it means it's a required
parameter for the request.
Now, I've pre-filled out this form.
And recall that the user selected a specific suggested topic, and from that we get the Freebase machine ID. Oops, that's a bit overkill.
We say we want to find videos, type="video" on
this specific topic.
I'm using origami as the example.
And we want the IDs of the resources, and for this demo,
I'm going to show you the metadata as well.
Now, Philipp's demo uses 100 videos and fetches that.
In the case of the API, we can only get 50 at a time, so
we're going to have to do two calls to get 100 videos.
I'm going to execute this request again.
You'll see here this is the HTTP GET request that you need
to make, and it pretty-prints the format of the results--
so different origami videos.
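For reference, the same two-page fetch can be scripted. Here is a minimal sketch in Python, where API_KEY and TOPIC_ID are placeholders and the two calls are chained with the Data API's nextPageToken:

```python
# Sketch of fetching 100 videos for a topic: search.list returns at most 50
# results per call, so a second call chained with nextPageToken is needed.
# API_KEY and TOPIC_ID are placeholders.
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"
TOPIC_ID = "/m/0xxxxx"   # Freebase machine ID chosen via the suggest widget

def search_page(page_token=None):
    params = {"part": "id", "type": "video", "topicId": TOPIC_ID,
              "maxResults": 50, "key": API_KEY}
    if page_token:
        params["pageToken"] = page_token
    url = ("https://www.googleapis.com/youtube/v3/search?"
           + urllib.parse.urlencode(params))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

first = search_page()
pages = [first]
if "nextPageToken" in first:
    pages.append(search_page(first["nextPageToken"]))

video_ids = [item["id"]["videoId"]
             for page in pages
             for item in page.get("items", [])]
print(len(video_ids), "video IDs fetched")
```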
We can then select these video IDs.
If you recall, we have videos related to a specific topic.
Now, we're going to see what other central annotations there are.
So I have preselected three of these results, and we go to
the videos list service up here, GET just like in the
example I showed before with the baby.
In this case though, I'm just filling it out in a form in
the API Explorer.
So part="topicDetials."
Comma-separated video IDs here.
Execute the request.
And you see in the response that some videos have not just
origami as a central topic, but they have other topics
associated with them as well.
Now to see what these topics are, we can use the Freebase
Search API.
So Freebase has another service to be able to play
with filters.
So api-examples.freebaseapps.com.
I pre-filled in this particular
Freebase machine ID.
And you can see in the response that this video was actually also about an octopus, so origami octopus, and that its notable type is organism.
So we can group videos that are about organisms.
PHILIPP PFEIFFENBERGER: Cool.
So what does this tell us, except that it's possible to
build topic-agnostic applications?
Well, even though these entities are really specific
and really narrow, you can use Freebase to get more
information about each of the entities we
annotate a video with.
Also, if you're building an application
that's domain specific--
say, for example, a movie site--
you can use Freebase to look up the director of a movie, to
look up actors in a movie, and then to cluster videos based
on whether there are interviews, trailers, and so
on and so forth.
And of course, if you think this is really cool and you
want to play with it more, we've got a Codelab on Friday
where both Shirley and I will be TAing.
And we'll guarantee you that if you go there, you're going
to walk away with a working app that works with the Topics
API and the Freebase API.
And it's movies-based, so it's pretty cool.
OK, so we know we have some decent annotations that we can
do some cool stuff with, but our work is far from over.
We take quality really seriously, and we use human
evaluations to assess the centrality of entities.
We take raters, pair them by language with the language of the video, and ask them: given a video, is the entity off-topic, relevant, or central?
And doing this, we've been able to reduce the number of
off-topic annotations over the past year while maintaining
coverage across the YouTube corpus.
Now aside from raters, we depend on your feedback.
So if you see off-topic annotations, or you see
systematic patterns of off-topic annotations, please
file a ticket using gdata-issues.
If you have more general questions about annotations,
you can reach out via YouTubeDev, and we'll get them
answered for you.
So there are some classes of problems that are pretty
nefarious that we're battling that I want to highlight for
just a second.
Common knowledge is one of them.
We assume that everyone knows what a daughter is, what a
mother is, what a baby is.
And therefore, people don't put that stuff on Wikipedia to any great extent.
So for these really common concepts, the machine
repositories of knowledge that we have are actually pretty
barren, which makes it more difficult for us to annotate
these common concepts.
Similarly, new topics.
When "Harlem Shake" first got popular, we happily annotated
the video with "Harlem Shake, a dance introduced
in 1981." Not right.
After a week and a few days, we had an entity for Harlem
Shake the internet meme, but that's not good enough in a
world where memes tend to rise and fade within days.
Also, local facts.
If you upload a video simply entitled "Hiro's Sushi
Restaurant," we're going to have a really hard time
figuring out which Hiro's Sushi restaurant it is,
because there are thousands across the country.
However, in this case, it was titled "Hiro's Sushi
Restaurant--
Sedona, Arizona." So we can probably use this to figure
out which restaurant was actually mentioned, assuming
we have an entity for this restaurant in Freebase.
Lastly, overlapping names in the same concept space are
pretty tricky.
One of our partners, seevl, pointed out to us that there
was a little-known band called Nirvana that we annotated
wrong all the time.
And we were lost, because we know Nirvana really well.
We grew up with Nirvana.
And it turns out that there's a 1970s British psychedelic
band called Nirvana that we just didn't annotate when we
really should have.
Again, with more metadata, we can do a better job at
disambiguation.
If we have a video entitled "Nirvana--
In Bloom," which is one of their songs from the 1990s
band, we can get the right band without a problem.
But if we have "Nirvana--
Live in Bristol," and nothing else, and it's the 1970s band,
we're going to guess that it's the 1990s band, because it's a safer guess without any other information.
And we'll get it wrong in that case.
So aside from fixing bugs, which is always fun, there's some really exciting stuff that I'm working on right now that I want to get into.
One of them is relevant annotations.
So we've heard the cries for more entities per video, and
we're addressing it using relevant annotations.
So just like central annotations, relevant annotations are their own class of annotations.
They're entities that are relevant to the video and
would be of interest to someone watching the video.
So, for example, in Shirley's video, most likely "mirror"
would be relevant, at least.
Likewise, if we had a video of a live concert, the location
of the concert, band members that are featured in the
video, would also be relevant.
Now, relevant is not related.
A different band in the same genre would not be relevant.
Likewise, relevant is not low confidence
or low quality central.
It's its own distinct class of annotations that you can use
knowing that they're relevant.
Similarly, we'll be exposing a taxonomy of annotations that
we've established internally.
At this point, if we had a video of a tennis match, we'd
be happy annotating with the names of the tennis players,
the name of the tournament, and if we have any other
information, maybe the year of the tournament and so on.
However, even though you can get the information that these are tennis players and that it's about tennis from Freebase, we want to offload some of that by exploiting the taxonomy and telling you explicitly that this is a video about tennis, about racquet sports, and about sports.
So to kind of drive this home a little bit, I took a
screenshot of a video on the right.
"DVF (through Glass)." DVF stands for Diane Von
Furstenberg.
And I want to ask you, looking at this video, what do you
think would be the central entities, relevant entities,
and taxonomy entities?
Any brave takers?
AUDIENCE: Fashion Week.
PHILIPP PFEIFFENBERGER: Fashion Week.
For central, relevant, or taxonomy?
Relevant?
OK.
Anyone else for central or taxonomy?
AUDIENCE: Glass.
PHILIPP PFEIFFENBERGER: Glass.
Very good.
For central, I'm guessing?
Anyone else?
Going once, going twice.
SHIRLEY GAW: Heard something in the audience.
AUDIENCE: Events.
PHILIPP PFEIFFENBERGER: Events.
For relevant?
AUDIENCE: For taxonomy.
PHILIPP PFEIFFENBERGER: Yeah, that could be.
OK.
So for this example, which I didn't annotate myself, we did actually annotate it. For central, we had Google Glass and Diane Von Furstenberg.
Relevant would be New York Fashion Week, because this is
where this video was shot.
And then taxonomy, we'd probably put it into gadgets
and technology, because it's primarily about Google Glass.
But similarly, events probably could also fall into the
taxonomy classification.
So hopefully, this answered your questions and maybe even
raised some new ones that we would love to
hear at this point.
[APPLAUSE]
PHILIPP PFEIFFENBERGER: Thank you.
SHIRLEY GAW: Thank you.
AUDIENCE: So I was curious about the Minecraft example. So how do you seed your data for that, and sort of how expansive is that? So would you have every game ever made in there?
PHILIPP PFEIFFENBERGER: I wish.
AUDIENCE: Or are you specifically listing which
things you care about?
PHILIPP PFEIFFENBERGER: I wish we had every game
ever made in there.
That would be a dream of mine.
We've got some things that we're classifying for.
I can't disclose the list of things, because it's not very
well defined.
But it's something that we're expanding to increase coverage
on, because it was a big hit in the first iteration, and we
definitely want to go further on it.
AUDIENCE: Hi.
I'm Lek Lek Mai from Yale University.
And we have a lot of videos that have topics that are not
covered in Freebase, but we have our own semantic
repository.
So what would you suggest as the best way of connecting all that up?
PHILIPP PFEIFFENBERGER: That's a good question.
I would reach out maybe to someone on the Knowledge team.
SHIRLEY GAW: Freebase has a session here too.
PHILIPP PFEIFFENBERGER: Yeah.
Definitely reach out to the folks on Freebase.
And people working in Knowledge in general, I think,
would be really great contacts for that.
Regrettably, we're only consumers of these knowledge
repositories, and we don't get to fully administer
what goes onto them.
SHIRLEY GAW: But this does come up as sparseness in the Knowledge Graph for specific use cases, and one of the things you can do is contribute to that graph. But since you already have something more developed, you might want to just directly ask Freebase how you can contribute that information.
AUDIENCE: OK.
Could I ask one more quick question?
PHILIPP PFEIFFENBERGER: Of course.
AUDIENCE: So you talked about terms and vocabularies that
you're using.
And just wanted to ask, have you looked at other services
that have vocabularies in broader terms and narrower
terms, like the Getty Vocabulary, for example, and
utilizing those?
PHILIPP PFEIFFENBERGER: No, I have not.
That sounds really interesting though.
AUDIENCE: OK.
Maybe after the session.
PHILIPP PFEIFFENBERGER: Yeah, definitely.
AUDIENCE: Hi there.
I'm Jarom McDonald.
I'm from Brigham Young University.
One question that I had--
and you may have quickly glossed over it.
But if you have any further details, it'd be really
interesting.
Other types of YouTube annotations, the interactive clicks and the questions and so forth, are able to link temporally and spatially to your video, whereas it looks like a lot of what you can do with the Topics API is just for the video as a whole.
And do you see any ability, either now or eventually, to
be able to link topics to individual temporal moments in
the video or spatial areas of the video?
PHILIPP PFEIFFENBERGER: That's a really good idea.
To be honest, at this point, we really want to get it right
for the video as a whole.
But having that finer granularity would definitely
be an asset.
AUDIENCE: Thanks.
SHIRLEY GAW: Thanks for the suggestion.
PHILIPP PFEIFFENBERGER: Yeah.
AUDIENCE: Hey.
So you guys were talking about the text metadata for a little
while and how that's kind of like the first line of defense
since you have it earliest.
And maybe I just missed it, but do you guys also work with
comments and user data afterwards as that comes in?
PHILIPP PFEIFFENBERGER: Yes, exactly.
So that's part of the context of the video.
So we look at the comments of the video, and we look at all
the web pages where the video appears.
And then, we also extract concepts from the comments and
from the web pages and try to figure out, OK, well, what's the overlap?
And sometimes, there can be some pretty
funny stuff that happens.
When I was first playing with this, there was a video of
"Another One Bites the Dust," and one of the entities that
kept coming up was Kim Jong-il.
And I'm like, why is that?
And it turns out that when Kim Jong-il passed away, in the
forums, people kept embedding this video.
So individually, these sources you can kind of forget about.
But again, once you have enough of them, and once you
have enough data, you can see what kind of things emerge
from them after doing some filtering.
AUDIENCE: Thanks.
AUDIENCE: Hi.
I was wondering if you also did speech recognition on the
video themselves to get extra text from that.
PHILIPP PFEIFFENBERGER: That's a good question.
So we've looked at a number of things--
speech recognition, transcripts, and
so on and so forth.
And what tends to happen is if, for example, you have a
video of the State of the Union, and you do speech
recognition, you'll figure out that it's about the economy,
it's about jobs, it's about current events, and
so on and so forth.
But you miss that it's the State of the Union.
So it's something that we might look into again as a guiding signal.
But by itself, oftentimes what's mentioned in the video
isn't necessarily what the video is about,
except in a few cases.
AUDIENCE: Hi.
So actually, that's kind of a related question to what I
want to ask.
A lot of the stuff you're doing seems to be where you
have an explicit word or something like infant that
represents a concept, and then you go and
find that in Freebase.
Is there, I guess, potential for expansion using something
like WordNet, where if you don't recognize maybe one of
the words in it, going and finding something that might
be semantically related?
Or is that kind of too noisy, I guess, for your approach?
PHILIPP PFEIFFENBERGER: I'm not familiar with WordNet.
But if there are words that we don't recognize, we basically just don't use them.
Like if you have the name of someone who we don't recognize
or who doesn't match any sort of concept,
then that's just skipped.
SHIRLEY GAW: So WordNet, I'm more familiar with that.
So it would be like synonyms and antonyms of that concept.
I don't know if it's actually a source for
the Knowledge Graph.
I can't speak to that one.
But in the case of the baby video, it was really tough.
Because "baby" can be, as he said, a Justin Bieber song.
So I did actually have to look at a data dump from Freebase
and see what kinds of concepts would support
the topic of baby.
So that's actually where it would come in.
I don't know if WordNet would do that for you, but you could
definitely use Freebase and the related concepts to
support that.
PHILIPP PFEIFFENBERGER: And we do support synonyms.
So for example, DVF would probably dereference to Diane
Von Furstenberg, especially with other
things supporting it.
So as long as you mention the concepts in the video and you
help us dereference it to get to the right ones, even if
there are synonyms, we try to be smart enough
to allow for that.
AUDIENCE: Thank you.
AUDIENCE: Hey.
You said in your simple case that you were just weighting
title as double the description.
Obviously, the world is not a simple case.
What machine learning approaches are you taking to
work out what these weightings should be?
PHILIPP PFEIFFENBERGER: We try a lot of things.
I can't speak to the actual approaches that we use in
detail, but it's definitely not the simple case.
I'm sorry, I can't answer in more detail on that.
Anyone else?
OK.
Well, we'll also be hanging out in the Sandbox for a few
minutes after this talk if you have questions you want to ask
one on one.
And thank you for attending this talk.
SHIRLEY GAW: Thanks very much.
[APPLAUSE]