>> I'm sorry, because that was [INDISTINCT] with other words [INDISTINCT] it's a lot of [INDISTINCT]
>> It sounds ridiculous.
>> No. No, no, no. That was last minute. I'm just curious how many of [INDISTINCT] in the other class I gave them already. All right, we're just back from the break, and I think we ran over a little, so there's a bit of [INDISTINCT] I would say about it. The reason I'm interested in word meanings and semantics is that [INDISTINCT], and the whole [INDISTINCT] you can accomplish in it. When you ask how we learn semantics, it forces us to change and enlarge our basic idea of learning. There's a paradigm--you can't have missed it--that treats learning as statistics on a grand scale: taking data, often high-dimensional data, and finding structure in it. Supervised and unsupervised learning both follow that template of finding structure in high-dimensional data. We know how to capture this in [INDISTINCT] optimization. We know how to analyze some of it mathematically. We know, roughly, how to relate it to how networks of neurons in the brain might work.
And this kind of pattern-recognition technique, taken outside the neuroscience context, has been really useful for building systems that are useful in the real world--systems that start to realize something of the promise of AI. Whether it's in face recognition or in Google: it's worth recognizing, particularly when we're talking about learning semantics, that Google might not advertise itself as an AI system, but it's basically the world's best AI semantics system that anybody could have imagined if you went back to the early days of the field. I'm impressed with that achievement. Or, most recently, the Watson system for playing Jeopardy, which is roughly on a par with the world's champions at that game. These are systems that take real-world language data and definitely get some kind of meaning out of it. But I want to highlight some of the limitations of these systems, because they point towards what's needed to actually understand semantics. Semantics is going to require enlarging our toolkit for learning. So, just a couple of my favorite examples here. Google's spelling correction: you might not think of this as a semantic exercise--it's statistical pattern recognition. It's [INDISTINCT] technology which underlies a lot of [INDISTINCT] applications. It works amazingly well. I can type a mangled version of a query--you can all read it, Google can read it--and it gets you a useful answer. But now take something like this. Can you read this? It's kind of hard to read, but...
>> Anyway, this is--something...
>> Something, yeah. Let me rearrange the order of the words.
>> [INDISTINCT]
>> All I did between here and here was rearrange the order of the words. Of course, you may have seen people send these things around by email--these are cute examples. It turns out you can rearrange the letters pretty drastically within words, and people can still read the text if you preserve two things: you've got to preserve the sequential structure of the sentence, and it has to be a meaningful sentence. So I'd say this shows off our intuitive, natural cognitive abilities for finding the signal in the noise of language. You might not have thought of spelling correction as a semantic exercise, but I think that's a very concrete way to see it. Here, Google has nothing to say. And then Watson--once again, this is impressive. Has anybody watched this thing on TV?
>> Yeah.
>> Let's look at the other side of it. To me, the nicest thing is that you can watch this on YouTube--or I'm just going to [INDISTINCT] this number some like--okay. I mean, some articles they just--they get [INDISTINCT]
>> Now, you proof by [INDISTINCT]
>> That's how it would be counted, but--and you have this. Okay. So the key thing to look at here is the second [INDISTINCT]--I'll show you Watson's first choice and then its second and third ones. It didn't quite get this one, but the most striking thing to me is how often it's the case that, regardless of whether Watson has the right answer, its second and third choices often make no sense at all. They're completely semantically anomalous. The best thing on this screen...
>> The first modern crossword puzzle is published
and Oreo cookies are introduced. Ken?
>> KEN: What is the '20s?
>> No. Watson?
>> WATSON: What is 1920s?
>> No. Ken just said that.
>> Watson has no speech recognition to respond to the other contestants with; it doesn't know what the other players say--that may be true of you, too.
>> To push one of these paper products is to stretch established limits. Brad?
>> BRAD: What is the envelope?
>> Good.
>> BRAD: [INDISTINCT] for 800.
>> Stylish elegance, or students who all graduated in the same year. Watson?
>> WATSON: What is chic?
>> No. Sorry. Brad?
>> What is class?
>> Class, you got it.
>> Okay. You can watch more of that--I just picked a stretch at random, so I don't know if it's particularly good or bad, but you can see cases where it's definitely the same pattern: the wrong answers aren't just sort of wrong, they're answers with no cue to whatever you were thinking of. So what's the gap here? Well, I won't say too much about how these systems are built. They're really impressive achievements, and I don't want to diminish that. I'm [INDISTINCT] a scientist trying to understand, in computational terms, how children are able to learn the abstract meanings of words. And what I thought I'd do here is give an overview of some of the work that people in our field are doing, trying to get at some really interesting things about how humans learn words. I guess the title of the talk is towards more human-like machine learning. Most of the things I'm going to talk about here aren't really usable machine learning techniques yet; they're more like using the language of machine learning, and stretching it, to tackle the human learning of semantic concepts--in ways that will hopefully be inspiring for machine learning people to move more in this direction, toward the kinds of applications we care about. Some of the things we're interested in: how children can do one-shot learning--how we can learn the meanings of words from very few examples--and how that's supported by what we call learning to learn. That's something I talked about in one of the morning workshops, and I'll go over it very quickly here. Then [INDISTINCT] of abstraction: how we can learn abstract concepts, or types of concepts--concepts that are of course expressed with words but don't have a direct perceptual core. That doesn't mean they aren't grounded; what we'll see is that these concepts are grounded, but they're cognitively grounded--grounded in the role they play in an intuitive theory, and that theory grounds out in perception. We're also very interested in learning context-sensitive language. I'll show you that even very simple adjectives, which you might take for granted, are surprising once you think about what they mean: it turns out to be surprisingly hard to describe what they mean, and you need to use interesting concepts--programs--to describe how they work. And function words, words like "the" or "every" or "three" or "of": words that don't seem to have any direct reference, but that do a lot of the work of compositional semantics, that help give language meaning at the level of the sentence. I think that's probably enough. Well, let's see; depending on time, I'll try to get through all of this and emphasize a couple of themes--ideas that we think, and that we'll argue, can explain how human children are able to learn these things. The idea of Bayesian learning over probabilistic models with certain kinds of structured forms--models that go beyond the simplest kinds of machine learning tools: they often have rich, knowledge-bearing structure, and they often capture the generative process, the causal process of the world, in richer ways than we currently use in machine learning. And then, really at the heart of semantics, there's this compositional language-of-thought idea. That's a theme I've seen in a lot of the [INDISTINCT], certainly a theme that people have been emphasizing over the last [INDISTINCT], and it's really what brings us here. Just one other kind of motivating example, which I'll come back to at the end.
I'm very interested in the ultimate origins of knowledge, and I think this highlights some of the issues in semantic acquisition: people in developmental psychology have focused on analogies between children's learning and the origins of scientific theories. So ask yourself: how did Newton come to knowledge like universal gravitation, or how did Mendel come to the basics of genetics? They observed data--and certainly in Mendel's case, and I believe in Newton's case with the astronomical data, pretty noisy data. These are acts of statistical inference, right? You can't just deductively reason your way from the data, and you also can't just crunch the data through some [INDISTINCT]--there's no simple procedure that does it. These are complex optimization problems whose output is something like the best description of the orbits of the planets, or this theory specifying the idea of genes, dominant and recessive alleles, and the whole combinatorial structure that Mendel came up with. Now, this is something more like program induction, and that's a concept that is, you know, a sort of scary one, but I think many of us interested in learning semantics sooner or later come down to it--come back to it, right? You could imagine that there's some kind of program that describes the semantic content of one of these theories--the theory of natural selection, the origin of species--or you could express Newton's laws as a kind of program, maybe a probabilistic program, with which you can explain the observed data. And certainly Mendelian genetics is a classic kind of probabilistic program, where you have exact functions, and probabilities on the things you don't know, just as in the original genetics textbook calculations: you write down what you start with and what happens as you cross several generations.
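As a toy illustration of that idea, here is a minimal, hypothetical sketch in Python: a Mendelian cross written as a tiny generative program, and two candidate hypotheses scored by how well they explain observed phenotype counts. The counts and the "no dominance" alternative are illustrative assumptions of mine, not the talk's actual model; a full treatment would also add a prior favoring shorter programs.

```python
import math
from itertools import product

def cross(parent1, parent2):
    """Enumerate offspring genotypes of a one-gene cross, with probabilities."""
    offspring = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted(a + b))
        offspring[genotype] = offspring.get(genotype, 0.0) + 0.25
    return offspring

# F1 x F1 cross: Aa x Aa -> AA 1/4, Aa 1/2, aa 1/4
f2 = cross("Aa", "Aa")

# Hypothesis 1 ("dominance"): 'A' masks 'a', so P(dominant phenotype) = 3/4.
p_dominance = sum(p for g, p in f2.items() if "A" in g)

# Hypothesis 2 ("no dominance"): a rival program predicting p = 1/2.
p_no_dominance = 0.5

# Illustrative F2 counts, in the ballpark of Mendel's classic 3:1 ratios.
n_dom, n_rec = 705, 224

def log_likelihood(p, k, n):
    """Binomial log-likelihood of k dominant phenotypes out of n offspring."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

n = n_dom + n_rec
score_dominance = log_likelihood(p_dominance, n_dom, n)
score_no_dominance = log_likelihood(p_no_dominance, n_dom, n)
# The dominance program explains the observations far better:
# score_dominance > score_no_dominance
```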
Then you compare different hypotheses, which correspond to different models in this space of probabilistic programs. You compare their [INDISTINCT] on the data in some broadly Bayesian way, where you have a prior that favors simpler, shorter programs, and you have the likelihood, which is how well the distributions the program produces capture your observations. That gives you a kind of scoring function, and then you face a terribly difficult, terribly scary search problem. But that, I think, is our best understanding of the origin of scientific knowledge, however you come at it, and when you start to think about semantics, we're going to keep coming back to ideas like that. So I'll go very quickly through these word-learning examples--in fact, I could probably just skip them--but I want to highlight how working on these problems got us thinking in terms of program induction. We want to understand how, given just a few examples like these, you can learn those words. Again--someone already showed a slide like this before--it's not very hard to see that those first two objects are examples of a word, that the one on the left is too, that the one below it isn't, that the one below that, the third one down, isn't, but that the one to the right probably is. We can think about what's going on as forming some hypothesis that captures the structure of these objects--something like an evolutionary program. It doesn't have to have natural selection in it, but something like it: a coalescent-based clustering model, some kind of [INDISTINCT] tree structure that explains how these objects might be derived, with some kind of branching process that captures super-categories--you can see that all of these over here are some kind of funny plants, and that corresponds to a high-level branch, while these two down here maybe correspond to this other branch. By taking this cluster of unlabelled objects and deriving something like this kind of probabilistic causal generative model, you can generate a hypothesis space for word meanings, with hypotheses [INDISTINCT] on the branches of this tree. That turns out to be a reasonable account of how children learn and generalize words from just a few examples, and it yields some further insight. But I don't want to dwell on this.
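That generalization story can be sketched very compactly. The nested hypotheses below are toy stand-ins of mine for branches of the learned tree, with a uniform prior; the likelihood embodies the size principle, so the smallest branch containing all the examples wins as examples accumulate.

```python
# Toy nested hypotheses over objects 0..7, standing in for branches of a tree.
hypotheses = {
    "subordinate":   {0, 1},           # e.g. just these funny plants
    "basic":         {0, 1, 2, 3},     # a broader branch
    "superordinate": set(range(8)),    # the whole tree
}

def posterior(examples, hypotheses):
    """P(h | examples) with a uniform prior and size-principle likelihood:
    P(example | h) = 1/|h| if the example falls in h, else 0."""
    n = len(examples)
    scores = {}
    for name, h in hypotheses.items():
        consistent = all(x in h for x in examples)
        scores[name] = (1.0 / len(h)) ** n if consistent else 0.0
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

post_one = posterior([0], hypotheses)          # one example: broad uncertainty
post_three = posterior([0, 1, 0], hypotheses)  # a few examples, all from the small branch
# More examples drawn from the small branch concentrate the posterior on it:
# post_three["subordinate"] > post_one["subordinate"]
```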
I want to focus on the question of where the hypothesis space comes from. How do we know what the relevant features of objects are, such that something like a tree-structured model explains them? This is the kind of thing that--again, I covered some of this in an earlier morning talk in one of the other workshops, so I won't go into it much--Russell, [INDISTINCT], and I have been doing [INDISTINCT]. We've been trying to understand how you can learn which features count for word learning. It's again this idea of a tree-structured model that captures classes and super-classes, to be able to say that there's some sense in which similar categories have similar similarity metrics. Using this idea, you can learn from multiple related examples of different classes, and we think this is a basic thing that even children are able to do. There's been really interesting work in developmental psychology--I'm trying to compress a whole topic into a couple of slides, so probably nothing makes sense to anybody, but anyway--really interesting work that developmental psychologists have done trying to explain how children's word learning accelerates very dramatically, let's say between one and a half and two years of age. And this kind of learning to learn in a hierarchical Bayesian model is, I think, a compelling way to capture that; we found some structure there. I'll just say: check this out--it's work that Ross and I are really interested in. And now we come to our first sort of program induction. The key question is: it's one thing to do learning to learn in a hierarchical Bayesian model, but how do you get the idea that you should be building a tree-structured representation at all? This is where the work that Charles and I did comes in. We tried to describe very simple kinds of programs that could capture the abstract structural form of different kinds of representations--to be able to say, when we see something like objects, say animals or plants, that have some features, that something like a tree structure is the right thing: that a tree, learned from a data set of animals and their features, is the best way to capture the structure there. Not assuming it going in, with hierarchical clustering [INDISTINCT].
Rather than just sorting things hierarchically because that's what you were looking for, we actually try to learn whether, for this data set, something like a tree structure is the right way to capture what's going on. Charles came up with a clever way to do that. The idea is to give the learner simple kinds of graph grammars--very simple kinds of programs for growing structures--and then do inference in a hierarchical Bayesian model where the top level is one of these very simple grammars for growing graphs. The grammar basically generates a prior on the models at the level below, which are standard [INDISTINCT] graphical models, the standard machinery: you define some kind of smooth distribution over what features the objects can have, with a covariance structure--a Gaussian process--that corresponds to the inverse of the graph Laplacian, which is a fancy way of basically saying that things that are nearby in the graph tend to have the same features. That way, by doing inference in this hierarchical model, you learn not only the best tree-structured graph but the very idea of a tree, by scoring, at the top level, how likely different programs are to grow the different kinds of structural forms for the representation. And you can apply the same framework to other data. In one case the data was how US Supreme Court judges voted on lots of cases, and there you discover a qualitatively different structure: a linear, left-right structure, with the more liberal judges over on the left and the more conservative ones over on the right. You can take cross-products of simple structural forms and get, for example--take the distances between cities on the globe, and you get the cross of a chain and a ring, the cross-product of two simple graph grammars, which corresponds to latitude and longitude. Or take data on faces, which we generated from a face-synthesis program varying the race and masculinity-gender dimensions of the faces, and the learner [INDISTINCT] discovers that a cross of two chain-structured models is the best way to describe what's going on in that data. The reason I want to highlight this from a semantic point of view is this: if we want to learn representations, we could do unsupervised learning in any kind of compression system [INDISTINCT]; we could use a [INDISTINCT] machine, whatever. But a model like this extracts representations with abstract structural forms that are semantically meaningful, right? In biology, the internal nodes of a tree have meanings, like "fish" or "mammal." In the Supreme Court case, we talk about "left" and "right"--we're talking about parts of the structure. Those are more abstract concepts: to understand what those words mean, you have to know that you're talking about a one-dimensional structure. Or "latitude" and "longitude"--what do those words mean? They refer to abstract parts of these forms; they don't correspond to the individual objects or nodes we started with. But the possible meanings of those words are discovered by this kind of [INDISTINCT]. Or the dimensions here: we talk about masculinity, or we talk about black versus white--it picks out those dimensions, right? So again, those words matter, and their semantics--I mean, this isn't a word-learning model in itself, but it's a model that's able to discover the kinds of concepts that could be the semantics of those abstractions. So that's summarizing stuff we were talking about a few years ago. What we've been doing more recently aims more directly at language, and that's what I want to tell you about: work with a couple of students from our group. One piece comes from a thesis by Lauren Smith, who has been studying the learning of context-sensitive meanings, in the particular case of gradable adjectives.
Since I wasn't here this morning, I'm not sure if people have already talked about this. Anyway, gradable adjectives are words like "tall." I don't want to rehearse an over-familiar example, but it's pretty interesting. These are very simple words--tall, short, heavy, light, strong, good--and we might think that their meanings should be pretty simple too, right? What we mean by "tall"--basically, we all know it when we see it, right? The higher, the taller. That's what we think. But think about the context-sensitivity. The sense of height that corresponds to being tall if you're a tree is different than if you're a boy, right? A tall boy wouldn't be very tall for a tree, and a tall tree wouldn't be very tall for a building. So what "tall" means has some kind of inherent context-sensitivity.
Something like this: "tall," as applied to trees, means something like "greater than the mean value for trees on the dimension of height." And you could write this down as a simple program--I'm not an expert [INDISTINCT], so excuse me if there's a mistake here--but basically what we're saying is that "tall" is a function which takes an instance and a class, and asks: is it true that this thing x is tall relative to that class? And then we're just describing the formal content, which is something like: check whether the height of x is greater than the mean of the heights of some samples. We assume we have some function to sample instances of the class; say we do that 20 times, so we take a little statistical sample of the heights. We take the mean, and we test whether x is greater than the mean. Now, I took the trouble to write it out like this because I think many of you might look at this definition and say, "Is that the right definition of tall?" What do you think?
>> Much greater.
>> Yeah, maybe much greater. We might say greater than one standard deviation [INDISTINCT]--it's just not enough to be in the top 50%, right? So then we could write that down, which is slightly more complicated, right?
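This family of candidate meanings can be sketched as one small program (the non-parametric quantile variant comes up in a moment). The class samplers, distributions, and numbers below are hypothetical illustrations of mine, not the actual experimental models.

```python
import random
import statistics

def is_tall(height_of_x, sample_class, n=20, rule="mean", k=1.0, q=0.65):
    """Is x tall relative to a reference class we can sample from?"""
    sample = [sample_class() for _ in range(n)]
    if rule == "mean":           # taller than the class mean
        return height_of_x > statistics.mean(sample)
    if rule == "mean_plus_sd":   # mean plus k standard deviations
        return height_of_x > statistics.mean(sample) + k * statistics.stdev(sample)
    if rule == "quantile":       # above the q-th quantile of the sample
        return height_of_x > sorted(sample)[int(q * n)]
    raise ValueError(rule)

# Hypothetical height distributions (meters) for two reference classes.
tree = lambda: random.gauss(15.0, 4.0)
boy = lambda: random.gauss(1.3, 0.1)

random.seed(0)
# The same 1.6 m is tall relative to boys, but not relative to trees:
print(is_tall(1.6, boy))   # True
print(is_tall(1.6, tree))  # False
```

The context-sensitivity lives entirely in which reference class gets sampled; the rule and its parameters (`k`, `q`) are the free structure that has to be learned.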
Well, maybe it's one, maybe it's 0.5--we don't know. Maybe there's just a free parameter here that gets [INDISTINCT] along with the structure of the program. It turns out, when you start looking at this and doing the psychophysics experiments, it seems like some kind of more robust ordinal statistic fits better--maybe it's more like "greater than the 65th percentile of trees on the dimension of height." You could write that down too. And we did experiments--I'll [INDISTINCT] the details of the experiment--but basically we gave people various distributions of objects, like the stimuli shown here, and they had to pick out the tall ones. Then we tested various models of what "tall" could be. Is it defined by some number of standard deviations above the mean? Is it defined by some sort of non-parametric quantile statistic? We also found a lot of success with cluster models, where what "tall" means is: cluster the objects along the dimension of height, find the highest cluster, and those are the tall ones. We weren't really able to distinguish very clearly between those models with these first experiments, but later experiments were able to teach us more. It's a somewhat messy story, like a lot of semantics, but the point I want to draw your attention to is that meanings need to be learned within a hypothesis space that looks like these things. Now, suppose you want to go beyond this. One of the interesting things we see when we look at gradable adjectives is that there's a whole class of words that share this form, and we might want to capture that shared semantic structure: the whole class of words has a meaning like "relative to some class c, greater than some quantile on some dimension." And we can write that abstraction down, right?
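That shared schema can be sketched as a higher-order function; the dimension names, quantile value, and toy objects below are hypothetical illustrations, not the model's actual representation.

```python
from collections import namedtuple

def make_gradable(dimension, q=0.65):
    """Build a gradable-adjective predicate from the shared schema:
    x is ADJ iff x exceeds the q-th quantile of the reference class
    on the given dimension."""
    def adjective(x, reference_class):
        values = sorted(getattr(obj, dimension) for obj in reference_class)
        threshold = values[int(q * len(values))]
        return getattr(x, dimension) > threshold
    return adjective

# Learning a new gradable adjective is now just filling in a dimension:
tall = make_gradable("height")
heavy = make_gradable("weight")

Thing = namedtuple("Thing", ["height", "weight"])
things = [Thing(h, 2 * h) for h in range(1, 11)]

print(tall(Thing(10, 0), things))  # True: top of the range
print(tall(Thing(2, 0), things))   # False: near the bottom
# A word like "good" would need one more step: the reference class itself
# supplies the relevant dimension (ethics for a man, taste for a cheeseburger).
```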
It's a higher-level kind of [INDISTINCT], and I think there's evidence that kids actually learn at that level, in the sense that once they sort of get the idea of how these adjectives work, they can learn new gradable adjectives for new dimensions very quickly, because they understand how the class works. Then take something like "good" or "strong," which are particularly interesting cases. Unlike "tall," where the dimension is fixed--part of learning "tall" is that it refers to the dimension of height--"good" has a category-specific dimension, right? A good man is different from a good cheeseburger, which is different from a good conference. In each case, the class doesn't just restrict the reference set that you're going to be computing over; it also specifies the relevant dimension. For a man, "good" is maybe something like being ethical; for a cheeseburger it's some dimension of taste; for a conference it's some dimension of intellectual stimulation, or whatever, right? So "good" is a [INDISTINCT] kind of word, and "strong" is like that too, even more abstract. These are harder but, I think, approachable challenges. Next is some work that Steve [INDISTINCT] has been doing; he wrote a very nice thesis looking at a bunch of different aspects of learning functional language. One case is learning to use the basic number words--one, two, three, and so on--partly because there's been a lot of empirical work in the last decade or two in cognitive science looking at how kids learn these words. The kind of task we do with kids here: you might say, "Now, can you count the balloons?" And pretty early on, by age two and a half or so, kids will do that--at least kids in our society. They'll learn a counting routine: one, two, three, four, five, six, right? And if there were ten, they might count up to ten. But there's evidence that that's really just a rote routine, not a real semantic abstraction, and one way to test whether kids understand what the number words mean is the so-called give-N task. You ask the kid, "Can you give me three balloons?" So take a kid who has no trouble counting up to three, or certainly six, and say, "Can you give me three balloons?" Well, at the age at which kids are first able to do the counting routine, they're not very good at that; they might give you a random number. At a typical age--say two years, ten months--kids are at a stage called being a "one-knower." That means if you say, "Can you give me one balloon?" they'll give you one balloon. But if you say, "Can you give me two or three or four balloons," they'll just give you some arbitrary number more than one. That's a bit simplified, but it's more or less what happens. Then, a few months later, kids become a "two-knower," right around age three--these are [INDISTINCT] hypothetical ages. If you say, "Can you give me one balloon?" they can give you one. You say, "Can you give me two?"
They'll give you two. But if you say, "Can you give me three," they'll give you some number more than two--sometimes three, sometimes four, and so on. Then there's a three-knower stage, which takes another few months. And you could imagine it going on like that--four-knower, five-knower, and sooner or later they get there--but it doesn't work that way. There's this really interesting leap of abstraction that happens, typically after the three-knower stage (occasionally there's a four-knower stage), where kids suddenly get all the other word meanings. This isn't the same as discovering infinity, the idea that there's no largest number; it's that all the numbers in their counting routine--all the numbers they can access when they count--now work: they understand how those words fix cardinalities in this give-N task. So somehow we want to explain this learning trajectory: why it takes a relatively long time, running through this characteristic sequence of knower stages, and then this interesting leap of abstraction to what's called the CP, or cardinal-principle, stage in the literature. So what Steve did is set up something that looks like a real language of thought: various kinds of primitive functions that can be useful for expressing numerical knowledge--you can write all of these in lambda calculus--which let you express the various stages of knowledge. You can write down a three-knower as a certain higher-order function; you can write down the cardinal-principle knower, up there, which is a recursion: basically, you run down the count list and you map the counting routine onto set sizes. You can also write lots of weird things, like a "two-but-not-one-knower," or even things that other kinds of languages actually have, like counting systems that are more like singular/plural: just "one," and anything more than one is "two." The reason this is interesting from my point of view is that people have emphasized a sort of poverty-of-the-stimulus argument here: the possible ways to map words onto cardinalities are not really determined by the data, and we wanted the hypothesis space to be able to express not only the actual lexicons we see but many other possibilities. So Steve did various simulations where you imagine a learner--simulating a child who's getting mostly true but sometimes noisy data from an adult, a competent speaker, referring to the cardinality of a set. They basically have a [INDISTINCT] program-induction setup, where you have a prior over those expressions, basically a minimum-description-length kind of prior, and then the likelihood, which is how well the expression explains the observations, taking into account the possibility of noise. The noise could come from the fact that someone is using the words incorrectly, but it's more often the case that, if you don't really understand the pragmatics of how these words are used, you might hear "these four glasses here," but somebody could also say, "Well, I'm going to take two glasses."
Like you could use a number that isn't necessary just referring to the obviously cardinality
of set. We wanted to build in that kind of robotics. Now, the national figure of any
Now, the natural question any Bayesian learner should be asking here is, you know: if you're
going to define a prior over all these programs and learn that way, how could that possibly
work, or how could you make it tractable? It's not some simple closed-form computation. And
we don't have elegant solutions to that. What we're doing is the kind of thing that probably
most people in machine learning would blanch at, because we're running a kind of MCMC after
defining a [INDISTINCT] grammar over these expressions. So I won't really go into the details.
The basics of this MCMC are [INDISTINCT] and essentially the same inference scheme as in Church,
the probabilistic programming language: you have a program trace, and you make proposals where
you cut off that program trace at some point, resample a new possibility from the grammar,
and then accept or reject in the usual [INDISTINCT] way. It's amazing that it works at all;
it's not very efficient, and we obviously need to do better, but it's enough that on all these
sorts of tasks you're able to search effectively through a very large space of lexicons. The
number of possible lexicons here is enormous, and this algorithm searches, you know, tens of
thousands of hypotheses.
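The trace-cutting move described here, resampling a subtree of the current program and then accepting or rejecting, can be sketched on a toy grammar of arithmetic expressions. Everything below (the grammar, the scores, and the Metropolis-style acceptance rule, which omits the full proposal-correction term) is an illustrative assumption, not the actual system:

```python
import math
import random

random.seed(0)

# Toy grammar: an expression is an integer leaf or a sum node ('+', left, right).
def sample_expr(p_leaf=0.6):
    if random.random() < p_leaf:
        return random.randint(0, 9)
    return ('+', sample_expr(), sample_expr())

def evaluate(e):
    return e if isinstance(e, int) else evaluate(e[1]) + evaluate(e[2])

def size(e):
    return 1 if isinstance(e, int) else 1 + size(e[1]) + size(e[2])

def log_score(e, target=12):
    # Description-length prior plus a likelihood rewarding expressions
    # whose value matches the observed target.
    return -0.5 * size(e) - abs(evaluate(e) - target)

def paths(e, prefix=()):
    # Enumerate the addresses of all subtrees.
    yield prefix
    if not isinstance(e, int):
        yield from paths(e[1], prefix + (1,))
        yield from paths(e[2], prefix + (2,))

def replace_at(e, path, sub):
    if not path:
        return sub
    left, right = e[1], e[2]
    if path[0] == 1:
        return ('+', replace_at(left, path[1:], sub), right)
    return ('+', left, replace_at(right, path[1:], sub))

# Subtree-regeneration MCMC: cut the trace at a random point, resample below it.
current = sample_expr()
best = current
for _ in range(5000):
    cut = random.choice(list(paths(current)))
    proposal = replace_at(current, cut, sample_expr())
    if math.log(random.random()) < log_score(proposal) - log_score(current):
        current = proposal
        if log_score(current) > log_score(best):
            best = current

print(evaluate(best))
```

Even this crude chain reliably finds small expressions whose value matches the target, which is the sense in which "it's amazing that it works at all."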
And then we looked at the learning curves: as a function of the amount of data the child
receives, what does the learner learn, and in what order? And the thing is, this learner does
just what children seem to do. Namely, the highest-ranked number lexicon is initially the
one-knower, then there's a two-knower stage, then a three-knower stage, and then rather quickly
you get the CP-knower. The four-, five-, and six-knowers and so on are, at this point, basically
too complex relative to the [INDISTINCT] of the data, while the CP-knower lexicon, the one
that refers to the counting function, winds up being the best candidate. So if you [INDISTINCT]
it's a kind of [INDISTINCT] needs probabilistic programming [INDISTINCT] and this is really
the first model in the cognitive science [INDISTINCT] able to explain these characteristic
stages of number learning. Steve applied the same kind of approach to learning
quantifiers. I won't go through the details, but basically we needed to find a similar kind
of lambda-calculus representation for the meanings of quantifiers like none, every, some, and
most; we did the same kind of basic learning setup and were able to show that these quantifiers
could be learned, with similar kinds of learning curves. Now, the data on the order in which
children learn these quantifiers is not as clear; it hasn't been the subject of such intensive
case studies. But now that we have models of [INDISTINCT], it's motivating language-development
research to go out and check on this. How much time do I have? >> Ten minutes.
>> Ten minutes? Okay. So the last set of things I want to come back to is what I was trying
to motivate at the beginning. It's part of my motivation for thinking about program induction,
but it's also this idea that is one of the deepest and most profound ideas in developmental
psychology, sitting between language acquisition and conceptual development more generally.
It's sometimes called the theory theory: the idea that children's knowledge is organized into
intuitive theories, which are in effect systems of concepts, somewhat like scientific theories.
And what certain words mean, these particular words that refer to abstract concepts without
a direct perceptual grounding, is really important to how children think about the world: the
words label the concepts children think with, and they matter for how children learn to think,
how they learn to understand things we don't directly observe but learn about through a
combination of observation and interaction with others. So the idea is that we need to be able
to capture, you know, hypothetical candidate word meanings in these sometimes abstract
intuitive theories.
And we got at that originally by starting to look at, you know, causal learning, the sort of
stuff we're all [INDISTINCT], using directed graphs that can capture causal relations. This
is sort of a QMR-like network, if you're familiar with it: the classic disease-symptom one,
but with an extra set of variables that capture risk factors. And the standard Bayesian
causal-learning approach would be to write down a prior [INDISTINCT], a prior over all possible
directed graphs on these 12 variables, observe patients who are samples from this causal law,
and try to do causal learning from the sample data. It's really hard, though, to learn structure
like this [INDISTINCT] with a completely generic prior, and the work that I and others were
doing here was purely about causal learning; we didn't think we were studying semantics. But
I'll show you why we came to that. We said, well, maybe
you could do causal learning much better if you had a high-level hypothesis, some kind of
abstract intuitive theory of the domain, which divides the variables into several classes,
call them behaviors, diseases, and symptoms. Say you know that these variables, working
[INDISTINCT], smoking, and some others, are in the category of behaviors; that flu, bronchitis,
lung cancer, and some other diseases are the disease category; and that headache, fever, and
coughing are symptoms. And you have this [INDISTINCT] law that says all causal links go from
behaviors to diseases and from diseases to symptoms. If you only consider graphs that fit that
high-level schema, that's a hugely useful constraint for causal learning. It cuts the hypothesis
space down from the space of all possible directed acyclic graphs on 12 variables, which grows
super-exponentially in the number of nodes and here is something like 521 septillion, down
to only about 131,000, and learning is much more efficient.
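That super-exponential count is easy to check. Robinson's recurrence, a standard combinatorial result rather than something from the talk, counts the labeled DAGs on n nodes, and at n = 12 it gives about 5.2 x 10^26, i.e. roughly 521 septillion:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

print(num_dags(12))  # 521939651343829405020504063, about 5.2e26
```

With the two-layer schema, only edges compatible with the class structure are allowed, which is what collapses this count to the order of 10^5.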
Now, the reason this is interesting from a machine-learning point of view is that we don't
just have to wire that knowledge in by hand as hand-coded prior knowledge; we came up with
a way to learn it, actually. This is where Tom Griffiths and Charles Kemp and I started; the
actual machine-learning work was done by Charles Kemp as part of his undergrad and master's
theses. What he worked out, and I'll leave out most of the details, was basically the following.
I'm going to skip ahead to this
one. He worked out a way to define a nonparametric Bayesian prior, a kind of Chinese restaurant
process, over groupings of the variables, such that, subject to the grouping (one of which
is shown up on top), you then get a prior on graphs down here. It's basically just a
probabilistic version of what you can see up there. And then you do inference at both levels:
at the top level, you're trying to decide which classes the different variables go in, and
then, subject to that clustering up there, you get the prior on graphs, and you're doing
inference at this level, basically looking at which edges are possible and which way the causal
arrows go. So the inference here consists of MCMC over these two levels, over two discrete
variables: the class memberships of the variables, and which edges exist in the graph.
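The generative side of such a two-level model can be sketched as follows: a Chinese restaurant process samples a grouping of the variables, and, given the grouping, edges are drawn with class-dependent probabilities. The constants and the chain-of-classes edge rule are made-up assumptions for illustration; this is not Kemp's actual model, and the two-level MCMC inference is omitted:

```python
import random

random.seed(1)

def crp_partition(n_items, alpha=1.0):
    """Sample a grouping of items from a Chinese restaurant process."""
    tables = []  # each table is a list of item indices
    for i in range(n_items):
        weights = [len(t) for t in tables] + [alpha]  # rich-get-richer + new table
        r = random.uniform(0, sum(weights))
        for t, w in zip(tables, weights):
            r -= w
            if r < 0:
                t.append(i)
                break
        else:
            tables.append([i])
    return tables

def sample_graph(groups, p_between=0.8, p_other=0.02):
    """Given a grouping, edges from class k to class k+1 are likely; others rare."""
    cls = {v: gi for gi, g in enumerate(groups) for v in g}
    n = sum(len(g) for g in groups)
    edges = []
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            p = p_between if cls[b] == cls[a] + 1 else p_other
            if random.random() < p:
                edges.append((a, b))
    return edges

groups = crp_partition(12)
graph = sample_graph(groups)
print(len(groups), len(graph))
```

Inference would invert this: MCMC proposals move variables between groups at the top level and toggle edges at the bottom, scoring both levels against the observed data.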
And it's quite remarkable how well this is able to work. We didn't figure out how to scale
it to very large problems, but what we showed was how to take these relatively small graphs,
which are still very, very hard to learn. Here's a two-layer sort of disease-symptom graphical
model, which requires something like a thousand samples to learn with high accuracy if you
just learn with the generic [INDISTINCT] prior. But if you allow the learner to work at this
higher level, where you consider the possibility that the nodes can be divided into groups,
with a prior on graphs conditioned on that, then you can learn from an order of magnitude less
data, maybe about 80 samples. And even from just 20 samples from that graph, you can identify
the abstract concepts: you can figure out that the first six variables are kind of the causes
and the next ten are the effects. And getting the big picture first like that is a very
human-like kind of learning. But it's also about semantics, right? Because words like diseases
and symptoms are words that are important for our intuitive theories of medicine, and I think
they're not nodes in the graphical model; they're labels for these more abstract concepts.
If you want to understand how those words get learned, we need ways of generating word meanings
of the sort this kind of hierarchical Bayesian learning is able to do. An even greater
extension of
this is some work that Noah Goodman did, where the supposition [INDISTINCT] we described is
that the abstract knowledge shared across different causal networks is some kind of first-order
logic theory, one that basically winds up capturing the sort of [INDISTINCT] intervention
semantics of cause. In a sense, we call this learning to be causal, because this is basically
a learner who knows about [INDISTINCT] graphical models but doesn't understand the [INDISTINCT]
intervention semantics of what it really means to be a cause, the sort of arrow-breaking
semantics, and I won't go through the details. But basically, by searching over a space of
[INDISTINCT] theories that characterize what it means to be an intervention, this system is
able to learn how interventions work and to use that to do better causal learning.
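The "arrow-breaking" semantics of intervention can be made concrete: intervening on a variable cuts its incoming causal arrows and clamps its value, in contrast to merely observing it. Below is a minimal illustration of that do-operator idea on a toy two-node network (the variables and probabilities are made up; this is not the model from the talk):

```python
import random

random.seed(2)

# Toy causal model: smoking -> cough.
def sample(do_cough=None):
    smoking = random.random() < 0.3
    if do_cough is None:
        cough = random.random() < (0.8 if smoking else 0.1)
    else:
        # Intervention: break the arrow into `cough` and clamp its value.
        cough = do_cough
    return smoking, cough

def estimate(n, do_cough=None, given_cough=None):
    """Estimate P(smoking) under an intervention or an observation."""
    hits = trials = 0
    for _ in range(n):
        s, c = sample(do_cough)
        if given_cough is not None and c != given_cough:
            continue  # rejection sampling on the observed value
        trials += 1
        hits += s
    return hits / trials

# Observing a cough raises the probability of smoking ...
print(round(estimate(100_000, given_cough=True), 2))  # well above the 0.3 base rate
# ... but *making* someone cough tells you nothing about smoking.
print(round(estimate(100_000, do_cough=True), 2))     # about 0.3
```

A learner who lacks this arrow-breaking rule would wrongly treat its own interventions as diagnostic evidence, which is exactly the distinction the theory has to capture.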
In a sense, it's kind of learning what the word cause means. The last example I want to give
is again sort of core theory-theory stuff. There's this term from philosophy, philosophy of
science and philosophy of language, called conceptual role semantics, which I won't go through
in depth because time is short. But basically, it's the idea that words get their meaning by
referring to concepts that play some kind of abstract role in a theory. You can see this very
clearly in the classic scientific theories: what is force, or mass, [INDISTINCT] energy or
momentum? Those words cannot be defined, as philosophers of science would say, in observation
language. There's no way to give a definition purely in terms of, you know, x's and x-dots.
You can define acceleration that way, but you can't define force or mass. Instead, the way
you define force and mass is that you say F=ma, and then you observe some objects in motion,
and you can parse that; you can read off where the forces and masses likely are. In other
words, the words are defined by their role in the theory, and that's conceptual role semantics.
So this is the idea of an intuitive
theory that specifies these laws over these concepts. When I was getting into [INDISTINCT]
genetics, I mean, I went through this quickly, but that's the kind of thing we're talking
about: concepts like gene, [INDISTINCT], recessive, dominant, phenotype, genotype. Those are
technical words in biology, and, you know, what do they mean? You can't define them in directly
observable terms, right? They're abstract concepts, and once you understand the role they play
in the theory, then you know what they mean. And you know what they mean because you know how
to use them to make sense of the data you observe.
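The F=ma point can be made concrete. Masses and forces are never observed directly; only motion is. But if two otherwise isolated carts push off each other, Newton's third law (equal and opposite forces) lets you read the mass ratio straight off the observed accelerations. The numbers below are made up for illustration:

```python
# Two carts push off each other; only their accelerations are observed.
a1 = 2.0   # observed acceleration of cart 1, in m/s^2
a2 = -0.5  # observed acceleration of cart 2, in m/s^2

# Newton's third law: m1 * a1 = -m2 * a2, so the mass ratio m2/m1 = -a1/a2.
mass_ratio = -a1 / a2

# Fixing m1 = 1 as the unit of mass then pins down the force as well.
force_on_cart_1 = 1.0 * a1

print(mass_ratio, force_on_cart_1)  # 4.0 2.0
```

Neither force nor mass is defined in observation language here; they get their values only through the role F=ma assigns them, which is the conceptual-role point.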
So a place where this kind of thing comes up a lot in human learning, in semantic learning,
is, for example, intuitive psychology. I showed some of these demos yesterday in the language
and vision workshop, but just to show it for those who haven't seen it, and to remind you:
you can show infants these two balls rolling around on the green surface, and adults and even
young infants see this as something kind of psychological or intentional, like the blue one
is chasing the red one. Or this classic Heider-Simmel-style display here: you could see it
as two triangles and a circle moving, but people actually see it as more like, you know, the
big guy kind of bullying or beating up on the little guy, scaring him away so that he hides.
We see psychology in even fairly simple motion like this. So what is
the intuitive theory here? Well, here is some work from Chris Baker in our group where we tried
to formalize a basic theory of this intentional agent. You have actions, which you can observe,
and those are caused by goals and beliefs, and a rational agent basically tries to choose
actions, or plan sequences of actions, that are likely to lead to the achievement of its goals
given its beliefs. And you can add more to this: the goals are drawn from some prior, this
notion of preference, and beliefs are formed by perceiving the world, plus some kind of general
world knowledge. And again, these are the meanings of words that are absolutely critical, the
kind of abstract concepts that children use when they're learning to understand people's
behavior: words like goal or belief or action or agent or preference. You know, what do you
want, what do you prefer? Well, the meanings of those words, our claim is, are basically these
structures in this kind of abstract [INDISTINCT] probabilistic framework.
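This rational-agent picture supports inference in reverse: given observed actions, infer the goal. Here is a minimal inverse-planning sketch in a hypothetical one-dimensional world, with a made-up softmax rationality parameter; it only illustrates the Bayesian logic, not Baker's actual model:

```python
import math

# A hypothetical 1-D world: two candidate goals sit at the ends, and a
# noisily rational agent mostly steps toward its goal.
GOALS = {"left": 0, "right": 10}
BETA = 2.0  # rationality: higher means more reliably goal-directed

def step_logprob(pos, step, goal_pos):
    # Softmax over the two steps {-1, +1}, favoring the one nearing the goal.
    utils = {s: -abs(pos + s - goal_pos) for s in (-1, +1)}
    z = math.log(sum(math.exp(BETA * u) for u in utils.values()))
    return BETA * utils[step] - z

def goal_posterior(start, steps):
    # Uniform prior over goals; likelihood from the rational-action model.
    logp = {}
    for name, gpos in GOALS.items():
        pos, total = start, 0.0
        for s in steps:
            total += step_logprob(pos, s, gpos)
            pos += s
        logp[name] = total
    m = max(logp.values())
    weights = {k: math.exp(v - m) for k, v in logp.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

post = goal_posterior(start=5, steps=[+1, +1, -1, +1])
print(post["right"] > post["left"])  # True
```

Observing three rightward steps out of four makes the right-hand goal the more probable explanation under the rational-action likelihood, which is the sense in which watching behavior lets you infer what somebody wants.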
What we've done, and this is the part where [INDISTINCT] overlaps yesterday's topic, sorry
for that, but anyway: we haven't built models here of how you can learn this kind of theory,
and we haven't really tried to do [INDISTINCT]. What we have done is build models of how this
theory works once you have it: how people can use a probabilistic model defined in this way
to make sense of an agent's goals, or to make sense of their joint preferences and beliefs;
to watch somebody [INDISTINCT] their environment and try to infer what they want and what they
know. And we think this provides the hypothesis space, basically, for these abstract
[INDISTINCT]. It's not perceptual grounding; I call it [INDISTINCT] grounding, because the
words correspond to the roles they play, and you wouldn't want a separate perceptual grounding
for each word; that wouldn't let you do what these abstractions do. The theory as a whole is
grounded in perceptual experience, and the words refer to pieces of the theory. So, just to
wrap up
then: I've told you about a program of research that is, you know, very much in progress, and
the language aspects of it are particularly so. People who are interested in working on this,
please come and talk to me. But these aspects of child learning, learning to learn, learning
abstractions, learning context sensitivity, learning function words, are absolutely important.
And we think that this toolkit of learning with sufficiently rich representations, things that
basically come down to probabilistic programs and learning as a kind of program induction,
is at least a way forward. There are really hard inference problems here, and again we need
your help on that, but we think this is at least a sketch of how the most interesting parts
of semantics might be acquired.