>> I'm sorry, because that was [INDISTINCT] with other words [INDISTINCT] it's a lot of [INDISTINCT]
>> It sounds ridiculous.
>> No. No, no, no. That was last minute. I'm just curious how many of [INDISTINCT] in the other class I gave them already. All right, we're just back from the break, and I think we ran over a little, so there's a bit of [INDISTINCT] I would say about it. The reason I'm interested in word meanings and semantics is that [INDISTINCT], and the whole [INDISTINCT] you can accomplish in it. When you ask how we learn semantics, it forces us to change and enlarge our basic idea of learning. There's a paradigm--you can't have missed it--that treats learning as statistics on a grand scale: taking data, often high-dimensional data, and finding structure in it. Supervised and unsupervised learning both follow that template of finding structure in high-dimensional data. We know how to capture this in [INDISTINCT] optimization. We know how to analyze some of it mathematically. We know, roughly, how to relate it to how networks of neurons in the brain might work.
And this kind of pattern-recognition technique, taken outside the neuroscience context, has been really useful for building systems that are useful in the real world--systems that start to realize something of the promise of AI. Whether it's in face recognition or in Google: it's worth recognizing, particularly when we're talking about learning semantics, that Google might not advertise itself as an AI system, but it's basically the world's best AI semantics system that anybody could have imagined if you went back to the early days of the field. I'm impressed with that achievement. Or, most recently, the Watson system for playing Jeopardy, which is roughly on a par with the world's champions at that game. These are systems that take real-world language data and definitely get some kind of meaning out of it. But I want to highlight some of the limitations of these systems, because they point towards what's needed to actually understand semantics. Semantics is going to require enlarging our toolkit for learning. So, just a couple of my favorite examples here. Google's spelling correction: you might not think of this as a semantic exercise--it's statistical pattern recognition. It's [INDISTINCT] technology which underlies a lot of [INDISTINCT] applications. It works amazingly well. I can type a mangled version of a query--you can all read it, Google can read it--and it gets you a useful answer. But now take something like this. Can you read this? It's kind of hard to read, but...
>> Anyway, this is--something...
>> Something, yeah. Let me rearrange the order of the words.
>> [INDISTINCT]
>> All I did between here and here was rearrange the order of the words. Of course, you may have seen people send these things around by email--these are cute examples. It turns out you can rearrange the letters pretty drastically within words, and people can still read the text if you preserve two things: you've got to preserve the sequential structure of the sentence, and it has to be a meaningful sentence. So I'd say this shows off our intuitive, natural cognitive abilities for finding the signal in the noise of language. You might not have thought of spelling correction as a semantic exercise, but I think that's a very concrete way to see it. Here, Google has nothing to say. And then Watson--once again, this is impressive. Has anybody watched this thing on TV?
>> Yeah.
>> Let's look at the other side of it. To me, the nicest thing is that you can watch this on YouTube--or I'm just going to [INDISTINCT] this number some like--okay. I mean, some articles they just--they get [INDISTINCT]
>> Now, you proof by [INDISTINCT]
>> That's how it would be counted, but--and you have this. Okay. So the key thing to look at here is the second [INDISTINCT]--I'll show you Watson's first choice and then its second and third ones. It didn't quite get this one, but the most striking thing to me is how often it's the case that, regardless of whether Watson has the right answer, its second and third choices often make no sense at all. They're completely semantically anomalous. The best thing on this screen...
>> The first modern crossword puzzle is published
and Oreo cookies are introduced. Ken?
>> KEN: What is the '20s?
>> No. Watson?
>> WATSON: What is 1920s?
>> No. Ken just said that.
>> Watson has no speech recognition to respond to the other contestants with; it doesn't know what the other players say--that may be true of you, too.
>> To push one of these paper products is to stretch established limits. Brad?
>> BRAD: What is the envelope?
>> Good.
>> BRAD: [INDISTINCT] for 800.
>> Stylish elegance, or students who all graduated in the same year. Watson?
>> WATSON: What is chic?
>> No. Sorry. Brad?
>> What is class?
>> Class, you got it.
>> Okay. You can watch more of that--I just picked a stretch at random, so I don't know if it's particularly good or bad, but you can see cases where it's definitely the same pattern: the wrong answers aren't just sort of wrong, they're answers with no cue to whatever you were thinking of. So what's the gap here? Well, I won't say too much about how these systems are built. They're really impressive achievements, and I don't want to diminish that. I'm [INDISTINCT] a scientist trying to understand, in computational terms, how children are able to learn the abstract meanings of words. And what I thought I'd do here is give an overview of some of the work that people in our field are doing, trying to get at some really interesting things about how humans learn words. I guess the title of the talk is towards more human-like machine learning. Most of the things I'm going to talk about here aren't really usable machine learning techniques yet; they're more like using the language of machine learning, and stretching it, to tackle the human learning of semantic concepts--in ways that will hopefully be inspiring for machine learning people to move more in this direction, toward the kinds of applications we care about. Some of the things we're interested in: how children can do one-shot learning--how we can learn the meanings of words from very few examples--and how that's supported by what we call learning to learn. That's something I talked about in one of the morning workshops, and I'll go over it very quickly here. Then [INDISTINCT] of abstraction: how we can learn abstract concepts, or types of concepts--concepts that are of course expressed with words but don't have a direct perceptual core. That doesn't mean they aren't grounded; what we'll see is that these concepts are grounded, but they're cognitively grounded--grounded in the role they play in an intuitive theory, and that theory grounds out in perception. We're also very interested in learning context-sensitive language. I'll show you that even very simple adjectives, which you might take for granted, are surprising once you think about what they mean: it turns out to be surprisingly hard to describe what they mean, and you need to use interesting concepts--programs--to describe how they work. And function words, words like "the" or "every" or "three" or "of": words that don't seem to have any direct reference, but that do a lot of the work of compositional semantics, that help give language meaning at the level of the sentence. I think that's probably enough. Well, let's see; depending on time, I'll try to get through all of this and emphasize a couple of themes--ideas that we think, and that we'll argue, can explain how human children are able to learn these things. The idea of Bayesian learning over probabilistic models with certain kinds of structured forms--models that go beyond the simplest kinds of machine learning tools: they often have rich, knowledge-bearing structure, and they often capture the generative process, the causal process of the world, in richer ways than we currently use in machine learning. And then, really at the heart of semantics, there's this compositional language-of-thought idea. That's a theme I've seen in a lot of the [INDISTINCT], certainly a theme that people have been emphasizing over the last [INDISTINCT], and it's really what brings us here. Just one other kind of motivating example, which I'll come back to at the end.
I'm very interested in the ultimate origins of knowledge, and I think this highlights some of the issues in semantic acquisition: people in developmental psychology have focused on analogies between children's learning and the origins of scientific theories. So ask yourself: how did Newton come to knowledge like universal gravitation, or how did Mendel come to the basics of genetics? They observed data--and certainly in Mendel's case, and I believe in Newton's case with the astronomical data, pretty noisy data. These are acts of statistical inference, right? You can't just deductively reason your way from the data, and you also can't just crunch the data through some [INDISTINCT]--there's no simple procedure that does it. These are complex optimization problems whose output is something like the best description of the orbits of the planets, or this theory specifying the idea of genes, dominant and recessive alleles, and the whole combinatorial structure that Mendel came up with. Now, this is something more like program induction, and that's a concept that is, you know, a sort of scary one, but I think many of us interested in learning semantics sooner or later come down to it--come back to it, right? You could imagine that there's some kind of program that describes the semantic content of one of these theories--the theory of natural selection, the origin of species--or you could express Newton's laws as a kind of program, maybe a probabilistic program, with which you can explain the observed data. And certainly Mendelian genetics is a classic kind of probabilistic program, where you have exact functions, and probabilities on the things you don't know, just as in the original genetics textbook calculations: you write down what you start with and what happens as you cross several generations.
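As a toy illustration of that idea, here is a minimal, hypothetical sketch in Python: a Mendelian cross written as a tiny generative program, and two candidate hypotheses scored by how well they explain observed phenotype counts. The counts and the "no dominance" alternative are illustrative assumptions of mine, not the talk's actual model; a full treatment would also add a prior favoring shorter programs.

```python
import math
from itertools import product

def cross(parent1, parent2):
    """Enumerate offspring genotypes of a one-gene cross, with probabilities."""
    offspring = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted(a + b))
        offspring[genotype] = offspring.get(genotype, 0.0) + 0.25
    return offspring

# F1 x F1 cross: Aa x Aa -> AA 1/4, Aa 1/2, aa 1/4
f2 = cross("Aa", "Aa")

# Hypothesis 1 ("dominance"): 'A' masks 'a', so P(dominant phenotype) = 3/4.
p_dominance = sum(p for g, p in f2.items() if "A" in g)

# Hypothesis 2 ("no dominance"): a rival program predicting p = 1/2.
p_no_dominance = 0.5

# Illustrative F2 counts, in the ballpark of Mendel's classic 3:1 ratios.
n_dom, n_rec = 705, 224

def log_likelihood(p, k, n):
    """Binomial log-likelihood of k dominant phenotypes out of n offspring."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

n = n_dom + n_rec
score_dominance = log_likelihood(p_dominance, n_dom, n)
score_no_dominance = log_likelihood(p_no_dominance, n_dom, n)
# The dominance program explains the observations far better:
# score_dominance > score_no_dominance
```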
Then you compare different hypotheses, which correspond to different models in this space of probabilistic programs. You compare their [INDISTINCT] on the data in some broadly Bayesian way, where you have a prior that favors simpler, shorter programs, and you have the likelihood, which is how well the distributions the program produces capture your observations. That gives you a kind of scoring function, and then you face a terribly difficult, terribly scary search problem. But that, I think, is our best understanding of the origin of scientific knowledge, however you come at it, and when you start to think about semantics, we're going to keep coming back to ideas like that. So I'll go very quickly through these word-learning examples--in fact, I could probably just skip them--but I want to highlight how working on these problems got us thinking in terms of program induction. We want to understand how, given just a few examples like these, you can learn those words. Again--someone already showed a slide like this before--it's not very hard to see that those first two objects are examples of a word, that the one on the left is too, that the one below it isn't, that the one below that, the third one down, isn't, but that the one to the right probably is. We can think about what's going on as forming some hypothesis that captures the structure of these objects--something like an evolutionary program. It doesn't have to have natural selection in it, but something like it: a coalescent-based clustering model, some kind of [INDISTINCT] tree structure that explains how these objects might be derived, with some kind of branching process that captures super-categories--you can see that all of these over here are some kind of funny plants, and that corresponds to a high-level branch, while these two down here maybe correspond to this other branch. By taking this cluster of unlabelled objects and deriving something like this kind of probabilistic causal generative model, you can generate a hypothesis space for word meanings, with hypotheses [INDISTINCT] on the branches of this tree. That turns out to be a reasonable account of how children learn and generalize words from just a few examples, and it yields some further insight. But I don't want to dwell on this.
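That generalization story can be sketched very compactly. The nested hypotheses below are toy stand-ins of mine for branches of the learned tree, with a uniform prior; the likelihood embodies the size principle, so the smallest branch containing all the examples wins as examples accumulate.

```python
# Toy nested hypotheses over objects 0..7, standing in for branches of a tree.
hypotheses = {
    "subordinate":   {0, 1},           # e.g. just these funny plants
    "basic":         {0, 1, 2, 3},     # a broader branch
    "superordinate": set(range(8)),    # the whole tree
}

def posterior(examples, hypotheses):
    """P(h | examples) with a uniform prior and size-principle likelihood:
    P(example | h) = 1/|h| if the example falls in h, else 0."""
    n = len(examples)
    scores = {}
    for name, h in hypotheses.items():
        consistent = all(x in h for x in examples)
        scores[name] = (1.0 / len(h)) ** n if consistent else 0.0
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

post_one = posterior([0], hypotheses)          # one example: broad uncertainty
post_three = posterior([0, 1, 0], hypotheses)  # a few examples, all from the small branch
# More examples drawn from the small branch concentrate the posterior on it:
# post_three["subordinate"] > post_one["subordinate"]
```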
I want to focus on the question of where the hypothesis space comes from. How do we know what the relevant features of objects are, such that something like a tree-structured model explains them? This is the kind of thing that--again, I covered some of this in an earlier morning talk in one of the other workshops, so I won't go into it much--Russell, [INDISTINCT], and I have been doing [INDISTINCT]. We've been trying to understand how you can learn which features count for word learning. It's again this idea of a tree-structured model that captures classes and super-classes, to be able to say that there's some sense in which similar categories have similar similarity metrics. Using this idea, you can learn from multiple related examples of different classes, and we think this is a basic thing that even children are able to do. There's been really interesting work in developmental psychology--I'm trying to compress a whole topic into a couple of slides, so probably nothing makes sense to anybody, but anyway--really interesting work that developmental psychologists have done trying to explain how children's word learning accelerates very dramatically, let's say between one and a half and two years of age. And this kind of learning to learn in a hierarchical Bayesian model is, I think, a compelling way to capture that; we found some structure there. I'll just say: check this out--it's work that Ross and I are really interested in. And now we come to our first sort of program induction. The key question is: it's one thing to do learning to learn in a hierarchical Bayesian model, but how do you get the idea that you should be building a tree-structured representation at all? This is where the work that Charles and I did comes in. We tried to describe very simple kinds of programs that could capture the abstract structural form of different kinds of representations--to be able to say, when we see something like objects, say animals or plants, that have some features, that something like a tree structure is the right thing: that a tree, learned from a data set of animals and their features, is the best way to capture the structure there. Not assuming it going in, with hierarchical clustering [INDISTINCT].
Rather than just sorting things hierarchically because that's what you were looking for, we actually try to learn whether, for this data set, something like a tree structure is the right way to capture what's going on. Charles came up with a clever way to do that. The idea is to give the learner simple kinds of graph grammars--very simple kinds of programs for growing structures--and then do inference in a hierarchical Bayesian model where the top level is one of these very simple grammars for growing graphs. The grammar basically generates a prior on the models at the level below, which are standard [INDISTINCT] graphical models, the standard machinery: you define some kind of smooth distribution over what features the objects can have, with a covariance structure--a Gaussian process--that corresponds to the inverse of the graph Laplacian, which is a fancy way of basically saying that things that are nearby in the graph tend to have the same features. That way, by doing inference in this hierarchical model, you learn not only the best tree-structured graph but the very idea of a tree, by scoring, at the top level, how likely different programs are to grow the different kinds of structural forms for the representation. And you can apply the same framework to other data. In one case the data was how US Supreme Court judges voted on lots of cases, and there you discover a qualitatively different structure: a linear, left-right structure, with the more liberal judges over on the left and the more conservative ones over on the right. You can take cross-products of simple structural forms and get, for example--take the distances between cities on the globe, and you get the cross of a chain and a ring, the cross-product of two simple graph grammars, which corresponds to latitude and longitude. Or take data on faces, which we generated from a face-synthesis program varying the race and masculinity-gender dimensions of the faces, and the learner [INDISTINCT] discovers that a cross of two chain-structured models is the best way to describe what's going on in that data. The reason I want to highlight this from a semantic point of view is this: if we want to learn representations, we could do unsupervised learning in any kind of compression system [INDISTINCT]; we could use a [INDISTINCT] machine, whatever. But a model like this extracts representations with abstract structural forms that are semantically meaningful, right? In biology, the internal nodes of a tree have meanings, like "fish" or "mammal." In the Supreme Court case, we talk about "left" and "right"--we're talking about parts of the structure. Those are more abstract concepts: to understand what those words mean, you have to know that you're talking about a one-dimensional structure. Or "latitude" and "longitude"--what do those words mean? They refer to abstract parts of these forms; they don't correspond to the individual objects or nodes we started with. But the possible meanings of those words are discovered by this kind of [INDISTINCT]. Or the dimensions here: we talk about masculinity, or we talk about black versus white--it picks out those dimensions, right? So again, those words matter, and their semantics--I mean, this isn't a word-learning model in itself, but it's a model that's able to discover the kinds of concepts that could be the semantics of those abstractions. So that's summarizing stuff we were talking about a few years ago. What we've been doing more recently aims more directly at language, and that's what I want to tell you about: work with a couple of students from our group. One piece comes from a thesis by Lauren Smith, who has been studying the learning of context-sensitive meanings, in the particular case of gradable adjectives.
Since I wasn't here this morning, I'm not sure if people have already talked about this. Anyway, gradable adjectives are words like "tall." I don't want to rehearse an over-familiar example, but it's pretty interesting. These are very simple words--tall, short, heavy, light, strong, good--and we might think that their meanings should be pretty simple too, right? What we mean by "tall"--basically, we all know it when we see it, right? The higher, the taller. That's what we think. But think about the context-sensitivity. The sense of height that corresponds to being tall if you're a tree is different than if you're a boy, right? A tall boy wouldn't be very tall for a tree, and a tall tree wouldn't be very tall for a building. So what "tall" means has some kind of inherent context-sensitivity.
Something like this: "tall," as applied to trees, means something like "greater than the mean value for trees on the dimension of height." And you could write this down as a simple program--I'm not an expert [INDISTINCT], so excuse me if there's a mistake here--but basically what we're saying is that "tall" is a function which takes an instance and a class, and asks: is it true that this thing x is tall relative to that class? And then we're just describing the formal content, which is something like: check whether the height of x is greater than the mean of the heights of some samples. We assume we have some function to sample instances of the class; say we do that 20 times, so we take a little statistical sample of the heights. We take the mean, and we test whether x is greater than the mean. Now, I took the trouble to write it out like this because I think many of you might look at this definition and say, "Is that the right definition of tall?" What do you think?
>> Much greater.
>> Yeah, maybe much greater. We might say greater than one standard deviation [INDISTINCT]--it's just not enough to be in the top 50%, right? So then we could write that down, which is slightly more complicated, right?
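This family of candidate meanings can be sketched as one small program (the non-parametric quantile variant comes up in a moment). The class samplers, distributions, and numbers below are hypothetical illustrations of mine, not the actual experimental models.

```python
import random
import statistics

def is_tall(height_of_x, sample_class, n=20, rule="mean", k=1.0, q=0.65):
    """Is x tall relative to a reference class we can sample from?"""
    sample = [sample_class() for _ in range(n)]
    if rule == "mean":           # taller than the class mean
        return height_of_x > statistics.mean(sample)
    if rule == "mean_plus_sd":   # mean plus k standard deviations
        return height_of_x > statistics.mean(sample) + k * statistics.stdev(sample)
    if rule == "quantile":       # above the q-th quantile of the sample
        return height_of_x > sorted(sample)[int(q * n)]
    raise ValueError(rule)

# Hypothetical height distributions (meters) for two reference classes.
tree = lambda: random.gauss(15.0, 4.0)
boy = lambda: random.gauss(1.3, 0.1)

random.seed(0)
# The same 1.6 m is tall relative to boys, but not relative to trees:
print(is_tall(1.6, boy))   # True
print(is_tall(1.6, tree))  # False
```

The context-sensitivity lives entirely in which reference class gets sampled; the rule and its parameters (`k`, `q`) are the free structure that has to be learned.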
Well, maybe it's one, maybe it's 0.5--we don't know. Maybe there's just a free parameter here that gets [INDISTINCT] along with the structure of the program. It turns out, when you start looking at this and doing the psychophysics experiments, it seems like some kind of more robust ordinal statistic fits better--maybe it's more like "greater than the 65th percentile of trees on the dimension of height." You could write that down too. And we did experiments--I'll [INDISTINCT] the details of the experiment--but basically we gave people various distributions of objects, like the stimuli shown here, and they had to pick out the tall ones. Then we tested various models of what "tall" could be. Is it defined by some number of standard deviations above the mean? Is it defined by some sort of non-parametric quantile statistic? We also found a lot of success with cluster models, where what "tall" means is: cluster the objects along the dimension of height, find the highest cluster, and those are the tall ones. We weren't really able to distinguish very clearly between those models with these first experiments, but later experiments were able to teach us more. It's a somewhat messy story, like a lot of semantics, but the point I want to draw your attention to is that meanings need to be learned within a hypothesis space that looks like these things. Now, suppose you want to go beyond this. One of the interesting things we see when we look at gradable adjectives is that there's a whole class of words that share this form, and we might want to capture that shared semantic structure: the whole class of words has a meaning like "relative to some class c, greater than some quantile on some dimension." And we can write that abstraction down, right?
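That shared schema can be sketched as a higher-order function; the dimension names, quantile value, and toy objects below are hypothetical illustrations, not the model's actual representation.

```python
from collections import namedtuple

def make_gradable(dimension, q=0.65):
    """Build a gradable-adjective predicate from the shared schema:
    x is ADJ iff x exceeds the q-th quantile of the reference class
    on the given dimension."""
    def adjective(x, reference_class):
        values = sorted(getattr(obj, dimension) for obj in reference_class)
        threshold = values[int(q * len(values))]
        return getattr(x, dimension) > threshold
    return adjective

# Learning a new gradable adjective is now just filling in a dimension:
tall = make_gradable("height")
heavy = make_gradable("weight")

Thing = namedtuple("Thing", ["height", "weight"])
things = [Thing(h, 2 * h) for h in range(1, 11)]

print(tall(Thing(10, 0), things))  # True: top of the range
print(tall(Thing(2, 0), things))   # False: near the bottom
# A word like "good" would need one more step: the reference class itself
# supplies the relevant dimension (ethics for a man, taste for a cheeseburger).
```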
It's a higher-level kind of [INDISTINCT], and I think there's evidence that kids actually learn at that level, in the sense that once they sort of get the idea of how these adjectives work, they can learn new gradable adjectives for new dimensions very quickly, because they understand how the class works. Then take something like "good" or "strong," which are particularly interesting cases. Unlike "tall," where the dimension is fixed--part of learning "tall" is that it refers to the dimension of height--"good" has a category-specific dimension, right? A good man is different from a good cheeseburger, which is different from a good conference. In each case, the class doesn't just restrict the reference set that you're going to be computing over; it also specifies the relevant dimension. For a man, "good" is maybe something like being ethical; for a cheeseburger it's some dimension of taste; for a conference it's some dimension of intellectual stimulation, or whatever, right? So "good" is a [INDISTINCT] kind of word, and "strong" is like that too, even more abstract. These are harder but, I think, approachable challenges. Next is some work that Steve [INDISTINCT] has been doing; he wrote a very nice thesis looking at a bunch of different aspects of learning functional language. One case is learning to use the basic number words--one, two, three, and so on--partly because there's been a lot of empirical work in the last decade or two in cognitive science looking at how kids learn these words. The kind of task we do with kids here: you might say, "Now, can you count the balloons?" And pretty early on, by age two and a half or so, kids will do that--at least kids in our society. They'll learn a counting routine: one, two, three, four, five, six, right? And if there were ten, they might count up to ten. But there's evidence that that's really just a rote routine, not a real semantic abstraction, and one way to test whether kids understand what the number words mean is the so-called give-N task. You ask the kid, "Can you give me three balloons?" So take a kid who has no trouble counting up to three, or certainly six, and say, "Can you give me three balloons?" Well, at the age at which kids are first able to do the counting routine, they're not very good at that; they might give you a random number. At a typical age--say two years, ten months--kids are at a stage called being a "one-knower." That means if you say, "Can you give me one balloon?" they'll give you one balloon. But if you say, "Can you give me two or three or four balloons," they'll just give you some arbitrary number more than one. That's a bit simplified, but it's more or less what happens. Then, a few months later, kids become a "two-knower," right around age three--these are [INDISTINCT] hypothetical ages. If you say, "Can you give me one balloon?" they can give you one. You say, "Can you give me two?"
They'll give you two. But if you say, "Can you give me three," they'll give you some number more than two--sometimes three, sometimes four, and so on. Then there's a three-knower stage, which takes another few months. And you could imagine it going on like that--four-knower, five-knower, and sooner or later they get there--but it doesn't work that way. There's this really interesting leap of abstraction that happens, typically after the three-knower stage (occasionally there's a four-knower stage), where kids suddenly get all the other word meanings. This isn't the same as discovering infinity, the idea that there's no largest number; it's that all the numbers in their counting routine--all the numbers they can access when they count--now work: they understand how those words fix cardinalities in this give-N task. So somehow we want to explain this learning trajectory: why it takes a relatively long time, running through this characteristic sequence of knower stages, and then this interesting leap of abstraction to what's called the CP, or cardinal-principle, stage in the literature. So what Steve did is set up something that looks like a real language of thought: various kinds of primitive functions that can be useful for expressing numerical knowledge--you can write all of these in lambda calculus--which let you express the various stages of knowledge. You can write down a three-knower as a certain higher-order function; you can write down the cardinal-principle knower, up there, which is a recursion: basically, you run down the count list and you map the counting routine onto set sizes. You can also write lots of weird things, like a "two-but-not-one-knower," or even things that other kinds of languages actually have, like counting systems that are more like singular/plural: just "one," and anything more than one is "two." The reason this is interesting from my point of view is that people have emphasized a sort of poverty-of-the-stimulus argument here: the possible ways to map words onto cardinalities are not really determined by the data, and we wanted the hypothesis space to be able to express not only the actual lexicons we see but many other possibilities. So Steve did various simulations where you imagine a learner--simulating a child who's getting mostly true but sometimes noisy data from an adult, a competent speaker, referring to the cardinality of a set. They basically have a [INDISTINCT] program-induction setup, where you have a prior over those expressions, basically a minimum-description-length kind of prior, and then the likelihood, which is how well the expression explains the observations, taking into account the possibility of noise. The noise could come from the fact that someone is using the words incorrectly, but it's more often the case that, if you don't really understand the pragmatics of how these words are used, you might hear "these four glasses here," but somebody could also say, "Well, I'm going to take two glasses."
Like you could use a number that isn't necessary just referring to the obviously cardinality
of set. We wanted to build in that kind of robotics. Now, the national figure of any
Now, the natural question any Bayesian learner should be asking here is, you know: if you're
going to define a prior over all these programs and learn that way, how could that possibly
work, or how could you make it tractable? It's not some simple closed-form computation. And
we don't have elegant solutions to that. What we're doing is the kind of thing that probably
most people in machine learning would blanch at, because we're running a kind of MCMC after
defining a [INDISTINCT] grammar over these expressions. So I won't really go into the details.
The basics of this MCMC are [INDISTINCT] and essentially the same inference scheme as in Church,
the probabilistic programming language: you have a program trace, and you make proposals where
you cut off that program trace at some point, resample a new possibility from the grammar,
and then accept or reject in the usual [INDISTINCT] way. It's amazing that it works at all;
it's not very efficient, and we obviously need to do better, but it's enough that on all these
sorts of tasks you're able to search effectively through a very large space of lexicons. The
number of possible lexicons here is enormous, and this algorithm searches, you know, tens of
thousands of hypotheses.
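The trace-cutting move described here, resampling a subtree of the current program and then accepting or rejecting, can be sketched on a toy grammar of arithmetic expressions. Everything below (the grammar, the scores, and the Metropolis-style acceptance rule, which omits the full proposal-correction term) is an illustrative assumption, not the actual system:

```python
import math
import random

random.seed(0)

# Toy grammar: an expression is an integer leaf or a sum node ('+', left, right).
def sample_expr(p_leaf=0.6):
    if random.random() < p_leaf:
        return random.randint(0, 9)
    return ('+', sample_expr(), sample_expr())

def evaluate(e):
    return e if isinstance(e, int) else evaluate(e[1]) + evaluate(e[2])

def size(e):
    return 1 if isinstance(e, int) else 1 + size(e[1]) + size(e[2])

def log_score(e, target=12):
    # Description-length prior plus a likelihood rewarding expressions
    # whose value matches the observed target.
    return -0.5 * size(e) - abs(evaluate(e) - target)

def paths(e, prefix=()):
    # Enumerate the addresses of all subtrees.
    yield prefix
    if not isinstance(e, int):
        yield from paths(e[1], prefix + (1,))
        yield from paths(e[2], prefix + (2,))

def replace_at(e, path, sub):
    if not path:
        return sub
    left, right = e[1], e[2]
    if path[0] == 1:
        return ('+', replace_at(left, path[1:], sub), right)
    return ('+', left, replace_at(right, path[1:], sub))

# Subtree-regeneration MCMC: cut the trace at a random point, resample below it.
current = sample_expr()
best = current
for _ in range(5000):
    cut = random.choice(list(paths(current)))
    proposal = replace_at(current, cut, sample_expr())
    if math.log(random.random()) < log_score(proposal) - log_score(current):
        current = proposal
        if log_score(current) > log_score(best):
            best = current

print(evaluate(best))
```

Even this crude chain reliably finds small expressions whose value matches the target, which is the sense in which "it's amazing that it works at all."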
And then we looked at the learning curves: as a function of the amount of data the child
receives, what does the learner learn, and in what order? And the thing is, this learner does
just what children seem to do. Namely, the highest-ranked number lexicon is initially the
one-knower, then there's a two-knower stage, then a three-knower stage, and then rather quickly
you get the CP-knower. The four-, five-, and six-knowers and so on are, at this point, basically
too complex relative to the [INDISTINCT] of the data, while the CP-knower lexicon, the one
that refers to the counting function, winds up being the best candidate. So if you [INDISTINCT]
it's a kind of [INDISTINCT] needs probabilistic programming [INDISTINCT] and this is really
the first model in the cognitive science [INDISTINCT] able to explain these characteristic
stages of number learning. Steve applied the same kind of approach to learning
quantifiers. I won't go through the details, but basically we needed to find a similar kind
of lambda-calculus representation for the meanings of quantifiers like none, every, some, and
most; we did the same kind of basic learning setup and were able to show that these quantifiers
could be learned, with similar kinds of learning curves. Now, the data on the order in which
children learn these quantifiers is not as clear; it hasn't been the subject of such intensive
case studies. But now that we have models of [INDISTINCT], it's motivating language-development
research to go out and check on this. How much time do I have? >> Ten minutes.
>> Ten minutes? Okay. So the last set of things I want to come back to is what I was trying
to motivate at the beginning. It's part of my motivation for thinking about program induction,
but it's also this idea that is one of the deepest and most profound ideas in developmental
psychology, sitting between language acquisition and conceptual development more generally.
It's sometimes called the theory theory: the idea that children's knowledge is organized into
intuitive theories, which are in effect systems of concepts, somewhat like scientific theories.
And what certain words mean, these particular words that refer to abstract concepts without
a direct perceptual grounding, is really important to how children think about the world: the
words label the concepts children think with, and they matter for how children learn to think,
how they learn to understand things we don't directly observe but learn about through a
combination of observation and interaction with others. So the idea is that we need to be able
to capture, you know, hypothetical candidate word meanings in these sometimes abstract
intuitive theories.
And we got at that originally by starting to look at, you know, causal learning, the sort of
stuff we're all [INDISTINCT], using directed graphs that can capture causal relations. This
is sort of a QMR-like network, if you're familiar with it: the classic disease-symptom one,
but with an extra set of variables that capture risk factors. And the standard Bayesian
causal-learning approach would be to write down a prior [INDISTINCT], a prior over all possible
directed graphs on these 12 variables, observe patients who are samples from this causal law,
and try to do causal learning from the sample data. It's really hard, though, to learn structure
like this [INDISTINCT] with a completely generic prior, and the work that I and others were
doing here was purely about causal learning; we didn't think we were studying semantics. But
I'll show you why we came to that. We said, well, maybe
you could do causal learning much better if you had a high-level hypothesis, some kind of
abstract intuitive theory of the domain, which divides the variables into several classes,
call them behaviors, diseases, and symptoms. Say you know that these variables, working
[INDISTINCT], smoking, and some others, are in the category of behaviors; that flu, bronchitis,
lung cancer, and some other diseases are the disease category; and that headache, fever, and
coughing are symptoms. And you have this [INDISTINCT] law that says all causal links go from
behaviors to diseases and from diseases to symptoms. If you only consider graphs that fit that
high-level schema, that's a hugely useful constraint for causal learning. It cuts the hypothesis
space down from the space of all possible directed acyclic graphs on 12 variables, which grows
super-exponentially in the number of nodes and here is something like 521 septillion, down
to only about 131,000, and learning is much more efficient.
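That super-exponential count is easy to check. Robinson's recurrence, a standard combinatorial result rather than something from the talk, counts the labeled DAGs on n nodes, and at n = 12 it gives about 5.2 x 10^26, i.e. roughly 521 septillion:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

print(num_dags(12))  # 521939651343829405020504063, about 5.2e26
```

With the two-layer schema, only edges compatible with the class structure are allowed, which is what collapses this count to the order of 10^5.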
Now, the reason this is interesting from a machine-learning point of view is that we don't
just have to wire that knowledge in by hand as hand-coded prior knowledge; we came up with
a way to learn it, actually. This is where Tom Griffiths and Charles Kemp and I started; the
actual machine-learning work was done by Charles Kemp as part of his undergrad and master's
theses. What he worked out, and I'll leave out most of the details, was basically the following.
I'm going to skip ahead to this
one. He worked out a way to define a nonparametric Bayesian prior, a kind of Chinese restaurant
process, over groupings of the variables, such that, subject to the grouping (one of which
is shown up on top), you then get a prior on graphs down here. It's basically just a
probabilistic version of what you can see up there. And then you do inference at both levels:
at the top level, you're trying to decide which classes the different variables go in, and
then, subject to that clustering up there, you get the prior on graphs, and you're doing
inference at this level, basically looking at which edges are possible and which way the causal
arrows go. So the inference here consists of MCMC over these two levels, over two discrete
variables: the class memberships of the variables, and which edges exist in the graph.
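The generative side of such a two-level model can be sketched as follows: a Chinese restaurant process samples a grouping of the variables, and, given the grouping, edges are drawn with class-dependent probabilities. The constants and the chain-of-classes edge rule are made-up assumptions for illustration; this is not Kemp's actual model, and the two-level MCMC inference is omitted:

```python
import random

random.seed(1)

def crp_partition(n_items, alpha=1.0):
    """Sample a grouping of items from a Chinese restaurant process."""
    tables = []  # each table is a list of item indices
    for i in range(n_items):
        weights = [len(t) for t in tables] + [alpha]  # rich-get-richer + new table
        r = random.uniform(0, sum(weights))
        for t, w in zip(tables, weights):
            r -= w
            if r < 0:
                t.append(i)
                break
        else:
            tables.append([i])
    return tables

def sample_graph(groups, p_between=0.8, p_other=0.02):
    """Given a grouping, edges from class k to class k+1 are likely; others rare."""
    cls = {v: gi for gi, g in enumerate(groups) for v in g}
    n = sum(len(g) for g in groups)
    edges = []
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            p = p_between if cls[b] == cls[a] + 1 else p_other
            if random.random() < p:
                edges.append((a, b))
    return edges

groups = crp_partition(12)
graph = sample_graph(groups)
print(len(groups), len(graph))
```

Inference would invert this: MCMC proposals move variables between groups at the top level and toggle edges at the bottom, scoring both levels against the observed data.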
And it's quite remarkable how well this is able to work. We didn't figure out how to scale
it to very large problems, but what we showed was how to take these relatively small graphs,
which are still very, very hard to learn. Here's a two-layer sort of disease-symptom graphical
model, which requires something like a thousand samples to learn with high accuracy if you
just learn with the generic [INDISTINCT] prior. But if you allow the learner to work at this
higher level, where you consider the possibility that the nodes can be divided into groups,
with a prior on graphs conditioned on that, then you can learn from an order of magnitude less
data, maybe about 80 samples. And even from just 20 samples from that graph, you can identify
the abstract concepts: you can figure out that the first six variables are kind of the causes
and the next ten are the effects. And getting the big picture first like that is a very
human-like kind of learning. But it's also about semantics, right? Because words like diseases
and symptoms are words that are important for our intuitive theories of medicine, and I think
they're not nodes in the graphical model; they're labels for these more abstract concepts.
If you want to understand how those words get learned, we need ways of generating word meanings
of the sort this kind of hierarchical Bayesian learning is able to do. An even greater
extension of
this is some work that Noah Goodman did, where the supposition [INDISTINCT] we described is
that the abstract knowledge shared across different causal networks is some kind of first-order
logic theory, one that basically winds up capturing the sort of [INDISTINCT] intervention
semantics of cause. In a sense, we call this learning to be causal, because this is basically
a learner who knows about [INDISTINCT] graphical models but doesn't understand the [INDISTINCT]
intervention semantics of what it really means to be a cause, the sort of arrow-breaking
semantics, and I won't go through the details. But basically, by searching over a space of
[INDISTINCT] theories that characterize what it means to be an intervention, this system is
able to learn how interventions work and to use that to do better causal learning.
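The "arrow-breaking" semantics of intervention can be made concrete: intervening on a variable cuts its incoming causal arrows and clamps its value, in contrast to merely observing it. Below is a minimal illustration of that do-operator idea on a toy two-node network (the variables and probabilities are made up; this is not the model from the talk):

```python
import random

random.seed(2)

# Toy causal model: smoking -> cough.
def sample(do_cough=None):
    smoking = random.random() < 0.3
    if do_cough is None:
        cough = random.random() < (0.8 if smoking else 0.1)
    else:
        # Intervention: break the arrow into `cough` and clamp its value.
        cough = do_cough
    return smoking, cough

def estimate(n, do_cough=None, given_cough=None):
    """Estimate P(smoking) under an intervention or an observation."""
    hits = trials = 0
    for _ in range(n):
        s, c = sample(do_cough)
        if given_cough is not None and c != given_cough:
            continue  # rejection sampling on the observed value
        trials += 1
        hits += s
    return hits / trials

# Observing a cough raises the probability of smoking ...
print(round(estimate(100_000, given_cough=True), 2))  # well above the 0.3 base rate
# ... but *making* someone cough tells you nothing about smoking.
print(round(estimate(100_000, do_cough=True), 2))     # about 0.3
```

A learner who lacks this arrow-breaking rule would wrongly treat its own interventions as diagnostic evidence, which is exactly the distinction the theory has to capture.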
In a sense, it's kind of learning what the word cause means. The last example I want to give
is again sort of core theory-theory stuff. There's this term from philosophy, philosophy of
science and philosophy of language, called conceptual role semantics, which I won't go through
in depth because time is short. But basically, it's the idea that words get their meaning by
referring to concepts that play some kind of abstract role in a theory. You can see this very
clearly in the classic scientific theories: what is force, or mass, [INDISTINCT] energy or
momentum? Those words cannot be defined, as philosophers of science would say, in observation
language. There's no way to give a definition purely in terms of, you know, x's and x-dots.
You can define acceleration that way, but you can't define force or mass. Instead, the way
you define force and mass is that you say F=ma, and then you observe some objects in motion,
and you can parse that; you can read off where the forces and masses likely are. In other
words, the words are defined by their role in the theory, and that's conceptual role semantics.
So this is the idea of an intuitive
theory that specifies these laws over these concepts. When I was getting into [INDISTINCT]
genetics, I mean, I went through this quickly, but that's the kind of thing we're talking
about: concepts like gene, [INDISTINCT], recessive, dominant, phenotype, genotype. Those are
technical words in biology, and, you know, what do they mean? You can't define them in directly
observable terms, right? They're abstract concepts, and once you understand the role they play
in the theory, then you know what they mean. And you know what they mean because you know how
to use them to make sense of the data you observe.
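The F=ma point can be made concrete. Masses and forces are never observed directly; only motion is. But if two otherwise isolated carts push off each other, Newton's third law (equal and opposite forces) lets you read the mass ratio straight off the observed accelerations. The numbers below are made up for illustration:

```python
# Two carts push off each other; only their accelerations are observed.
a1 = 2.0   # observed acceleration of cart 1, in m/s^2
a2 = -0.5  # observed acceleration of cart 2, in m/s^2

# Newton's third law: m1 * a1 = -m2 * a2, so the mass ratio m2/m1 = -a1/a2.
mass_ratio = -a1 / a2

# Fixing m1 = 1 as the unit of mass then pins down the force as well.
force_on_cart_1 = 1.0 * a1

print(mass_ratio, force_on_cart_1)  # 4.0 2.0
```

Neither force nor mass is defined in observation language here; they get their values only through the role F=ma assigns them, which is the conceptual-role point.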
So a place where this kind of thing comes up a lot in human learning, in semantic learning,
is, for example, intuitive psychology. I showed some of these demos yesterday in the language
and vision workshop, but just to show it for those who haven't seen it, and to remind you:
you can show infants these two balls rolling around on the green surface, and adults and even
young infants see this as something kind of psychological or intentional, like the blue one
is chasing the red one. Or this classic Heider-Simmel-style display here: you could see it
as two triangles and a circle moving, but people actually see it as more like, you know, the
big guy kind of bullying or beating up on the little guy, scaring him away so that he hides.
We see psychology in even fairly simple motion like this. So what is
the intuitive theory here? Well, here is some work from Chris Baker in our group where we tried
to formalize a basic theory of this intentional agent. You have actions, which you can observe,
and those are caused by goals and beliefs, and a rational agent basically tries to choose
actions, or plan sequences of actions, that are likely to lead to the achievement of its goals
given its beliefs. And you can add more to this: the goals are drawn from some prior, this
notion of preference, and beliefs are formed by perceiving the world, plus some kind of general
world knowledge. And again, these are the meanings of words that are absolutely critical, the
kind of abstract concepts that children use when they're learning to understand people's
behavior: words like goal or belief or action or agent or preference. You know, what do you
want, what do you prefer? Well, the meanings of those words, our claim is, are basically these
structures in this kind of abstract [INDISTINCT] probabilistic framework.
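This rational-agent picture supports inference in reverse: given observed actions, infer the goal. Here is a minimal inverse-planning sketch in a hypothetical one-dimensional world, with a made-up softmax rationality parameter; it only illustrates the Bayesian logic, not Baker's actual model:

```python
import math

# A hypothetical 1-D world: two candidate goals sit at the ends, and a
# noisily rational agent mostly steps toward its goal.
GOALS = {"left": 0, "right": 10}
BETA = 2.0  # rationality: higher means more reliably goal-directed

def step_logprob(pos, step, goal_pos):
    # Softmax over the two steps {-1, +1}, favoring the one nearing the goal.
    utils = {s: -abs(pos + s - goal_pos) for s in (-1, +1)}
    z = math.log(sum(math.exp(BETA * u) for u in utils.values()))
    return BETA * utils[step] - z

def goal_posterior(start, steps):
    # Uniform prior over goals; likelihood from the rational-action model.
    logp = {}
    for name, gpos in GOALS.items():
        pos, total = start, 0.0
        for s in steps:
            total += step_logprob(pos, s, gpos)
            pos += s
        logp[name] = total
    m = max(logp.values())
    weights = {k: math.exp(v - m) for k, v in logp.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

post = goal_posterior(start=5, steps=[+1, +1, -1, +1])
print(post["right"] > post["left"])  # True
```

Observing three rightward steps out of four makes the right-hand goal the more probable explanation under the rational-action likelihood, which is the sense in which watching behavior lets you infer what somebody wants.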
What we've done, and this is the part where [INDISTINCT] overlaps yesterday's topic, sorry
for that, but anyway: we haven't built models here of how you can learn this kind of theory,
and we haven't really tried to do [INDISTINCT]. What we have done is build models of how this
theory works once you have it: how people can use a probabilistic model defined in this way
to make sense of an agent's goals, or to make sense of their joint preferences and beliefs;
to watch somebody [INDISTINCT] their environment and try to infer what they want and what they
know. And we think this provides the hypothesis space, basically, for these abstract
[INDISTINCT]. It's not perceptual grounding; I call it [INDISTINCT] grounding, because the
words correspond to the roles they play, and you wouldn't want a separate perceptual grounding
for each word; that wouldn't let you do what these abstractions do. The theory as a whole is
grounded in perceptual experience, and the words refer to pieces of the theory. So, just to
wrap up
then: I've told you about a program of research that is, you know, very much in progress, and
the language aspects of it are particularly so. People who are interested in working on this,
please come and talk to me. But these aspects of child learning, learning to learn, learning
abstractions, learning context sensitivity, learning function words, are absolutely important.
And we think that this toolkit of learning with sufficiently rich representations, things that
basically come down to probabilistic programs and learning as a kind of program induction,
is at least a way forward. There are really hard inference problems here, and again we need
your help on that, but we think this is at least a sketch of how the most interesting parts
of semantics might be acquired.