DOM MASSARO: Today, I appreciate all of you coming
to the talk.
And I'll try to go through fairly quickly, so you'll have
time for questions.
As Shuman said, I was trained as an experimental
psychologist, majoring in mathematical psychology.
And I developed an information processing model of
perception, memory, and learning.
And I wanted to apply it to real world domains.
So I got interested in language.
And so I've been working on speech perception and reading for the last 40 years.
And I want to give you this background primarily as a road map to where I ended up: the idea that kids can acquire reading naturally, without instruction.
So I'm going to talk a little bit about our work on
multimodal speech perception, and then a project we did on
embellishing face-to-face communication.
And then talk about naturally acquired literacy.
So as you know, language has been thought of as being special, a view led particularly by Noam Chomsky, holding that it doesn't follow the ordinary rules of perception, memory, and learning. And we took the opposite view: that indeed, speech perception and reading can be understood as prototypical pattern recognition, how we make sense of the world around us.
And so what we got interested in is how much the face
contributes to speech understanding.
So here's a little film excerpt that you can take a look at.
OK.
So it's particularly important that the men of the audience picked that up. As I tell my students, if you guys missed that, see me after class, because it's important for the survival of our species.
So to study this problem of speech perception we use synthetic auditory speech. We also wanted to vary the visible speech, so that we could control it exactly.
So we developed a computer animated talking head called
Baldi, spelled with an i, rather than a y, because he's
from California.
So here I'll let Baldi describe himself to you.
BALDI: I am Baldi, and I am proud there is very little
[INAUDIBLE] my attractive exterior.
See, there is only a wire frame under me.
I live through computer animation and
text-to-speech synthesis.
My visible speech is accurate.
DOM MASSARO: So we approached the problem as speech scientists. We wanted to make the visible speech as accurate as possible. And we developed Baldi so that he could be aligned with either synthetic speech or natural speech.
We also made Baldi multilingual, by looking at
the unique characteristics of different languages and
programming the appropriate mouth and face
movements in Baldi.
BALDI: [SPEAKING MANDARIN]
DOM MASSARO: How was that, Shuman?
SHUMAN: Good.
DOM MASSARO: So Baldi can also be aligned
with natural speech.
BALDI: That's one small step for man, one
giant leap for mankind.
DOM MASSARO: OK.
So basically, you can think of Baldi as a puppet on a set of strings. And we're manipulating those strings moment by moment, to produce accurate visible speech, with controls such as jaw rotation, mouth opening, lip rounding, and so on.
BALDI: My movements are controlled by adjusting the
wire frame model at each time period.
My speech is accurate because it is based on real speakers.
And I have a lazy tongue just like they do.
I can be texture mapped with the image of a real person.
Hey, who said I'm not real?
See you around.
DOM MASSARO: So we also modeled the tongue, because the tongue is very important. For example, if you have Japanese speakers trying to learn r and l in English, you don't see much on the outside of the face, but the tongue makes very different movements. So we use electropalatography, in which you put an artificial palate on the roof of the mouth, with sensors that pick up where the tongue hits the roof of the mouth. And also ultrasound, which picks up how the tongue moves.
BALDI: I have a lovely tongue, and a groovy palate, with
[INAUDIBLE]
with tongue and palate.
DOM MASSARO: So we know that in speech production people speak with a lazy tongue. The way we articulate one segment is influenced by the segments that precede it or follow it. So you can't really just use a key frame method, where you have prototypical mouth movements and you just interpolate between those.
So we developed a dominance constraint. Each control of Baldi has a certain amount of dominance. And if two segments have equal dominance, as shown in the upper panel, then you do a simple interpolation. But take some characteristic of production, like lip protrusion: if you say the word stew, you notice the lip protrusion of the ew comes much earlier in time, and influences how you say the s and the t. So the s and the t in stew are very different than they are in steep, for example. And so we assign higher dominance to the protrusion for ew, and therefore, when we interpolate, the lip protrusion comes forward into the s and t. This is a coarticulation algorithm that's been tested in many different experiments, and has been shown to hold up pretty well, in the sense that it produces accurate visible speech.
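To make the dominance idea concrete, here is a minimal sketch of dominance-blended interpolation. The parameter values are invented for illustration, not Baldi's actual ones:

```python
import numpy as np

def dominance(t, center, magnitude, rate):
    # A segment's pull on a control decays exponentially with
    # distance from the segment's center time.
    return magnitude * np.exp(-rate * abs(t - center))

def blended_control(t, segments):
    # Realized control value (e.g., lip protrusion) at time t:
    # a dominance-weighted average of the segments' targets.
    weights = [dominance(t, s["center"], s["mag"], s["rate"]) for s in segments]
    targets = [s["target"] for s in segments]
    return float(np.dot(weights, targets) / sum(weights))

# "stew": the rounded ew gets high, slowly decaying dominance for
# protrusion, so its target leaks forward into the s and the t.
stew = [
    {"center": 0.00, "target": 0.1, "mag": 1.0, "rate": 20.0},  # s
    {"center": 0.08, "target": 0.1, "mag": 1.0, "rate": 20.0},  # t
    {"center": 0.20, "target": 0.9, "mag": 3.0, "rate": 5.0},   # ew
]
for t in (0.00, 0.08, 0.20):
    print(f"protrusion at {t:.2f}s: {blended_control(t, stew):.2f}")
```

Running this shows the protrusion value already elevated at the s and t, which is exactly the coarticulation the key frame method misses.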
Our gold standard here is to get Baldi to give the same
kind of quality of visible speech that
a real talker gives.
Now some of you have experienced the
McGurk effect, anybody?
OK.
So what you want to do here is look at Baldi; he's going to say a set of syllables, like bah, gah, vah, tha, dah, mah. Simply watch Baldi and keep track of what you hear. And then we can talk about it.
BALDI: Bah, bah, bah, bah.
DOM MASSARO: OK.
So Baldi said four syllables.
Did you hear the syllable changing from syllable to syllable? Some of you did, some of you not. It's a pretty small effect, but in fact, for those of you that did hear it, the auditory syllable was always bah, but the mouth was going bah, vah, tha, dah. So if you were hearing different syllables, in fact, the visible speech was influencing what you heard.
I can play it one more time.
You can either look at it again, or close your eyes.
BALDI: Bah, bah, bah, bah.
DOM MASSARO: Now I've been looking at that for much longer than I want to remember, and I still get the illusion that the visible speech has an impact.
And so based on this, we developed this multisensory pattern recognition scheme, called the Fuzzy Logical Model of Perception, in which we have very simple stages of processing: you evaluate the two sources of information in terms of how much they support the different alternatives. You integrate them together to get an overall metric of support. And then you make a decision. And learning occurs with feedback, so that you can modify the evaluation value, given the feedback that you get. This is what we call the Fuzzy Logical Model of Perception. And fuzzy logic, it turns out, is mathematically equivalent to Bayes' theorem, which is an optimal method for combining multiple sources of information to make a decision.
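Here is a minimal sketch of that integration step, with illustrative support values rather than measured ones:

```python
def flmp(auditory, visual):
    # Integration: multiply the degrees of support from each source.
    combined = {alt: auditory[alt] * visual[alt] for alt in auditory}
    # Decision: relative goodness of match (normalization).
    total = sum(combined.values())
    return {alt: s / total for alt, s in combined.items()}

# McGurk-style conflict: the ear supports bah, the face supports dah.
auditory = {"bah": 0.8, "dah": 0.2}
visual   = {"bah": 0.1, "dah": 0.9}
print(flmp(auditory, visual))  # {'bah': ~0.31, 'dah': ~0.69}
```

Multiplying and renormalizing is Bayes' rule for independent sources, which is why the visual support can pull the decision away from the auditory bah.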
I should mention, if anybody has any questions, please feel
free to interrupt--
clarifications and so on.
And if I'm going too fast, or too slow, let me know.
OK.
So given that we had support for Baldi having good visible speech--again, compared to our gold standard, Baldi does almost as well; I won't show you those data--we thought that there would be value in Baldi being a virtual tutor. And who better to work with than a bunch of deaf and hard of hearing kids, who are behind in vocabulary simply because they have degraded hearing?
And we thought that Baldi might be a good tutor.
So here's a little part of a Primetime special that
evaluated Baldi's instruction of vocabulary for these kids.
[VIDEO PLAYBACK]
-To see how fast Baldi can work, Primetime had teachers
create several new vocabulary lessons for Timothy.
Words he had never spoken before.
-Let's talk about what you see.
-First Baldi checks his knowledge.
-Click on the bowling balls.
-Then, Baldi shows Timothy each item correctly.
-These are bowling balls.
-Followed by a drill of speaking the words for the
very first time.
-What is this?
-Ola balls.
-It's not easy.
-Click on the tennis rackets.
No.
That's not right.
-Timothy mispronounced and misidentified
nine out of ten objects.
But just three weeks later, we re-tested him.
-OK, Timothy, you're ready for the final test.
What is this?
-Soccer ball.
-This time, he got nine out of ten correct, and his
pronunciation improved dramatically.
-What is this?
-Baseball.
[END VIDEO PLAYBACK]
OK.
So it looked like the kids were learning vocabulary, but
we wanted to make sure that indeed it was the Baldi
intervention that was responsible.
So we did some experiments in which the kids are learning
three sets of words.
We're testing on all three sets every time, but we're
only training on one set.
So the idea is that, if it's the Baldi intervention that's
important, then they'll only learn that set of words, and
not the other sets.
This is called a multiple baseline procedure.
And so, sure enough, the dark squares are comprehension, and
the open ones are production.
And comprehension is always a little easier than production.
But you can see that, sure enough, this particular student learns the Set 1 words, but not the Set 2 words. When they start training on Set 2, they learn those items. And then they learn it for Set 3. So this is a procedure accepted by peer-reviewed journals that says that, yes, your intervention was responsible.
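The logic of the multiple baseline design can be sketched in a few lines, with made-up scores standing in for the children's actual data:

```python
import random

def test_score(trained):
    # Hypothetical effect: trained words score high, untrained near floor.
    lo, hi = (0.8, 1.0) if trained else (0.0, 0.2)
    return round(random.uniform(lo, hi), 2)

# Train one additional set per phase, but test all three sets every phase.
phases = [("Phase 1", {1}), ("Phase 2", {1, 2}), ("Phase 3", {1, 2, 3})]
for name, trained_sets in phases:
    scores = {f"Set {k}": test_score(k in trained_sets) for k in (1, 2, 3)}
    print(name, scores)  # only the trained sets should have jumped
```

If scores jump only when a set's training begins, the intervention, and not general exposure, gets the credit.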
So we were happy with Baldi being an effective virtual
tutor for learning vocabulary.
And we looked to the autistic kid community, and thought that Baldi might help there, too. Autistic kids like constancy, and Baldi is always constant.
His emotion can be controlled exactly.
And we can also teach grammar.
Here, we're teaching singular versus plural with these
autistic kids.
And I'll just give you a quick look at Tony
working with Baldi.
[VIDEO PLAYBACK]
-Let's practice.
Where are the ladybugs?
-Ladybugs.
DOM MASSARO: So he's learning singular versus plural.
And he's clicking on the right answer, but he is actually
saying them, too.
The kids really resonate to Baldi. They come in saying, hi, Baldi, I love you, Baldi, and so on.
Sure enough, Baldi was also effective with
the autistic kids.
And so more recently, we decided to
port Baldi to a tablet.
And here we have Taryn working with Baldi on a tablet, in a
simple tile matching game, where you match two tiles so
they disappear.
MALE SPEAKER: Question?
DOM MASSARO: Yeah.
MALE SPEAKER: Is it true that autistic kids have a harder
time parsing human expression?
DOM MASSARO: Yes.
Autistic kids tend not to look at the face.
And in fact, we taught kids to look at the face, and found
that they could use that information and integrate it
with the voice, in the same way that normally
developing kids do.
So that's a good point.
But they tend not to look at the face.
But we were able to get them to learn how to lip read better with Baldi than they had previously. And sure enough, they could look like normally developing kids, in terms of integrating the two sources of information. One of our mantras is that people naturally integrate multiple sources of information, even autistic kids.
So thanks for that.
So here's Taryn working with the tile matching game.
[VIDEO PLAYBACK]
-I got the ball.
-Wonderful.
-You see?
I found it.
-You did, that's great.
-He's gone.
[END VIDEO PLAYBACK]
DOM MASSARO: So you can see that the
little game can be engaging.
And the nice thing about a virtual tutor, it's available
all the time.
But there is a downside. These programs are expensive to write. They require some maintenance. And one could argue, it's 2D media; they're not getting personal interaction. So one of the reasons I talk about this is because this kind of intervention is coming kind of late in the child's life. And one of my mantras is that we need early intervention-- just to anticipate some of the things I'm going to say.
OK.
So now we're going to switch gears to a second project, in
terms of embellishing conversations.
And one out of ten people across the world is deaf or hard of hearing, and we all lose our hearing, particularly the men, as we become chronologically gifted. And therefore, we depend more on the face. But the face gives some information without giving complete information. So for example, you can get place of articulation from the face. You can see the difference between buh and duh, for example. But you can't really get things like voicing so well, like the difference between buh and puh.
like the difference between buh and puh.
So what we thought is that, if we had a hard of hearing
people that were getting degraded hearing, maybe we
could embellish the signal, by giving visual cues about those
things that aren't seen so easily on the face.
And this would be a wearable appliance, that people would
wear, like a pair of glasses.
And if it wouldn't take much sophisticated technology.
You could just have a couple of LEDs on the
corner of your glasses.
You're looking at the person, lip reading, getting some
degraded hearing, if it's available, and then
integrating these cues with it.
So we decided to develop a scheme where we would
represent these characteristics
of the speech visually.
So voicing would be indicated by a blue dot, frication by a
white dot, and nasality by a red dot.
So what we did is, we developed a neural network
model that tracks your speech.
So you'd have a microphone on your eyeglasses.
You would be tracking the speech of the interlocutor
you're having a conversation with.
And then these cues would be shown on your eyeglasses.
And the idea is that you would be integrating these cues with
the face and the voice, to understand the message.
And so, you can see, remember: red is nasal, like men. So you saw the red and the blue, which is voicing. And then you can say things like sys, where you get the frication--the voicing and the frication. Or men, where you get the nasality. OK? Or church. So the neural network actually does a pretty good job. It's doing it in real time. These are 10-millisecond steps, so we're only lagging the speech signal by about 50 milliseconds. And it's about 90% correct.
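A minimal sketch of the cue mapping, with hypothetical detector outputs standing in for the trained neural network:

```python
def frame_to_cues(features):
    # features: per-frame detector outputs in [0, 1], one frame per
    # 10 ms (hypothetical stand-in for the network's outputs).
    cues = []
    if features.get("voicing", 0.0) > 0.5:
        cues.append("BLUE")    # voicing
    if features.get("frication", 0.0) > 0.5:
        cues.append("WHITE")   # frication
    if features.get("nasality", 0.0) > 0.5:
        cues.append("RED")     # nasality
    return cues

# Frames roughly sketching "van": voiced fricative, vowel, nasal.
frames = [
    {"voicing": 0.9, "frication": 0.8},   # v -> BLUE + WHITE
    {"voicing": 0.9},                     # a -> BLUE
    {"voicing": 0.9, "nasality": 0.9},    # n -> BLUE + RED
]
for i, f in enumerate(frames):
    print(f"{i * 10} ms:", frame_to_cues(f))
```

In this scheme the f of fan would show WHITE without BLUE, while the v of van shows both, which is the voicing contrast the lips alone can't give you.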
And so, that's one half of the problem.
The other half of the problem is for people
to learn the cues.
So if you see fan, and you just see it on the lips, then that could be van, because you don't see the voicing so well. But if you get the cue that it's voiced-- let's see if this works, live demos-- so if you don't get the white cue then you're OK. Fan: so I saw it went white, blue, white. Whereas if I say van, then you see, you don't get the white cue. So that could tell you the difference between fan and van.
So you just have to put those two things together.
So what we want to do is teach people. And we didn't have a whole population of deaf and hard of hearing people, so we depended on university kids that came in every day, or maybe three or four times a week, for an hour or so a day, in which they practiced integrating these cues with the face.
So Baldi would mouth a word with the cues.
The student would write and indicate what they perceived,
and then Baldi would give feedback.
All right, so these kids were very heroic.
They came in for over a year, every day, and were
learning the cues.
And they did a pretty good job.
But what we found is, they did pretty well with single words,
or maybe even two or three word phrases, but they were
lost in continuous conversation.
Think of trying to track what I'm saying
with all these cues.
That's a tough one.
So we also made the observation that it's hard to
change behavior.
Now we know, first of all, people that require hearing
aids, they have very little patience.
Most of them throw them away.
Don't really use them.
You probably have experience with people
who have done this.
And similarly, we thought, there's no way we can convince
people to spend a year learning these cues
that might help them.
And the other thing is, as you can see, when we don't hear something so well, our natural tendency is to move our ear to the source and lose the visual information. And so this woman who has hearing aids is gaining about three decibels of understanding, but in fact, if she were looking at the face she would gain about 12 decibels. So how do you teach people to do that? It's hard to change behavior.
So this is another lesson for where I'm going, in terms of naturally acquired reading.
So what we did in this project was we thought, well, OK, why don't we just do full-blown speech recognition, and then communicate that way.
So the hard of hearing person can get all of the cues of you
talking, and then also get the words.
So they can get the paralinguistic information,
and the linguistic information, and then have a
conversation.
[VIDEO PLAYBACK]
-Do you have any plans to enjoy the nice weather?
-Not today.
[END VIDEO PLAYBACK]
OK.
So the idea then, again, is that if I'm talking to someone that's hard of hearing, I can ask them a question-- and we all know about speech recognition, open-ended alternatives and so on-- but: do you know if there's a Starbucks nearby in this neighborhood?
So, not bad.
Did you see the frost on your roof last night?
So you can do a pretty good job.
Now this recognizer only runs locally.
If you had access to the internet it could do a lot
better, and it could be faster too, simply because, if you
know about speech recognition, it has a much bigger database,
and more computational power, and so on.
So this is where we ended up on this project, where we're
doing the full blown speech recognition.
OK.
So we're making pretty good time.
In fact, maybe I'm going too fast.
OK.
So here's what you all came for, I guess. Again, I wanted to tell these stories because I want to show you how they took me in this direction. I should have shown some research, also, that we've done in reading, just in the same way that we did in speech perception: that people integrate multiple sources of information in reading, like putting together information about the letters themselves, the orthographic structure--that is, how the letters go together in words--and syntactic and semantic constraints, for example.
And so about three years ago, I arrived at this idea: kids are immersed in spoken language at birth, so why can't we immerse them in written language at birth? This has never happened because we haven't had the technology, but the technology is getting there. Why don't we immerse the kids in written language at birth?
And the idea is that they'll learn to read naturally,
without instruction.
And this has huge implications for the way society is
structured, today.
What are we up against here?
Well, the current belief is that speech and language, as I said, are very special things, and that they're more or less like instincts, whereas reading is artificial. It's artificial because it was created a couple thousand years ago, and we created it, rather than some extraterrestrial influence, or something. And Maryanne Wolf, here, represents the neuroscience community when she says in her book, "Unlike its component parts such as vision and speech, which are genetically organized, reading has no direct genetic program passing it on to future generations." So there's something special about reading: that it is artificial, and speech is natural. And so that's what we're up against.
Maybe I can have someone from the audience, would you be
willing to participate?
So what I want you to do is, there are going
to be a set of pictures.
I want you to name the object in the picture.
OK?
And just go from left to right, across the two rows.
FEMALE SPEAKER: You want me to name the picture?
DOM MASSARO: Name the picture.
FEMALE SPEAKER: Tree, book, shoe, nest, eggs, baby,
rabbit, ring.
DOM MASSARO: OK.
Now you can do the next one.
FEMALE SPEAKER: Tree, book, shoes.
DOM MASSARO: This is like the Stroop effect, right? Where you're trying to name the color of the ink when the word spells a different color. Right? So it makes the point that, once you learn how to read, you can't help but read. That's why advertising is so effective. And that is just one more similarity with speech. We can't help but hear. If someone mentions our name, we can't help but orient our attention to it.
So the way I thought about the problem is, what's needed for
a child to acquire spoken language?
And does that same child have that same stuff to acquire
written language?
And so for spoken language, the child's got to do some
kind of signal analysis.
They have to hear the syllables, combine the
syllables in different orders, form categories that are
associated with meaning, and most importantly, they need an
early exposure.
We know from those few sad cases where kids haven't had
language until adolescence or even six or seven years old,
they can't acquire language.
And what's really impressive--you don't know this, and maybe this is why you're tired in the evening if you have kids--in a given year, say from one to two years old, a child hears about 1,000 hours of speech, which is about a million words.
So that's a lot.
Right?
So they need that early exposure, and they
need a lot of it.
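The arithmetic behind that estimate, using the talk's round numbers:

```python
# ~1,000 hours of speech and ~1,000,000 words heard per year
# between ages one and two (the talk's figures, taken as given).
hours = 1_000
words = 1_000_000
print(words / hours)        # 1000.0 words per hour
print(words / hours / 60)   # ~16.7 words per minute of exposure
```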
Now what's not needed?
Well some of you may have heard of the Theory of Mind.
This is something that kids don't get till about 3 years
old, and that is that they have some kind of
understanding that they are them, you are you, and you can
engage in something like a dialogue.
And you might have different beliefs than they have.
And that you can change each other's beliefs.
Well that's not necessary to learn language.
Because by age 3 kids are incredibly sophisticated
language users, even though they don't
have a Theory of Mind.
As the Car Talk guys might say, it's unencumbered by the thought process. So the kids acquire the language without thought.
Now one convincing piece of data: you all have heard of Kanzi, who can do amazing things with language. In fact, how did Kanzi learn to manipulate these symbols that were associated with speech, and eventually learn how to understand speech? Well here's Kanzi on his mother's knee--Matata--while they were teaching Matata these symbols. So they spent a year or so doing very formal instruction of the adult, Matata, to learn these symbols, and Kanzi was just there nursing and having a good time, and so on. Well it turns out that Matata never learned, and then one day, by some serendipitous discovery, they found out that Kanzi had learned all of the symbols. So that's pretty wild. Not much is made of that in that literature, but I see a real importance here.
And this agrees with the idea that there's this incredibly
explosive brain development that occurs in the first few
years of life.
That the brain is getting bigger, it's making more
connections, pruning connections that aren't
meaningful.
And this is another support for the idea that it's
important to get in there early, when
the brain is plastic.
Now all these companies are telling us that our brain is
plastic through our lifetime, and we could learn, and lower
your chronological age, and so on.
But I'm kind of skeptical of that.
It's a little plastic, but not like what the young kids bring
to the table.
OK, so that's more or less what's needed for speech
perception.
And we can ask similar questions about reading.
So for reading, we have to do some kind of signal analysis.
We have to learn letters.
Combine the letters.
Associate those combinations with particular categories.
And, like speech, we need early exposure.
And we need some kind of language and reading immersion
in the same way that we have it in spoken language.
Well how many hours do we need?
We don't know.
But we know, right now, the kids are
getting next to nothing.
And I'll expand on that.
First of all, in terms of the signal analysis, babies come
equipped with vision that's very
sophisticated, very early on.
Here's a baby at 3 weeks old, where you can see her making
the sciatic eye movements, tracking the toy.
That's at just 3 weeks.
A week earlier she couldn't do it.
Also their visual acuity is very good.
So that at one month, they're real good at arm's distance,
looking at you, but by eight months, they pretty much have
the vision that we have.
So they can do about as well on the eye chart as we can.
So the babies seem to be equipped to process visual
information.
And sure enough, they can form categories.
So at one-month-old, infants can see the difference between
a square and a triangle.
Whereas, at one-month-old they can't see the difference
between the square and the triangle if they're embedded
in a circle.
But at two months they're able to make that distinction.
So again, there's this rapid development of what's
necessary to form categories.
And it turns out that babies are
incredible association engines.
They learn statistical constraints like nothing.
And developmental psychologists have made a
cottage industry of this behavior, showing that kids
can very quickly learn associations in speech, in
music, in objects.
So a typical experiment might be: the experimenter sets up probabilistic constraints among objects. Here you can think of these as pairs of objects--three pairs of objects. Within a pair, one object always follows the other object. But between pairs, they follow each other only a third of the time.
If you take a seven-month-old infant, and you give them two
minutes of this sequence, they get bored out of their mind.
They habituate.
You change the statistical properties of that sequence,
the objects are still the same, the infants wake up.
This is how we can tell what infants know about the world
around them.
They get bored very easily, we change the world, they wake
up, and we assume that they noticed the difference.
So indeed, they can do this.
Pretty impressive.
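Here is a minimal sketch of the statistical structure of such a familiarization sequence, with hypothetical objects A through F standing in for the actual stimuli:

```python
import random

# Three pairs: within a pair the second object always follows the
# first (probability 1.0); across a pair boundary, the next pair is
# chosen at random, so any given pair follows with probability ~1/3.
pairs = [("A", "B"), ("C", "D"), ("E", "F")]

def familiarization(n_pairs):
    seq = []
    for _ in range(n_pairs):
        seq.extend(random.choice(pairs))
    return seq

seq = familiarization(60)
print(" ".join(seq[:12]), "...")
# A habituated infant "wakes up" if these transition statistics
# change, even though the objects themselves stay the same.
```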
And what's so interesting, again, even though the
developmental psychologists have made this a cottage
industry, they never thought of studying letters, and
letter combinations.
Somehow reading seems to be off the map for them.
So we propose an experiment of the following kind: we use the same kind of constraints, but now we do it with letters, rather than objects. And the hypothesis would be that, indeed, kids would learn this. So maybe someday someone gets some funding to do it, so we can start learning about how kids process written language.
Now we think that kids are going to be able to do this,
it's going to be a no-brainer.
Why?
It turns out, it seems, that if you do a topological analysis of the alphabets of the world, they have the same characteristics as the world around us: whether it's a geometric, architectural world or a pastoral world, they seem to have the same properties. So the argument by Changizi there is that the alphabets actually developed not because they were easy to write, but rather to be easy for a visual system that's already prepared for that kind of information in the environment.
So as I mentioned, then, we want to take advantage of this critical period of development. We see it's true in the auditory system, the visual system, speech and sign language. And we're saying that in reading it's the same thing: we want to have reading being learned in this critical period of development.
Obviously, it doesn't have sharp boundaries.
But the fact is, the earlier you can get in, the better.
So how do we immerse kids in written language?
As I said, this has never been done because we really haven't
had the technology.
And we still don't have it, but we can have successive
approximations.
So we can think of picture books.
Picture books are really important for kids, and their
acculturation in the world.
And we all read picture books to our kids.
And it turns out that you think, oh, picture books--there's writing in picture books. Right? Look at the writing. Well, when you record the eye movements of kids during picture book reading, 95% of the time they're looking at the pictures and not the words. And as you can see by the graphics there, the artist makes the fonts real fancy, and so on, to please the adult who's reading, not for the kids to read.
So we developed this app where we put in all of the popular books that we could find. We put them into our app. And so the caregiver can choose a book that they're reading, and then supplement it with nice visual letters that the kids can see really easily.
So here, I chose Barbapapa, Barbapapa at the Zoo, and you
could be reading along and the kid would be
looking at the book.
When they were having a party after the fire, Barbapapa
heard cries for help.
And then a fierce leopard had escaped from the zoo.
And so what you can do is, you can then dictate this, a
fierce leopard had escaped from the zoo.
[COMPUTER REPLAY]
A fierce leopard had escaped from the zoo.
DOM MASSARO: So now the child's getting nice written language that supplements the picture book reading, and they should then get some of the written language that they need for the development of learning to read.
When we have all the books in memory, and we know what book the caregiver is reading, it makes speech recognition a lot easier. But someone can't really read our book without having the real book. So we're not really infringing any copyright.
So here's another little one.
So you can see, it was kind of difficult for me to negotiate
holding the book, and the iPad.
So there's these nice T-shirts you can buy, where you can put
your tablet right in here.
And then it's easier to negotiate.
So there's Keegan looking at the words.
And you might wonder about this format that we're using,
where one word occurs on top of one another.
But in fact, this is called Rapid Serial Visual
Presentation.
And psychologists use it a lot in experiments.
It turns out that kids, even third graders, do better with
this Rapid Serial Visual Presentation format, than they
do with a page format.
And again, we grew up reading at our own pace.
And so we wouldn't like that.
But in fact, just like we're pushed in listening to speech,
we can be pushed into reading.
And we can actually then read faster--more words per minute, with better comprehension--in this kind of presentation than in the standard presentation. Of course, as you can see, this format gives the words much more visual value for the kids, because they get nice big words, which wouldn't work very well in a page format.
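A minimal sketch of RSVP timing, using a terminal as a stand-in for the app's large on-screen words:

```python
import sys
import time

def rsvp(text, words_per_minute=180):
    # Flash one word at a time in the same spot, at a fixed rate,
    # so the reader makes no eye movements across a page.
    delay = 60.0 / words_per_minute
    for word in text.split():
        sys.stdout.write("\r" + word.center(24))
        sys.stdout.flush()
        time.sleep(delay)
    print()

rsvp("A fierce leopard had escaped from the zoo")
```

The rate parameter is the whole design: the presentation pushes the reader, just as speech pushes the listener.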
So here is Nathaniel's mom, reading Goodnight Moon.
He's--
what is he four-months-old, or is he eight months?
I'm sorry.
[VIDEO PLAYBACK]
-Goodnight kittens and goodnight mittens.
[END VIDEO PLAYBACK]
You see he's really looking.
And he does pick up on it.
And the thing that we don't realize is that we're talking
to our kids all the time, and they don't
understand a damned thing.
MALE SPEAKER: That keeps going on.
DOM MASSARO: And it gets worse.
It gets worse.
So we're looking at other ways to embellish the written world
for the child.
And one is Write My World, where you can then write what
the child is experiencing.
[VIDEO PLAYBACK]
-Car racing.
See?
Vroom, vroom, vroom.
Car racing.
See Keegan, look.
Keegan, car racing.
[END VIDEO PLAYBACK]
So you might think that's kind of artificial, but with kids learning sign language, the caregivers are faced with the same problem. They have to get the attention of the kids--it's a little different from speech--to watch the signs. So the caregivers might actually sign over the object, or get their attention--look at me--and do the signing. And kids learn sign just as easily as they learn spoken language.
So in this other application we have here, the child would carry around a camera and zero in on the world around them. And we have bar codes--here we can distinguish 24 different things with these bar codes. And then we can present whatever we want. We can talk about it, we can show the visual information, and we can present it so it's always right on top of the object, taking into account the camera angle. So that's another way: the child could walk around and do this.
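A minimal sketch of the marker-to-word idea; the detector output format and the mapping table are hypothetical stand-ins, not the app's actual code:

```python
# 24 distinguishable marker codes, each mapped to a word to display.
WORD_FOR_MARKER = {0: "ball", 1: "cup", 2: "dog"}  # ...up to 24 entries

def frame_overlays(detected_markers):
    # detected_markers: (marker_id, x, y, angle) tuples from some
    # marker detector (any ArUco-style library could supply these).
    overlays = []
    for marker_id, x, y, angle in detected_markers:
        word = WORD_FOR_MARKER.get(marker_id)
        if word is not None:
            # Draw the word on top of the object, matching its tilt.
            overlays.append({"text": word, "x": x, "y": y, "rotation": angle})
    return overlays

# One fake camera frame: marker 2 ("dog") seen tilted by 15 degrees.
print(frame_overlays([(2, 120, 240, 15.0)]))
```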
Well, so here at Google, you've got your ideas about Interactive Digital Signage, so that eventually we're going to be surrounded by a literate world. So there's no reason that that digital world can't be available to young kids, as it is available to us adults.
So that's certainly indicating that the technology is getting
there, in terms of having the child
immersed in written language.
And of course kids can have robots. The European community, it looks like, is going to fund a $10 billion project on robots as companions. And there's no reason why these companions can't provide written language, as well as spoken language, to kids.
So here's a little concept video that was done in 2009,
about a Danish farmer that goes out with these
intelligent glasses, goes out to his barn.
And he looks around the barn.
And of course, it's doing object recognition.
And he sees that the roof needs repair here, and this
cow has to stay on medicine for another two
weeks, and so on.
And so these are very valuable things.
And he comes in the house and the recognition system, of
course, recognizes his spouse and tells him, hey, tomorrow's
your anniversary, you better get a present.
So in our patent application I proposed this heads-up display, where there are two things that you want to do. You need to understand the experience of the child and represent that in written language. And there are two basic ways to do that. One is to do speech recognition, since the speech that's being said is very predictive of the child's experience. And the second is to do object and action recognition, where recognizing what the child's doing is another indication of what they're experiencing. And both of those would be associated with written language. So the caregiver says, you did a fine job. The heads-up display might just say fine job. So it would read it.
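A minimal sketch of that two-channel pipeline; the recognizers are stubs, since any real speech and vision engine could stand in for them:

```python
def recognize_speech(audio):
    # Stub: a real ASR engine would transcribe the caregiver here.
    return "you did a fine job"

def recognize_scene(image):
    # Stub: a real object/action recognizer would label the scene.
    return ["ball", "rolling"]

def words_for_display(audio, image):
    # Both channels end up as written language in the child's view.
    return [recognize_speech(audio)] + recognize_scene(image)

print(words_for_display(audio=None, image=None))
# ['you did a fine job', 'ball', 'rolling']
```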
So you all know about Google Glass, and the point here is
that the wrong person has it on.
The baby should be wearing the glasses that give the
information about the mother.
So although she put some toy glasses on the baby here, to
take a picture, it's really the baby that should be seeing
what's going on in the world.
Eventually it'll all be on a contact lens, so that'll make
it even easier.
But for now, we just have these portable tablets that
kids seem to be attracted to, naturally.
They love the touch system.
And I was telling Shuman that they could do something like a
swipe on the tablet, to interact with the written
language that way.
So we're winding down here.
What are the benefits of early reading?
Well, the idea would be that illiteracy would be no more frequent than speech impairment is now. Whereas now, it's obviously much more frequent. It would reduce the cost of reading instruction, and there are advantages of written language. We did an analysis of the kind of language you find in picture books, versus the kind of language adults use when they talk to each other. And we found that the language in picture books was much richer, being at a higher level: a more unique vocabulary, more complicated grammar, and so on. So it's good to get kids reading as quickly as possible, because they're going to be faced with more difficult language, which can only be beneficial.
So if we're successful, then this is going to change how we
allocate resources.
It's going to have incredible implications for the deaf and
hard of hearing community, and it allows
us to rethink schooling.
So right now, if you look at public spending on kids as they go from birth to adulthood, versus their brain growth, you see this inverse function. We spend all our money on kids after they go to school, whereas the brain growth all occurred before they go to school. So we would need a realignment of some public spending, toward before schooling rather than after schooling.
And sure enough, our Nobel Prize winner, James Heckman, showed that the return on investment is much greater when you invest in preschool kids, relative to school-age kids or adults. So you get a good return on investment there.
For deaf kids, most deaf kids read at a fourth grade level.
That's primarily because 19 out of 20 kids that are born
deaf are born to hearing adults.
And the adults want to hear them talk, so these kids don't
learn sign language very quickly.
With our method, these kids could be bootstrapped with
written language, so that would be their second
language, in addition to either oral
language or sign language.
So the written language could be a real great modality to
bootstrap kids that are deaf and hard of hearing.
And then finally, we can envision schooling where kids are already reading at the age at which they would normally go to school. And the way I figured it out, we seem to spend about $10,000 per kid for a year of schooling. So we could save a lot of money. And of the three R's, the children would already have two of them--reading and writing. So it would be something that could help the economy a lot.
But it would also allow us to think of schools as Dewey did, where we would have communities of scholars with certain interests, and they would congregate together and pursue their interests, rather than sitting at a desk and learning a litany.
So this is the conclusion I usually give to the groups I talk to, because the behavioral and social sciences are still pretty conservative. So I don't have to give that conclusion here: that it's clear that science and technology impact life, and that we have to be open to disruptive ideas.
So I'm happy to entertain questions now.
And we can open floor to you all.
Thank you.
[APPLAUSE]
MALE SPEAKER: When kids are small, you showed the brain
growth and importance of speech.
The--
DOM MASSARO: Wrong time to get a call.
MALE SPEAKER: I've got a brand new phone and it's [INAUDIBLE]
or anything.
So I have my son who didn't speak until he was three.
So there was a lot of time he didn't speak.
So we really couldn't communicate.
So the other side of the reading early thing, for me,
would be to be able to communicate effectively with
my son, through reading, as opposed to speech.
Hopefully bootstrap the speech, or maybe get at the
early thing.
So have you thought of it, in sort of the reverse way?
DOM MASSARO: That's a nice idea. Certainly, that's one of the reasons that we try to make a behavioral science case for learning to read in the same way that you learn to understand spoken language. And one of the things you see are kids that are so-called late talkers. Your son was one of these: could understand hundreds of words but didn't speak.
MALE SPEAKER: We have no idea if he
understood a single word.
DOM MASSARO: But once he started speaking, did he?
MALE SPEAKER: Yeah, but it took 2 and 1/2 to 2 and 3/4
years to get him to speak.
DOM MASSARO: To get him to speak.
But the point is, he did understand.
MALE SPEAKER: We actually believe that he hadn't figured
out that that crazy noise you hear, actually
had meaning to it.
He just thought it was just noise.
DOM MASSARO: I can talk more about that later.
There's a scale you can fill out that says how many words
your kids comprehend, versus how many they produce.
And kids always comprehend much more than they produce.
And that's a normal trajectory.
And so the point would be for reading, that
could be also the case.
That kids could be reading much more than
they are able to write.
But one alternative that people have chosen, in similar
situations, are baby signs.
That babies are able to make signs that they use to
communicate.
So we can talk more about that, later.
FEMALE SPEAKER: While far less sophisticated or consistent, Sesame Street would frequently show words flashing with the object, precisely how you have it. As the word is spoken you see the word flashed, and the picture, at the same time.
Are there any studies that showed the effectiveness of
that on literacy, for children who watched Sesame Street,
versus those who didn't?
DOM MASSARO: Yeah.
That's a good question.
I don't know of any, but that would be
something to look into.
So we had that question mark. We don't know how much written language the kids need. And so the few times that today's show is brought to you by the letter L--that's probably not enough. But it's a start. And it would be nice to determine that.
There are a couple of anecdotes. The gentleman mentioned that in India they're showing the written language--subtitles--with the spoken language, to help with literacy.
When I visited Denmark, I was impressed that they show
Sesame Street without dubbing.
So the kids are hearing English, but
they show Danish subtitles.
So my idea was, hey, these kids are going to want to learn to read Danish, because they want to understand what the hell's going on in Sesame Street. So that's a kind of neat thing. I don't know of anything systematic that's been done with that.
People have looked at picture books, and seen what effect that has. Picture books have a big effect on language, but not literacy, because again, the kids simply aren't looking at the words.
And there's some controversy about to what extent kids can
learn from 2D media.
But obviously, reading is 2D media, so I think that's a
no-brainer.
Of course they can learn.
Yeah?
MALE SPEAKER: Have you experimented with teaching
kids phonics using the flash method?
DOM MASSARO: OK, so many years ago, we wrote a paper about this. What's the trick of how you teach a kid to read in today's schooling? Well, what you do is teach them how to decode. What does decode mean? It means that they're able to map the written language into spoken language. And the way you do that is that you teach them phonics.
And so a colleague of mine just shared an anecdote with
his granddaughter the other day.
She had a picture of a cat, and the word was written
underneath.
And she went cah, uh, ca, ca, t, cat.
And then the next picture came on, it was an insect.
And she went buh uh ug, bug.
Except it said ant.
OK?
So that's a great anecdote.
One of the things we wrote, in 1979, was that one of the benefits of phonics might not be with respect to decoding, but with drawing the kids' attention to the orthographic structure. That means the spelling constraints in the language: what letters follow other letters, and where they occur in words.
And we showed that indeed, people are sensitive to these constraints, even though some have nothing to do with spoken language. There are just certain constraints in the written language; some are dependent on spoken language, and some aren't. But we're sensitive to those as well.
So our idea is that written language has these same kinds of constraints, and a deaf child could learn written language independently of spoken language. When you heard the picture book reading earlier, we have an option: you can either have the voice on or not. You heard it with the voice on. That could help bootstrap the child if she already knows the speech, but it doesn't have to be on. And the idea is that the deaf child could learn it that way.
So to get back to the original question--I'm sorry about the long answer--the assumption is that the child can decode into spoken language, and then it's a no-brainer. Right? They can understand the spoken language. But in fact, I'm not so sure that's the case. Even if they do decode successfully, it could be the case that that decoding takes so much attention and cognitive processing that it distracts them away from understanding the message. And therefore, if we had the early literacy, they would comprehend the written language directly, without going through the spoken language, and without needing the decoding process.
MALE SPEAKER: Does that mean if you start with very young
infants you want to start showing
them words, not letters?
DOM MASSARO: That's right.
So you're not going to teach them the alphabet.
MALE SPEAKER: Just skip the alphabet?
DOM MASSARO: Yeah.
And it will be learned naturally.
With the idea that, as we saw here--and let me tell you about the baboon study. These investigators in France just did this study. They had these baboons in an open play area, and they could go up to the computer anytime they wanted. They probably were a little hungry, so they would work to get this food. And they saw four-letter strings. Half of those letter strings were words, and the other half were non-words composed of the same letters--but they were not words, so they had different orthographic structure. These baboons were able to classify these items as words versus non-words. They didn't know the meaning, but the point is, they were using the structure to discriminate the two: what letter combinations occur in what positions, and so on. So they're picking up these constraints. So the idea is that when kids see whole words, they're going to learn about the individual letters, because they are separate objects, and learn about the constraints among them.
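A minimal sketch of how orthographic structure alone, with no meaning, can separate word-like from non-word-like strings; the word list is a toy, not the study's actual stimuli:

```python
from collections import Counter

# Words "seen" so far; count their letter bigrams.
seen_words = ["then", "done", "land", "hand", "send", "lend", "tend", "dent"]
bigrams = Counter(w[i:i + 2] for w in seen_words for i in range(len(w) - 1))

def wordlikeness(s):
    # Sum of familiar-bigram counts; higher = more word-like.
    return sum(bigrams[s[i:i + 2]] for i in range(len(s) - 1))

# Same letters, different structure: no meaning needed to tell them apart.
print(wordlikeness("sent"))  # 7  -> classify as "word"
print(wordlikeness("tsne"))  # 1  -> classify as "non-word"
```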
FEMALE SPEAKER: In that example, did the whole words
bring food, but the non-words didn't?
DOM MASSARO: No, the correct answer brought food.
They had two categories.
I'm sorry, I didn't explain it clearly.
They said word or non-word, by two levers.
And when they were right they got food.
Yes, sorry, you were first.
MALE SPEAKER: So there are obviously differences in languages. For example, in Chinese you learn a symbol, in some sense, rather than the pronunciation. Right?
DOM MASSARO: Well, as Shuman will tell you, with Chinese you're really teaching them something like Pinyin first, where they're getting something closer to an alphabet. OK? So again, in my system, I don't think I'd want to throw characters at the kids, but rather maybe something like Pinyin. But as Shuman said earlier, kids can learn anything, and maybe they could learn the characters too. But there's no reason why we would have to give them the characters. We could give them the Pinyin and eventually they would learn the characters.
MALE SPEAKER: That's what I'm saying--in other languages, of course, it works better than learning the characters. For example, in Spanish, or somewhere, pronunciation and the actual letters are much better matched than in English, where a u can be pronounced very differently.
DOM MASSARO: Yeah, so that's a whole other dimension of the reading wars: whether a nice direct mapping between the written language and the spoken language makes you a better reader. And there's not much evidence that it really does. And again, my argument would be that you're going to pick up the structure of the written language regardless of how it's mapped into the spoken language. So it could be a deep orthography or a surface orthography. It doesn't matter.
Good question.
MALE SPEAKER: So I know that--
my son had trouble reading, too.
But the big thing about reading is, once you get it,
you get it.
And you might get it a couple years late, but once you know
how to read, you know how to read.
And you've made your vocabulary, but you actually
know how to read.
Is that somehow, true for speech, too?
Once you know how to speak, you know how to speak?
There's actually, for adult readers, a point where you say, you're an adult reader. And, ***! You can be 40 years old, or 15, and you haven't changed.
DOM MASSARO: Yeah.
That's actually a good question.
Because, I don't know how easily we can test it.
Because, again, most of us learn to speak very quickly.
And so it's hard to know, and say, now you can understand
the language.
Right?
And reading, again, you're waiting
until they go to school.
And then they learn the decoding.
And by fourth grade, then, the teachers are saying, yeah,
they decode, but they can't really comprehend.
So again, I don't know when you say that
someone is a reader.
That might be they do the decoding.
MALE SPEAKER: It's just that, for adult readers, you can actually read things. So they did [INAUDIBLE] and try to make sure that you actually achieve adult reading.
DOM MASSARO: I think, again, it's a gradual process, in the sense that it depends on the material you're reading. As I mentioned, reading material is much more demanding than spoken language material, and so that plays a big role in it. There are websites where you can put in a passage and it will tell you the reading level. And you can see that reading levels can differ a lot.
I'm sorry, we're going to have to close here,
and I can talk offline.
SHUMAN: Let's thank Dom for his [INAUDIBLE].
[APPLAUSE]