Well guys,
I'm going to record our talk.
Thanks Feijao for inviting me to speak
I hope you like it
I'm going to show some stuff that
I find very interesting
Stuff I would like to have known when I graduated.
To motivate you guys from ITAbits and ITAndroids
I'm going to start talking about
a question that this freshman "Radioativo"
I don't know his name, is he not here?
He was asking me:
If I were to build a robot, what matters the most?
Is it hardware or software?
You know, we see so much software available
Java, Eclipse, C#, some programming language
You can start developing right off
Hardware seems much more complicated to get
because you need some equipment, a prototyper,
things you might not have.
It turns out I gave the question a thought
and I can give a clearer answer now, which is:
Even though it's easier to obtain software, and
that it may look less important,
at the end of the day software is *more* important
That's because if you're going to build
a mechanism, let's say this mechanism has
to meet a tight tolerance or else it won't work.
Have you ever heard about laser gyroscope?
A laser gyro is used to measure rotation, angular velocity
It's based on the fact that the speed of light is absolute
So, in order to build a laser gyro,
as the equipment spins, there's an interference counter
which tells you how many spins happened
The mechanics used involves a fantastic precision
And it turns out the full precision is a bit
beyond the specification of some cutting tools
So, how do they do it?
It's a mechanical system
This mechanical system has an average,
which is what you're looking for,
but also has some variation
If you know the mechanic requirement and
your machine has, say, some mean and
standard deviation, you can make 100
gyros and use the one that falls into
the required tolerance
Statistically, there should be one within it
Should you need a lower tolerance, build
10000 and grab only the best
It will be expensive but it's doable
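The selection argument can be sketched numerically. This is a minimal simulation with made-up numbers for the machining process (mean, standard deviation, tolerance); it estimates how many parts you must build so that, on average, one lands inside the tolerance:

```python
import random

def parts_needed(mean, std, tol, n_sim=100_000, seed=0):
    """Estimate how many parts you must machine so that, on average,
    one falls within +/- tol of the target (the mean)."""
    rng = random.Random(seed)
    hits = sum(abs(rng.gauss(mean, std) - mean) <= tol for _ in range(n_sim))
    p = hits / n_sim          # probability one part is within tolerance
    return 1 / p if p > 0 else float("inf")

# A tolerance of 0.1 standard deviations: most parts miss it,
# so you machine a batch and keep the good one
print(parts_needed(10.0, 1.0, 0.1))
```

The tighter the tolerance relative to the process spread, the more parts you must build, which is exactly the "make 100 gyros and keep the best" strategy.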
Now let's turn to an AI software
If you consider a problem, for example, image identification:
Make me a software that identifies a bicycle in a picture
That is a trivial task for us humans:
I can just stare at the picture
You can have as large a budget as you wish
since we don't know how to do this, you
can't do it even with all computers in the world
That means that having a good heuristic,
a well implemented logic is worth more than
the mechanical component itself, because
knowing the algorithm is the difference between
being able or not being able to perform a task
It's not a matter of budget; we just don't know
how to do some things
Following this line of what we know and
don't know how to do, I'll show you the results
of research that was conducted about
how it is that we learn, how our brain works,
which I found extremely interesting
This is a presentation from Stanford's professor
Andrew Ng, founder of their AI lab,
which I'm going to comment myself
There's the audio here
Whoever wants to copy may get it from the web
If you've already seen it, of course,
you already know this.
So what they did was this:
they got the auditory nerve of some animal,
some frog I guess, from what he said,
and this auditory nerve is connected to the
auditory cortex, which is the part of the brain
trained to process this type of signal from
the auditory nerve so that you can hear
So the signal comes here to this auditory part
and then what they did was to cut this link
and they got the visual nerve, cut the link to
the auditory nerve, and rewired the vision
into this auditory part of the brain
Now guess what happened?
The auditory cortex learns to see
A part of the brain of the animal
which was previously trained to hear,
now with the rewiring of the signal of the eye,
this same part of the brain learns to see
And it's not just see meaning there's light or not,
it's see as in recognize objects and do
everything that can be done with vision
It's not just some ghost
A part of the brain that used to hear now can see
[Andrew Ng speaking]
And so not satisfied with that,
researchers intercepted the touch signal
this part here of the brain, and
did the same experiment:
they took off the sensory signal and
rewired vision here.
Guess what happened?
This region of the brain that used
to process touch signals now learned
to see, which is very interesting
If you do research in AI you know that
to train a robot to do some task
that is the ideal situation:
what is this algorithm that runs here
that can learn to process sight, touch and hearing?
It's the same algorithm
This is one of the great secrets people research
It would make things very simple:
Let's say I want my robot to learn to play soccer
I'll just install the same learning algorithm
and I make it play soccer
Now I want it to learn how to drive
I can simply teach it how to drive
and it will learn how to drive
We're not quite at this stage yet,
but this is motivation for many research areas
Let me speed this up a bit
Now I'll show some state of the art research
in this area
This is not exactly AI but it serves
as basis for us to extract ideas
How is it that our brain works?
After all, we're trying to make a computer
replicate what we do
So here what people are trying to do is this:
our tongue has many blood vessels
So blind people are using sensors in their tongue
and the hope is that they can learn to see
using their tongue
This research here is very interesting and
has some very good results
This is also for blind people
What they do is they put these people
in a room with random obstacles
and then they put this buzzer on top of the guy
So based on the sound that bounces back
from the walls, as he touches the walls and notices
how the environment is, with time,
he learns to process this sound of the buzzer,
kinda like a bat, and know where
there are obstacles in a room
And then the guy can go around in an unknown
environment without bumping into anything
just by processing the sound that bounces back
Then, with time, as they get practice in this,
these visually impaired people learn to
perceive the sound that comes back when
they snap their fingers and locate themselves
in an environment they didn't know
This is a direction sense, kinda like the
sonar one, and here they put this third
eye in a frog and did this experiment where
they wire this third eye into a random
place in the frog's brain and so the
frog learned how to see with this
third eye, which is interesting
How can that be?
We don't really know yet, but
we want to get there
So this is what I intend to talk about here:
How is it that we learn?
How can we do it? How does our brain work?
Can we take what we have and put it
inside a computer?
A 2 year old child can perform visual
recognition tasks much better than
the best computer we have today, just so you know
Here's another thing we don't know how to do:
let's say I play two different movies
in two different screens and show that
to a 6 month old child. Then what you do
is you put the audio corresponding to
one of the movies, so that in one screen
movie A is playing and in the other one
movie B is playing.
Say I put the sound of movie B
The 6 month old child, after a few seconds,
will look at the screen B
He knows that that sound goes with what
is being shown at screen B
Today we can't do that using artificial intelligence
We don't know how to do this that
a six month old child can do
So if we discover how it is that we learn
that will be a great evolution
in terms of how we can make AI work
Now let me talk about some of what we do
know how to do
It's a lot of things, there are many research fields
and there are many choices around and
there's actually a lot you can do and earn money
If you want to found your own company and
make lots of money by solving a problem we
already know how to solve,
you just need some structure
That's especially true when we talk about
processing the Portuguese language
There are algorithms out there but not in Portuguese
at least to the extent I know
Well, let me talk a little bit about
some stuff I did
I'll show you some software, patent, some stuff I did
I also brought a quiz just so we can discuss
what is easy to do and what's hard
I brought some real problems,
research in artificial intelligence we see around,
importance of machine learning and
parallel processing, which I study, that is a method
to have, say, 500 times better performance
in a computer using graphics cards,
and this is a recent development
Well, I graduated from ITA, MEC05,
I stayed some time in CTA, IAE, at the
Defense Systems Division and then I
went to Petrobras. My work there involves
technology development for construction and assembly,
especially for large-scale equipment and jobs,
which is a field that, to me, needs some serious
improvement seeing as some of the techniques
are extremely old and much is done the way it is
simply because it has always been like that
So that's a very interesting area
I think I am really lucky to be where I am
While I was here in ITA I used to
develop using MATLAB and then I decided
to create something akin to MATLAB but
not quite the same
So I developed some algorithms, some stuff
with professors here, with Adade, with
Heinzelmann, who I think is not here anymore
And then sometime later I got this
certificate from INPI, which is a
software patent
I made this software in my senior year
It has some algorithms I intended to test
Much of what's here is available in more
modern applications now but
nevertheless it was an important experience
for me not just because I wanted this patent
but also because it helped me understand
the numeric methods which were implemented
in MATLAB
[pause]
I'm hoping nobody saw the answers there
Let me ask you now, what can we do with AI?
This OK symbol means
"We have a commercial solution",
that is, one which works satisfactorily
This symbol here means "well, there's research,
there's a prototype, but not a product"
And this question mark means
"we just have no idea".
Now let me ask, autonomous vehicle driving?
We were talking about this earlier,
anybody has any idea? [audience]
Yes, Google has got its driving license
but still, right now, we don't have
a commercial solution, a satisfactory one
So this one is in an advanced research state
How about split voices from N microphones?
Let's say I put 5 microphones in this
room, or 10 microphones, and 3 pairs
of people talking, can I differentiate
all these dialogs?
[audience]
Yes, right, that is possible but there's no
commercial solution readily available
There's a good amount of research about it
and it already works, it's doable
For you who want to know more, it's done using
SVD, singular value decomposition.
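A toy sketch of the first step of such a pipeline: whitening the microphone mixtures with the SVD so the channels become uncorrelated. The two "voices" and the room mixing matrix here are made up, and a real separation system needs more than this (e.g. ICA on top of the whitened signals):

```python
import numpy as np

t = np.linspace(0, 1, 2000)
# Two toy "voices" and a made-up 2-microphone mixing matrix
sources = np.vstack([np.sin(2 * np.pi * 5 * t),
                     np.sign(np.sin(2 * np.pi * 13 * t))])
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
mics = A @ sources                    # what the two microphones record

# SVD whitening: the rows of Vt are orthonormal, so the rescaled
# channels come out uncorrelated, undoing the room's mixing geometry
U, s, Vt = np.linalg.svd(mics, full_matrices=False)
whitened = Vt * np.sqrt(t.size)

print(np.round(np.cov(whitened), 2))  # approximately the identity matrix
```

The covariance of the whitened channels is close to the identity, which is what "decorrelated" means; separating which channel is which speaker is the harder remaining step.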
Split two voices in a single microphone?
This works like this: I'm recording
a dialog here of me talking to somebody else
Can I make a software that reads this audio
and splits which parts are me speaking
and which parts are him speaking?
Is there anything available for that?
Is that in an advanced research state?
Or is it that we don't know?
[audience]
Well, let me tell you, it looks
to be the same problem, but in
fact we have no idea
If it's a man talking to a woman,
then we can use the frequency,
because male voice has lower frequency
than female voice.
However, if the two people have about the
same voice tone, we don't know how to do it
There's a lot of research but nothing
very convincing, although it looks simple
Next, speech recognition
Like, get this audio of me speaking
and turn it into text
It's not the contrary, I mean,
not Hawking's equipment where he types
and that computer voice speaks
what he wants to say, that's not it
It's read what I'm saying and turn
that into text
[audience]
That exists in Siri, but this has a detail here,
do we have that in Portuguese? I don't know
I'm pretty sure that doesn't exist in Portuguese
[audience]
But this 15-year-old kid's work wasn't quite
like how Siri works
Identify the language of a given text
Let's say I give you a text file which
is written in some language
What I want to know is which language it is
Google does that, this is a solved problem
For you who want to know more about the topic,
that can be solved using
letter bigrams and trigrams,
I don't know if you've heard of that,
but that is how you solve it.
That is, I get many texts in a given language
and count how many times I saw AB, AC, AD
up to AZ, ZZ
I count these in various languages and then
I get to know which are the most likely
letter sequences in each language
So, for example, if a word contains DER
the chance it's a German word is very high
Correct "sessão" to "seção"
[audience]
Yes, Word can do that, but "sessão"
with double S exists, there's that detail
[audience]
It gets green? Yes, there's that,
Word does it, but systems in English do
it better, it's not that good yet in
systems in Portuguese as it is in
English systems
That is, detect that the meaning in the
text fits better using another word
that has the same sound
This exists, but the best systems
are in English
[audience]
Watson? No, Jeopardy is something else
I'll talk about it later
I even have some slides here about
these question answering systems
It's a little bit different,
the technology used is different
What I use to correct "seção" to "sessão"
is a statistical procedure
For example, let's say I have
"sessão de cinema" and "seção de supermercado"
I know that I saw "seção de supermercado"
in many texts using "seção" with "ç"
and I saw few texts containing "sessão"
with double S before "supermercado"
So it's statistically more likely that,
before supermarket, I'll use "seção"
with "ç" and before cinema I'll use
"sessão" with double S
It's a simpler analysis
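The "which spelling goes with which neighbor" statistic can be sketched directly. The corpus counts below are made up to mirror the example in the talk:

```python
from collections import Counter

# Hypothetical tiny corpus: how often each spelling was seen
# followed by "de <word>"
corpus = (
    "sessão de cinema", "sessão de cinema", "sessão de cinema",
    "seção de supermercado", "seção de supermercado",
)
pair_counts = Counter()
for phrase in corpus:
    w = phrase.split()
    pair_counts[(w[0], w[2])] += 1    # (spelling, context word)

def pick_spelling(candidates, context_word):
    """Choose the spelling most often seen before this context word."""
    return max(candidates, key=lambda c: pair_counts[(c, context_word)])

print(pick_spelling(["sessão", "seção"], "cinema"))        # sessão
print(pick_spelling(["sessão", "seção"], "supermercado"))  # seção
```

The system never "understands" cinemas or supermarkets; it only counts which spelling co-occurred with which word more often.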
If you want to do it like Watson,
which was the champion of Jeopardy,
you need a deeper analysis of the context
You need to understand more about
the semantics of the sentence
That is, this adjective belongs to
this sentence and so on
"Fulano" is husband of "Ciclana"
You need to understand that "is"
is a verb used to link,
husband refers to the guy who is
the husband and "Ciclana" is a
complement of husband, get it?
It's not just statistics,
it's a more complex problem
Now, classify some news article as "sports"
I give you an article, and
I want your system to tell if it
has to do with transportation, politics,
what do you guys think?
[audience]
This classification problem, that
we already know how to do
A statistic classifier will do the job,
kinda like how we correct
"sessão" and "seção"
If you want to know more about how
to do this, perhaps the most used
algorithm is called Naive Bayes
It's a statistical algorithm
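A minimal Naive Bayes sketch, with a made-up four-document training set; a real classifier trains on thousands of articles, but the mechanics are the same:

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical training set: (words, label)
train = [
    ("goal match player team", "sports"),
    ("championship goal score", "sports"),
    ("election vote senate", "politics"),
    ("minister vote law", "politics"),
]

word_counts = defaultdict(Counter)
label_counts = Counter()
for text, label in train:
    label_counts[label] += 1
    word_counts[label].update(text.split())

def classify(text):
    """Naive Bayes: pick the label maximizing P(label) * prod P(word|label)."""
    vocab = {w for c in word_counts.values() for w in c}
    best, best_lp = None, float("-inf")
    for label in label_counts:
        lp = math.log(label_counts[label] / sum(label_counts.values()))
        total = sum(word_counts[label].values())
        for w in text.split():
            # Laplace smoothing so unseen words don't zero the probability
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        best, best_lp = (label, lp) if lp > best_lp else (best, best_lp)
    return best

print(classify("the team scored a goal"))   # sports
```

"Naive" refers to the assumption that words are independent given the label, which is false but works surprisingly well for this kind of task.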
Given a text, recognize sarcasm and
meaning inversion in the text?
[audience]
We have no idea about how to do it
There's research in this area but
nothing too relevant
Language understanding to this level
is still a very difficult task
for a computer
What's that? Yes, maybe
[audience]
Yes, more or less
The thing is that it's too dependent
on the context, you know
For example, if I say I saw Geicke today
walking by, you'll find it funny
whereas a computer will not
[laughs]
Yes, but you know, this depends
too much on the context, understand?
I mean, unless it is very obvious,
that is, some word in the text has
been modified by various positive adjectives,
which we can detect using parsing, and
in the end it is modified by a negative
adjective, then we can get an idea that
this is not really the case, but that's not
how we people do it
And finally, I give you various pictures
which contain either a cat or a dog,
as many pictures as you want, say,
one million pictures of cats and dogs
Now I give you a new, unseen picture
what's in the picture, a dog or a cat?
This is a task we humans do without
any problem, isn't it?
Make a computer program that does that,
is there research, is there a solution,
what do you think?
[audience]
We have no idea about how to do it
People around have been trying to do it,
but the correct classifications were
only about 50%, that is
[laughs]
Now that's easy: make a software that
has a 50% chance of outputting dog
and it will have the same performance as
some of the research out there
So the message here is that some
tasks that look difficult may in
fact be easy and some tasks we do
on a daily basis are really complicated
to implement using a computer
Let's now turn to real problems
Some of them have been solved
For example, creating a text editor
with spelling correction when you
miss a letter is a solved problem
in any language
All you do is keep a dictionary of
valid words and when an unknown
word appears you highlight it
Word has been doing that
for a long time now
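The dictionary approach really is that simple; a minimal sketch with a made-up eight-word dictionary:

```python
# Minimal dictionary-based spell checking: any word not in the
# dictionary gets flagged, as in a text editor's red underline
DICTIONARY = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

def flag_unknown(text):
    """Return the words a text editor would underline."""
    return [w for w in text.lower().split() if w not in DICTIONARY]

print(flag_unknown("the quik brown fox"))   # ['quik']
```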
But there are some problems,
and that's where opportunities lie,
that we can solve with what we know
But the problems are out there
waiting for somebody to put together
an interface, you know, waiting
for somebody to present a solution
What I mean is that we can solve
some things with the science that
is already available
Then there are things that have not been solved,
and there are problems for which we don't
know if a solution exists
And I'm not talking about NP-hard problems,
problems that are too difficult,
there are some problems that we don't know
if they are NP-complete and some we
know are, but it may be the case
that a heuristic exists that can
solve that problem at least to a
satisfactory extent in an acceptable time
There's a lot of state-of-the-art research
in the field of AI about these
intractable problems but you may get
a heuristic, a mechanism to at least
sometimes get the correct answer in
a reasonable amount of time
These are research opportunities
This gives rise to what I call the
Industry technological paradox
This is how it works:
we have some known technologies which
are widely used
And here we have the boundaries of
our knowledge, where researchers work,
where there's research budget
And here there's an area we
don't know
Industry has problems that could be
solved using known technology;
some with a low technology level,
some with a higher level
Industry has some problems that would
be solved using knowledge in the boundary
of what we know
And when I say industry, I'm referring
to Petrobras, where I work,
the Air Force itself, where I stayed
for some time, Vale, Odebrecht, which is
a big contractor, you know
I'll give examples of these problems
in the next slide
And industry also has some problems
which are beyond what we can do,
especially involving materials science
and nanotechnology
The focus of industry, what we see happening,
is that most of the solutions used are here,
in the lowest region of known technology
Industry doesn't use more advanced
known technology, it's very restricted
to basic stuff
For example, control systems:
what you'll see around are PID systems,
you'll know when you have this class,
but PID is not state of the art
Maybe some processes could be controlled
using lead-lag, AI, fuzzy logic which are
known techniques but not used
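For reference, the PID law the talk mentions is short. This is a sketch on a made-up first-order plant (the gains and the plant dynamics are illustrative, not tuned for any real process):

```python
def pid_step(error, state, kp, ki, kd, dt):
    """One step of the textbook PID law: u = Kp*e + Ki*integral(e) + Kd*de/dt."""
    integral, prev_error = state
    integral += error * dt
    derivative = (error - prev_error) / dt
    u = kp * error + ki * integral + kd * derivative
    return u, (integral, error)

# Made-up first-order plant: the temperature drifts toward the input u
setpoint, temp, dt = 50.0, 20.0, 0.1
state = (0.0, 0.0)
for _ in range(500):
    u, state = pid_step(setpoint - temp, state, kp=2.0, ki=0.5, kd=0.1, dt=dt)
    temp += (u - temp) * dt

print(round(temp, 1))   # settles at the 50.0 setpoint
```

The integral term is what removes the steady-state error here; lead-lag, optimal control and the learning-based controllers mentioned above replace or augment this simple law.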
And the focus of academia, which is
where research lies, is this,
the boundary of knowledge
So researchers won't research here
because they understand this is a
solved problem, somebody did it,
and they're not going to publish
a paper about the topic
But then there are these problems in
this region here, see?
These are problems that we could solve
using current technology but that
industry won't use since it's not
its focus, and researchers don't
study because either they won't publish
a paper about the topic or because
they don't know the problem exists
And this region here is very promising
in terms of making money
I'll give you an example: Instagram
This guy sold Instagram for 1 billion
That's a lot of money, isn't it?
Is there anything unknown there?
No, you just take a picture and
apply a filter
We've known how to do this
for a long time
But the guy envisioned how to sell
this to the correct people
He earned 1 billion
Was it worth it or not?
That's good, isn't it?
[laughs]
You can have quite a good barbecue
with all that
And then in this region
"I don't know how to do it",
industry will tell you
"I don't know how to do it"
because it has never been used
and people don't want to take risks
"This thing here has always worked,
so why should I change?"
And the guy from academia will say
"this won't generate a paper"
But many times it's because
he doesn't know the problem exists
So here, coming back to this paradox,
I'll give you some examples of
known technologies which are used,
problems and stuff in the boundary
of knowledge, but in companies
So, for example, electrode welding
This is known and widely used
Welding with controlled short-circuit,
which is a more productive process
and has a better weld quality
This is a known process in industry,
there's even equipment to do it,
but not all industries use it because,
since electrode welding has always worked,
why should I change,
understand what I'm saying?
And here, laser and hybrid laser welding,
this is the boundary of knowledge,
but it could generate for a company
like Petrobras millions in profit if
it existed, because of productivity
For example, if I can produce oil from
a reservoir today, it's worth a lot more
than having it produce a month from now
It doesn't matter if laser welding is
expensive because I get so much money
producing today that I might be able
to afford paying 20, 30, 40 times
more and use this type of welding
Text editors, spreadsheets and
CAD drawings, that is known
There's AutoCAD, Word, and many other
programs that do this
Version control in a collaborative
environment: have you ever used
Google Docs?
Have you used that one like Excel,
that spreadsheet from Google Docs?
When there's more than one person
using it, the cells each person
is using get highlighted
It's an excellent tool for
collaborative work
Now you ask me, do people use that
in industry?
Do all companies use it?
They don't, and it is very useful
because many people can work
simultaneously on the same document
There's no such thing as
I got my copy and I'm working here
and then somebody else got his copy and
worked there
Then, to combine what I did
with what he did, somebody's going
to have to rework both
and put them together, although
this technology already exists and
could have been used
And it's not used in many industries
And one of the things at the boundary
of knowledge is context-dependent search
So, say I have many texts from
many construction jobs
Let's say, in a given construction,
I get a daily job report
So every day, I get a paragraph
describing what happened in the
construction
After 3 years, I have a lot of stuff
to search if I want some information
So if I want to know, when did that
accident happen in which some guy broke
his leg?
Then, if I search for leg in these 3000 documents,
I'll find the leg of a chair, the leg
of some equipment, understand?
It is going to be difficult because,
although the data is there, I can't find
it, because I can't do the disambiguation
between a person breaking a leg and
some equipment breaking its leg
All right, you can do that in some cases
but this is still a research area
which would be very useful for companies
Now this is what I was talking about
when I mentioned control
Much of what is used in industry
is ON/OFF control, which is the
most basic of all
Now Lead-Lag and optimal control are
known methods but they are rarely
applied or not applied at all
And control using machine learning,
this one we still need to
develop more to apply in industry
That's because it's difficult,
even if you have a research that works,
to transform it into a product that
has the required reliability to be
used in a company such as
Petrobras, Vale or Odebrecht
Let me get back here
What do I mean by this?
These problems, for you who want
to create a business and earn a lot
of money, these problems
whose technology is already known, but
that industries can't solve
because it has always been done
some other way, these problems are
an excellent opportunity
to earn money
Maybe not if you want to become a researcher,
but there are many problems such as
these that we can solve,
the knowledge is available,
but the problem persists. Why?
It's because people there don't
worry about changing what's there
because it's been there for a long
time, and sometimes people here in
academia don't know the problem exists
I want to show you some videos here
about research about AI which are
fantastic videos
About this research that I was talking about,
processing Portuguese, that is,
transforming what I'm saying
into text in Portuguese, or making a
sentence recognition system in
Portuguese, or making a system that
can automatically grade answers,
there's a lot of opportunity there
These are things that exist in
English but not in Portuguese
Let me show you some of these videos
The Stanford autonomous vehicle
This video is available in Youtube
Let me forward to the interesting part
For those of you who know, this is the guy,
Sebastian Thrun, who made the
car which won the DARPA Grand Challenge
by crossing a desert
It's a car without a driver
And this is a demonstration of
some of his research
These are some images of the DARPA
Grand Challenge
Let him speak
[video audio]
Notice that this car is being driven
by a machine, there's no driver
There's a guy there watching in
case it's necessary to brake or
something like this, but all driving
is autonomous
Watch this
I'll stop here to comment
Can you see these yellow and red
parts in this drawing?
This is related to the positioning
system of this car using
distance sensors and this green
trajectory here, can you see?
This drawing is based on search
algorithms originally developed in
gaming, many heuristics used here
came from games
And the localization system
of the car has sensors there, but
the basic algorithm is quite simple,
it's called particle filtering, and
the difficult part is the
sensor models, if you care to
know more about this topic
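The particle-filter idea fits in a short sketch. This is a made-up 1-D world (a robot in front of a single wall, sensing its distance to it), far simpler than a car's sensor suite, but the weight-and-resample loop is the core of the technique:

```python
import math
import random

random.seed(1)
WALL = 10.0   # made-up map: one wall; the robot senses its distance to it

def sense(true_pos, noise=0.3):
    return WALL - true_pos + random.gauss(0, noise)

def update(particles, measurement, noise=0.3):
    """One particle-filter step: weight each position guess by how well
    it explains the measurement, then resample in proportion."""
    weights = [math.exp(-((measurement - (WALL - p)) ** 2) / (2 * noise**2))
               for p in particles]
    return random.choices(particles, weights=weights, k=len(particles))

true_pos = 3.0
particles = [random.uniform(0, WALL) for _ in range(2000)]
for _ in range(20):
    particles = update(particles, sense(true_pos))
    # Motion model: small jitter (the robot is standing still here)
    particles = [p + random.gauss(0, 0.05) for p in particles]

estimate = sum(particles) / len(particles)
print(round(estimate, 1))   # close to the true position, 3.0
```

The cloud of particles starts spread over the whole map and collapses onto positions consistent with the measurements; a real robot adds a motion model for its own movement and much richer sensor models.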
So, how about it?
If someone wants to make an
autonomous car here at ITA,
I think it's an interesting challenge
But I would suggest something
different, because you saw that
the city car is at an advanced
research stage now
But making a car that can cross
mud and offroad situations,
and map the environment,
that doesn't have such an advanced
research status and it would
be useful in a country like
Brazil for, say, topography
Now this one here, I don't know
if you've already seen it,
have you seen the Stanford helicopter?
Just to make it clear, this
helicopter has algorithms to
perform complicated maneuvers
that professional pilots take
many years to learn and they
implemented algorithms to perform
these acrobatics automatically
So all you do is press a button and
the helicopter will do what you asked
And this was done using
Machine Learning techniques
This one here is also interesting
We talk about computer vision,
path planning, object mapping,
this one was an initiative to
put this all together and
perform a task
You'll see what this is
I'll turn up the volume
What he asks the robot is:
"robot, please fetch my stapler
from my office"
and he does that using his voice
We have Siri today but that was
not available then
So he issued a voice command
to this robot here
I'll fetch the stapler
That is, it knows where it is
in the map and now it's
looking for the stapler
Where is it?
[audience]
Yes but this is 6x fast forward
[laughs]
That is, well, I find it interesting,
but we have a long way to go;
the true speed is 6x slower
than what we saw
Here I put some stuff about
question answering, a research area,
which is what we were talking about
earlier, about Jeopardy and Watson
I put here
"Which is the sixth most spoken language?"
this is Wolfram Alpha, I don't know
if you know it,
and this is what it understood,
sixth most spoken language,
and this is the answer, Portuguese
"How many calories are there
in an avocado?"
Avocado, and its answer here
So this is an intelligent system
It may look simple, but to do that
the system has to extract information
from text written in natural language,
there's no formal structure like
a programming language or a table
And Google has that too,
I don't know if you've tried to
type a question in Google,
but I put here
"Who is the founder of Apple?"
and it returned here the founders
We were also talking about this
statistic approach to language,
and that there's a lot to be done
in Portuguese
Take a look at some Google translations
"A project report shall be presented
every other week".
This translation here is correct
This "every other" in English means
at every 2 of this period of time
Every other day, every other week,
every other trimester
Now take a look at what it does
in this other sentence:
a project report shall be presented
every trimester other
Why does it do that?
Because "every other week" has been
translated many times as
"a cada duas semanas" and it finds
this statistically probable
Now "every other trimester" wasn't
done correctly because it's improbable
I don't know if you knew this is
how Google translate works, but
it is a statistical approach
to translation.
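The "every other week" behavior can be illustrated with a toy phrase-based translator. The phrase table below is made up; real systems score many candidate segmentations with learned probabilities, but the failure mode is the same: a known phrase translates well, and anything outside the table falls back to word-by-word:

```python
# Hypothetical phrase table learned from a corpus: translation prefers
# the longest known phrase, falling back to word-by-word
phrase_table = {
    "every other week": "a cada duas semanas",
    "every": "cada", "other": "outro", "week": "semana",
    "trimester": "trimestre",
}

def translate(sentence):
    words, out, i = sentence.split(), [], 0
    while i < len(words):
        # Try the longest phrase starting at position i
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in phrase_table:
                out.append(phrase_table[phrase])
                i = j
                break
        else:
            out.append(words[i])   # unknown word passes through
            i += 1
    return " ".join(out)

print(translate("every other week"))       # a cada duas semanas
print(translate("every other trimester"))  # cada outro trimestre (word-by-word)
```

"Every other trimester" was never seen as a phrase, so it comes out garbled, exactly like the Google examples above.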
Look here, "every other year",
"every other month",
now look at this:
One meeting, yes, the other, no,
and it translated to
every meeting other
But I got another example
after I had prepared this
presentation which is even more
interesting, let me see here
I did this test here:
the bark of a tree is its outer layer,
and the bark of a dog is the action
So I came up with this sentence
intending to be ironic:
Dogs bark a lot, but a tree bark
has never been seen
Now look at this: it translated to
dogs bark a lot, but a tree outer part
has never been seen
[laughs]
So Google and these translation systems
don't understand context, but
nevertheless the probabilistic methods
work well
Now, Machine Learning,
changing the subject a little,
is a fundamental piece in all this
So let's say I measure a set of
points like this; which model
should I fit?
Do I want a straight line?
A straight line probably won't
be very good
Do I want a parabola?
Probably yes
But I can construct a polynomial
which passes through all these points,
but do I want that?
The old way, so to speak,
and by old I mean before 2008,
it's not that old,
was to put all this data into a
statistical training system and
output: "Well, this fourth-order
polynomial has an R-squared statistical
error of so much, 99%"
Nowadays, however, we have more
modern techniques to do this,
especially when you need to generalize
If you want to talk about this later,
I have more material about the topic,
but since I've been speaking
for quite a while, I'll give this
to whoever is interested
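The line-versus-parabola question can be made concrete with least squares. The data here is synthetic (a noisy parabola); the point is that adding degrees always fits the training data at least as well, which is precisely why training error alone can't choose the model:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 30)
y = 2 * x**2 + 1 + rng.normal(0, 0.5, x.size)   # synthetic noisy parabola

def fit_mse(degree):
    """Least-squares polynomial fit; mean squared error on the data itself."""
    coeffs = np.polyfit(x, y, degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

print(fit_mse(1) > fit_mse(2))    # True: the line misses the curvature
print(fit_mse(8) <= fit_mse(2))   # True: more degrees always fit the data
                                  # at least as well, but may not generalize
```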
Other machine learning techniques:
like I was saying, before 2007 and
around 2008 we would propose a model,
take measurements and fit a
least squares model, if you've
heard of least squares, it's a
regression method that allows you
to build a model
Now what do we do today?
We do something called regularization,
which are techniques that allow
us to reject outliers, basically
And we want to learn a model that
is not too specific to what you
observed, and this is
such a common mistake:
This guy goes to the lab and
takes 100 measurements
Now he builds a model that
fits his 100 points perfectly
Yeah, but so what?
I don't care about his 100 points,
what I want to know is whether he
can use his model to predict
future, unseen instance,
that's what matters
And this is the focus of
current Machine Learning
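One such technique, as a minimal sketch: ridge regularization, which adds a penalty on large coefficients so a flexible model can't contort itself around every noisy point. The data and the polynomial degree here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 15)
y = x**2 + rng.normal(0, 0.1, x.size)    # noisy parabola again

def ridge_poly(lam, degree=9):
    """Ridge regression on polynomial features: minimize
    ||Xw - y||^2 + lam * ||w||^2; lam = 0 is plain least squares."""
    X = np.vander(x, degree + 1)
    w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)
    return w, X

w_plain, X = ridge_poly(0.0)    # free to chase every noisy point
w_ridge, _ = ridge_poly(1.0)    # the penalty shrinks the coefficients

print(np.linalg.norm(w_ridge) < np.linalg.norm(w_plain))   # True
```

The penalized fit has smaller coefficients and a smoother curve, trading a bit of training error for better behavior on unseen points.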
Another focus is learning by
examples
Have you heard of this MNIST dataset?
It's a database of handwritten
digits, like these ones here,
which are used to test
recognition systems
So what does your algorithm
receive as input?
It receives this image, that is,
a matrix of intensities, and
the algorithm has to decide that
this is a 7, this is a 6, and this
can be performed very well using
example based learning, that is,
I give your algorithm an example:
Look, this is a 7. This is a 7 too.
This is a 7 too.
When I give it another 7, I want
it to have learned this structure
And then there are techniques
to do that
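A minimal sketch of example-based learning, with invented 3x3 "digit" patterns standing in for MNIST images:

```python
# Example-based learning in miniature: classify tiny 3x3 "digit" images by
# nearest neighbor against labeled examples. Real systems use datasets like
# MNIST; these 9-pixel patterns are made up just to show the mechanism.

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Labeled examples (flattened 3x3 binary images, invented)
examples = [
    ([1, 1, 1, 0, 0, 1, 0, 0, 1], "7"),
    ([1, 1, 1, 1, 0, 1, 1, 1, 1], "0"),
]

def classify(image):
    # Pick the label of the closest stored example
    return min(examples, key=lambda ex: distance(ex[0], image))[1]

# A slightly corrupted "7" should still land on the right example
label = classify([1, 1, 1, 0, 0, 1, 0, 1, 1])
print(label)
```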
So what would be an overview
of modern machine learning?
What is mostly used is informed
search, or heuristic search,
which Google uses to plan paths
for its car
So, for example, the traveling
salesman problem is intractable
It is an NP-hard problem
But there are ways to bound the
solution: for example, if I sum
the distances I get a limit,
an upper bound for the best
possible solution
Based on that, I can create a
system that does the search in
a more intelligent way
It doesn't need to try all
possibilities, it only goes
so far into the search tree
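That bound-and-prune idea can be sketched like this, on an invented four-city instance:

```python
# Sketch of informed (branch-and-bound) search on a tiny traveling-salesman
# instance: any complete tour gives an upper bound, and partial tours that
# already exceed it are pruned instead of fully explored.
# The distance matrix is invented for illustration.

dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
n = len(dist)

# Upper bound from the naive tour 0 -> 1 -> 2 -> 3 -> 0
best = [sum(dist[i][(i + 1) % n] for i in range(n))]

def search(path, length):
    if length >= best[0]:          # prune: this branch can't beat the bound
        return
    if len(path) == n:
        best[0] = min(best[0], length + dist[path[-1]][0])  # close the tour
        return
    for city in range(n):
        if city not in path:
            search(path + [city], length + dist[path[-1]][city])

search([0], 0)
print(best[0])
```

The pruning test is what keeps the search from visiting every permutation; with a good bound, large parts of the tree are never entered.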
Supervised learning:
in the previous example I told
it: what is a zero?
what is a one? a two?
I gave various examples
Then there are techniques for
the computer to learn based on
these examples
Then when a new, unknown example
comes it will know what is a 0,
what is a 1
Now, on the other hand, there's
unsupervised learning, which is
what we were discussing here:
For example, if I have a database
containing various facial
expressions, I want to throw all
that into the algorithm and
have it tell me how many
different facial expressions
there are in that database
Or in the case of handwritten
digits, somebody who writes the
4 with a leg, and somebody who
writes a chair-like 4, these are
different types of 4's
So if I give the algorithm
a bunch of 4's it might not
know beforehand that this
is a chair-like 4 and this is
a legged 4, but there are
techniques to tell,
given a set of examples,
that there's a category here,
which looks similar,
and this other group is also alike
I can't tell what it is,
but I do know this one is
different from this other one
This is a way to learn
Afterwards an expert may come
and tell "this is this,
that is that", and label
the categories
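A toy version of that grouping idea, using k-means on made-up one-dimensional data:

```python
# Unsupervised learning sketch: 1D k-means with k=2. Nobody labels the
# points; the algorithm discovers on its own that there are two groups.
# The data and starting centers are invented.

data = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]

def kmeans(points, c1, c2, iters=10):
    for _ in range(iters):
        # Assign each point to its nearest center
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        # Move each center to the mean of its group
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return c1, c2

c1, c2 = kmeans(data, c1=0.0, c2=10.0)
print(round(c1, 2), round(c2, 2))
```

An expert can then look at the two discovered groups and name them afterwards, exactly as described above.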
And reinforcement learning is
used to perform real-time
decision making, such as the
Stanford helicopter
The helicopter receives an
input command and, depending
on which state it ends up in
(its attitude, its
speeds and so on), it computes
a reward for this:
I took this action when I was
in this state, and I ended
up in this state, which is good
So this is a good action
Now I did this here and lost
my helicopter, I lost
my entire research, and that
is not good, I don't even know
when I'm going to be able
to test this again
But what has been studied a lot
recently are methods based on
probability and statistics
That is, given that I'm here
and I can see this wall from
this distance, and this one
from this distance, and this
from this distance,
where am I?
I can answer that if I'm in
this location because I have
a map of this region in my brain
But can we do that
using a computer?
The answer is yes, and I
don't know if you know how the
Google car works, but one method
it uses to know where it is
is by watching the ground
So based on this history of
ground it has already seen
it can locate itself and
know where it is in the road,
and which is its geographic
location, whether it's
left or right
One of Google's car problems
is to drive when there's snow
When there's snow the car can't
see the ground so they don't
have a solution yet to solve
this snow problem
Maybe they'll use Street View
because you can get a tree
here, a traffic sign there
And this set of information,
I mean, a single sign, alone,
doesn't help much in localization,
but the set of signs:
well there's a sign here, then
10 meters from here there's this
other one, then 10 meters more
I see that tree, you put all this
together and you have a
good probability of being able
to locate yourself
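One probabilistic localization sketch in that spirit is a particle filter, here on an invented one-dimensional "map" of wall distances; real systems are far more elaborate:

```python
import random

# Particle-filter sketch for 1D localization: a robot somewhere along a
# corridor measures the distance to the wall ahead (the "map" below is
# invented). Particles are weighted by how well they explain the
# measurement, then resampled; the surviving crowd is the estimate.

random.seed(0)

wall_distance = [5.0, 4.0, 3.0, 9.0, 8.0, 7.0, 6.0, 2.0]  # reading at each cell
true_position = 3
measurement = wall_distance[true_position]                # robot reads 9.0

particles = [random.randrange(len(wall_distance)) for _ in range(200)]

def weight(p):
    # Closer predicted reading -> larger weight
    return 1.0 / (1.0 + (wall_distance[p] - measurement) ** 2)

weights = [weight(p) for p in particles]
particles = random.choices(particles, weights=weights, k=200)  # resample

# Most common surviving particle is the position estimate
estimate = max(set(particles), key=particles.count)
print(estimate)
```

With a single ambiguous landmark this would stay uncertain; it is the combination of measurements, as with the signs and trees above, that pins the position down.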
These are graphs that are
used in searches, and this is
a method of supervised learning,
where this segmentation is
labeled as building, sky
This one here depicts reinforcement
learning, that is, throw this mouse
in this environment
If it gets to the cheese, it gets
a reward which is the cheese,
so as time passes it learns
which is the path it should take
that is a path where
it doesn't walk too much
and still reaches the desired reward
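The mouse-and-cheese setup can be sketched with tabular Q-learning on a toy corridor; all parameters here are illustrative:

```python
import random

# Reinforcement-learning sketch: a "mouse" on a corridor of 5 cells, cheese
# at cell 4. Actions: 0 = left, 1 = right. Reward 1 on reaching the cheese,
# 0 elsewhere. Tabular Q-learning with invented hyperparameters.

random.seed(1)
N, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N)]
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):                      # training episodes
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly the best known action, sometimes random
        a = random.randrange(2) if random.random() < eps \
            else (0 if Q[s][0] > Q[s][1] else 1)
        s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Move Q toward reward + discounted best future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# Greedy policy after training: every cell should point right, to the cheese
policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(N)]
print(policy[:4])
```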
Now it's still very important
to know the problem
In heuristic search, if you want
to have a good heuristic, you
need to know the problem
For example, if you want to
create a program that plays chess,
people who know how to play
chess will develop a much
better chess software than people
who don't, because people who
know will be able to identify
advantageous positions
This guy who can't play chess
will have no clue
So the software of the guy who
can't play chess, if there's
infinite processing power,
will get to the same result
But since that is not the case,
having a good way to evaluate
positions without having
to get to the end of the game
is very valuable
In the case of reinforcement
learning you need to build a
simulator, which involves
knowing the problem
The message here is that it
is mandatory to know the
problem; there's no computer
mechanism so that you throw
anything in there, it recognizes
what should be done and does it
That's not very close to happening
And lastly, what is this parallel
processing that arose recently?
It is a new technology to use
graphics cards and multicore
processors to accelerate certain
processing tasks
For example, in medical imaging
there's been a 45x speedup
In fluid mechanics 17x,
planet interaction, 100x,
some medical research,
up to 400x, gas diffusion,
35x
In summary, let's say I want
to run some application in a
server cluster and I need, say,
10 computers to do the job
If I use a GPU and apply this
technique, it's as if I had
bought 1000 more computers,
because the algorithms are
so much faster
Now look at the impact such
thing would have for banks, for
example: banks need to do secure
key exchange for every transaction,
based on number theory
This is an expensive computational
procedure: generating a prime
number takes a few milliseconds
And then you have to do it
millions of times
Now if you could accelerate this,
you'd go to the bank, install
GPUs in each server, modify the
code and suddenly it's as if
the bank had 100 times
more servers
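The kind of modular arithmetic involved can be sketched with a toy Diffie-Hellman-style exchange; real keys use primes thousands of bits long, which is what makes the computation expensive:

```python
# Sketch of the number theory behind secure key exchange: both sides end up
# with the same shared secret using modular exponentiation. These numbers
# are toys; real systems use thousand-bit primes.

p, g = 23, 5            # public prime and generator (toy-sized)
a, b = 6, 15            # private keys of the two parties

A = pow(g, a, p)        # party 1 publishes g^a mod p
B = pow(g, b, p)        # party 2 publishes g^b mod p

secret1 = pow(B, a, p)  # party 1 computes (g^b)^a mod p
secret2 = pow(A, b, p)  # party 2 computes (g^a)^b mod p

print(secret1, secret2)
```

Each `pow` with a modulus is exactly the operation that has to happen millions of times a day in a bank, which is why accelerating it matters.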
There's a good opportunity
to make money, for example
Now there's a world of possible
research that can be
done with that
For example, you could plan
the sequence of construction
and assemlby
How am I going to build
this equipment?
In Catia, for example, how
should I position this part?
It has a module that can
compute collisions and
stuff like that, but there's
no computation of a
completely automatic trajectory
that avoids all collisions
At least I think it's not
fully developed yet
That's an opportunity
3D reconstruction:
Let's say I get multiple
cameras and try to make a
3D map of this environment
Oh, yeah, I'll hand you
this 3D camera so you can
take a look at it
So, this 3D reconstruction
thing from structured points:
if I measure distances
in a point cloud, how do I
go about reconstructing the 3D?
If I have multiple cameras
to find objects, or if I
have multiple Kinects,
for example, how do
I do that?
Now, parsing and speech
recognition in Brazilian
Portuguese
For example, telephone
attendance systems
In the United States you can
call to buy a ticket, you'll
practically talk to the
answering bot
Here, in the telephone, you go:
Press 1 for this
Press 2 for that
Press 3 for that
Of course you always need
option 7, and you need
10 minutes for each part
of the menu, and then
you still end up having
to talk to a real person
and you hang there waiting
15 more minutes
See, that's something that
doesn't exist in Portuguese
At least I have never made
a call to a company that had
such a system in Portuguese
And, you see, the technology
is known; Siri already
exists in the USA
Why not make one of these in
Portuguese? Is it that
there's no market?
I find that very implausible
Synthesis of voice and
instrument sounds
This is to create music
For example, there's this
Encore software that allows
one to input a score sheet and
it will play the instruments
But hey, why can't we do that
with voice?
Let's say I record a singer
singing various phonemes and I
want to compose my own song
and have this guy sing it,
so that I can test and see
how it goes, can we do it?
Yes, we can
Of course you'll have to
study how to connect
the sounds, there's some
research there, but it's
a very interesting topic,
and it's doable
If one of you is interested...
[audience]
Now, what I was saying
about banks, this arithmetic
modulo N is what is used
in cryptography
Banks and credit cards can
use a considerably lower
amount of servers to perform
the same task
So instead of buying 1000 more
computers they buy 1000 GPUs,
which is way cheaper
Integration with microcontrollers
and sensors, in order to do
integrated product development
For example, you know
the Arduino, which is even
supported by Google
Let's say I want to
make a table that maintains
uniform illumination
I need this because I'm assembling
some electronic circuit and
need to do some precision work
What I can do is I put
many high power LEDs
and measure the light intensity
throughout the surface and
control the power of each LED
so, for example, if there's a
light source to my left, the
illumination of the table
should remain uniform
So I can read data from these
sensors, transfer them to a
computer, process them and
give back to the controller
what should be the values
of each LED intensity
That's hard to do using only
a microcontroller because
the model is not that simple
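A sketch of that control loop, with an assumed linear model of how LED power turns into measured brightness (all constants invented):

```python
# Sketch of the LED-table feedback loop: measured brightness is modeled as
# ambient light plus a contribution proportional to LED power (both made
# up), and a simple proportional controller drives it to a setpoint.

target = 100.0    # desired brightness at the sensor
ambient = 30.0    # light arriving from elsewhere, e.g. a window (assumed)
k_led = 2.0       # brightness gained per unit of LED power (assumed)

power = 0.0
for _ in range(50):
    measured = ambient + k_led * power
    error = target - measured
    power += 0.2 * error          # proportional correction each cycle

final = ambient + k_led * power
print(round(final, 1))
```

If the ambient term rises (a light source appears to the left), the same loop automatically lowers the LED power to keep the table uniform.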
Particle filtering and robot
localization: this is what
we saw about Google's car
There are some limitations:
for example, to drive in snow,
it will have to process
images from the sides, and since
there are many images that's
harder than just
processing ground images
So you need to embed more
processing power, which is also
a good opportunity to use
parallel processing
And nonconvex problems, which
are problems that require
brute force: have you heard
of Rainbow Tables?
It is a scheme used to
break passwords
When you log in to Windows,
it saves your password,
or else it won't remember
your password, obviously
But it doesn't save your
password directly; what it
saves is a hash of the
password, which is a
one-way transformation
You can't get this hash
and reconstruct the password
but since passwords are usually
short, ranging from 6
to 8 characters,
this is what they do:
they scan through all possible
passwords and generate all
corresponding codes
Then the attacker goes to your
computer, retrieves your hash
and reads your password
off that list
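The precomputation idea can be sketched as a naive lookup table; real rainbow tables compress this with hash chains, which this sketch skips:

```python
import hashlib

# Naive lookup-table version of the password-cracking idea: hash every
# possible password once, then invert any stolen hash by lookup. The
# two-letter alphabet keeps the space tiny for illustration.

alphabet = "ab"
table = {}
for c1 in alphabet:                  # enumerate all 2-character passwords
    for c2 in alphabet:
        pw = c1 + c2
        table[hashlib.sha256(pw.encode()).hexdigest()] = pw

stolen_hash = hashlib.sha256(b"ba").hexdigest()  # read off the victim machine
recovered = table[stolen_hash]
print(recovered)
```

With 8 characters over a realistic alphabet the table is enormous, which is exactly why this is a brute-force, parallel-processing-friendly problem.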
Just so you know,
passwords up to eight characters
are already broken
This means that if someone
goes to your password-safe
computer and reads the
folder where your password is,
he can retrieve your password
hash and discover the password
if it has up to
8 characters, and in Windows,
up to Windows XP, even if
the password had more than
8 characters, Windows would
split the password into
blocks of 8 characters and
save the corresponding
hashes, which doesn't help
So it was useless to have
a longer password against
these Rainbow Tables
Now some suggestions:
these are some things I have
been doing, and if you
want to know more I can
give the code and explain
how it works and show details
This is 2D and 3D collision detection
That is, these two vehicles
here don't touch each other
although this part here
is intertwined, but
there's no collision
It's just that they're, like,
one inside the other
So we need exact collision
detection and it's possible
to accelerate that using GPUs
That's not how games work
today, as an example
Games compute various boxes
around the character and
compute collisions using those
So a shot that barely hits
is understood to have hit
It would be interesting
to use this technique in a game
You could write a
Counter-Strike that works
like this: someone got shot
You could check that
the bullet hit the leg
and reduce the speed of the
character, do stuff like that
In Counter-Strike, it's
either hit or miss, unless
it's a headshot, in which case
it renders a head but
computes collision using
a cube, which is easier
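The box-versus-exact difference can be sketched directly; the coordinates are invented:

```python
# Sketch of bounding-box vs. exact collision: a round "head" approximated
# by a box reports a hit for a shot that actually misses the circle.

def hits_box(px, py, cx, cy, r):
    # Axis-aligned box around the circle (what the game checks)
    return abs(px - cx) <= r and abs(py - cy) <= r

def hits_circle(px, py, cx, cy, r):
    # Exact check against the circle itself
    return (px - cx) ** 2 + (py - cy) ** 2 <= r ** 2

# Shot near the box corner: inside the box, outside the circle
box_hit = hits_box(0.9, 0.9, 0, 0, 1.0)
exact_hit = hits_circle(0.9, 0.9, 0, 0, 1.0)
print(box_hit, exact_hit)
```

This is the "shot that barely hits" case: the box says hit, the exact shape says miss, and GPUs make the exact test affordable.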
This is a performance
comparison chart
This line here shows CPU
performance as a function
of number of polygons
And this one here is
the GPU result
And you see that there's
a lot of difference:
if you extrapolate this
blue graph here to compute
this amount of polygons
it won't fit in this
area by a long shot
Just check the
derivative here
OpenCL and cryptography
Parallel processing
Many opportunities and challenges
One of the challenges is to
build a secure system,
if you like cryptography
Even if you implement the
algorithm correctly, it may be
the case that it is insecure
That's because, for example,
a memory card: it implements
a cryptographic key, and
if you get this card and
measure the energy consumption
of this card, each part here
in this graph is a step in
the cryptography algorithm
So if you get the card and
measure its power consumption,
based on these peaks here
it's possible
to extract the bits of the key
So if that's running in
your own PC, you'll be
running the cryptography
in one of multiple cores
If there's a malicious code
running in another processor,
it turns out to be possible
to watch the processor
cache and check how many
times the crypto code needed
to fetch data from memory
instead of using cached data
And then it can figure out
your password
Now, that shouldn't be possible
with OpenCL
Or at least I think it's
not, I don't know, there's
not much research on that
At least I don't know
The thing is that using
parallel processing techniques,
you can control whether you
want to use all processors
at the same time
So when you run the code
in parallel, you can
configure the system to
run only your code
So it's not possible
to do a side channel attack
on a GPU, to the extent
I know
If the GPU is running crypto,
it's all it is doing, and
besides it will be much faster
This regards convex
optimization and quadratic
programming, there are
multiple methods to
solve linear programming like
the simplex method
This is a more general
interior point method
and it can solve more
complex problems
Whoever likes fractals,
fractal generation with
parallel processing can be
800x faster
So for example, these images
here are from NVidia if
I'm not wrong, and this image
here I generated in a software
I wrote to run in the GPU
You can have this
type of result
OpenCL and shape recognition
This here is...
I even sent you the link to
a video in which there's
recognition of some coins
I have it here and I
can play it at the end
if you guys want me to
The question here is
this occlusion part
Can you see this shape?
Say, for example,
this region is filled
and this region is filled,
notice that this whole
region is going to
be one single shape
so if I get just the
boundary, this here is
the boundary, see?
There's no distinction
And one of computer vision
problems is when an object
is behind some other one
How do I keep recognizing
an object when one can
pass behind the other one?
So it's also possible to
accelerate this type of
thing using parallel
processing, and let me show
you this one quickly
This is an example
I'm going to load a shape
So look, I'll load this
geometry, this shape,
this one here, this one,
and this one
Now I'm going to open
this file here
So, you can see that
this part here has no
defined boundary
I can't...
did everyone see this here?
Interesting, isn't it?
This 3D technology
without glasses is something
that may appear in televisions
within some years
But what I was saying is
that there's occlusion here
This part of the image, see?
I can even get the border
but it's a connected border
and I can't... it's not that
I can't, it's that it's not
trivial to read this border
here and split the shapes
that gave rise to it,
which is the star and
these other known shapes
And here I'm going to
use OpenCL, parallel
processing, to find these
geometries here
So this is the result
Even with occlusion,
it can find geometries
Let me load this other
shape here, this one
So this has a very
interesting acceleration
Now let me open that
image again, just so
you can compare
Now check these times
It's almost 1 second
for this filter here,
practically 1 s total time
Let me deactivate parallel
processing and load
that same image
We can talk in the meantime
Now here, just the filter
took 4.5 s, then 3 s for
borders and 3.4 for
edge thinning
So, you see, in this simple
example, not so optimized,
I have a 10x faster algorithm
[audience]
Oh, to find it!
I didn't even measure
that time
You see that it's much
slower, even though I
didn't measure, it's much
slower than before
Well, I've talked a lot,
just to end here, I'd like
to say that today we do
things that would be
unimaginable 3 years ago
So methods we have today
in computer science,
algorithms, there's a lot
we have today that allows
us to do things that would
be unthinkable 3 years ago
So the evolution has been
fast and interesting with
parallel processing, the
computing power is quite strong
There are many unsolved
problems but that we can
solve, and that's an
opportunity to make money
If you want to deal with
some of these problems, you
won't be researching
state-of-the-art stuff,
but you have a good chance
to make money with these
problems, for example, the
ones I showed about welding
and document collaboration
I see many opportunities
regarding natural language
processing, especially in
Portuguese
And there's a world of
applications using parallel
processing, these ones
I showed you are just a
small set; I myself did
other things and I know of
much other research in
this area, and there's a
world of applications
So if you'll accept the
challenge to work with
that, whoever wants to
work with this, do some
interesting research, you
don't need, say, to make
a complete cryptographic
system, but you can make
a small piece, publish an
article, do some other
interesting thing, do your final
project on a subject that
is in the spotlight
instead of just doing some
literature review which may
not help even yourself later on
Whoever wants to do something
interesting can do research, and
if I can help, I will
If someone wants to ask
something, thanks for the
attention, I hope I showed
something interesting here
I hope you liked it and if
you want to ask, if I can
answer, I will; I hope you're OK
with the many "I don't know"s
I'm going to say...
[question]
So the website of this OpenCL
developer... well, first you
have to pick a platform
And today we have 2 of them,
basically: there's CUDA, which
is NVidia's language, I don't
know if you've heard of it,
and OpenCL, the Open Computing
Language, which works on
various platforms
So CUDA is more mature, but
OpenCL can be used in AMD
GPUs, Intel CPUs, so it's
more versatile
If you want to use CUDA, there's
NVidia tutorials which are an
excellent starting point
And for OpenCL what I'd
recommend are the examples in
the website of the standards
body, which is the Khronos Group
Also if you're interested you
may want to take a look at
some of the things I wrote in my
website, there's some
interesting stuff there too
Take a look at some examples,
some applications, create some
simple code, you know, get
some familiarity with it
[question]
Well, when I was here at ITA,
I discussed this with people
who studied computer science
in my class and we talked about
Moore's law, which says that
processing doubles every
18 months and so on, but back
then in 2005 we were already
reaching this stage where
it was not possible to increase
processor clocks too much,
because a physical limit is
being reached, meaning that
current processors won't go
much beyond 4 GHz at this point
So at that time we would say:
well, if this can't be done
anymore, the most logical
solution is to use many cores
And so let's say I have 20 cores
in my computer, I don't want to
run 20 programs
Instead, I want the software I'm
running to run fast
So I started to look at that
see how progress was being made
There was OpenMPI, MPI, which
is a standard used for computer
clusters, high-performance clusters,
but I didn't have a cluster, that
made things harder
And then, 2 or 3 years later,
CUDA already existed, but I thought,
at the time, that being platform
specific was too limiting a factor
And then came OpenCL, we started
to study, we decided to create
a website to consolidate the
things we did and then...
I learned much with the website
itself, up to the point that...
Have you heard of LAPACK?
LAPACK is what runs under MATLAB
It's what MATLAB uses to solve
linear systems, it's the heart
of MATLAB
We wrote a sort of LAPACK, a
simplified version, to run in GPUs
and we got better performance than
LAPACK, which is quite interesting
considering LAPACK has been
developed for more than 10 years
[question]
It was me, Edmundo and Diego,
guys from my class, Edmundo and
Diego studied computer engineering
So this is how it goes,
start with something...
Get a "for" loop, and this for
has an i which goes from 1 to
1000, well that's something
Just by doing this you can get
a 5x faster algorithm
People kill for 20%, 10%
And then you go on to learn
other structures
There are algorithms which can't
be fully parallelized
For example, say I have a really
large vector and I want its sum
What is most logical is to
accumulate that sum in some number
but that can't be done in parallel
because the next value depends
on values I have summed previously
And then what you do are more
complex parallel structures, you
get all this, break this big
vector into pieces, then each
core of the GPU sums a piece, and
then I'm left with, say, 1000
numbers, and I have to sum only
these 1000 numbers instead of
1 million numbers, for example
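That two-stage reduction can be sketched sequentially; on a GPU, each partial sum in stage 1 would run on its own core:

```python
# Sketch of the parallel reduction pattern described above: a big sum can't
# use one shared accumulator in parallel, so split the vector into chunks,
# sum each chunk independently, then sum the much shorter list of partials.

data = list(range(1_000_000))
chunk_size = 1000

# Stage 1: independent partial sums (parallelizable across cores)
partials = [sum(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

# Stage 2: combine the 1000 partials (tiny compared to the original)
total = sum(partials)

print(total == sum(data), len(partials))
```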
But that was how it went,
I learned slowly, had this notion,
even a strike of luck to see that
parallel processing was a
tendency in a future which is
practically true now
That's somewhat how it was
[question]
Yes, I intend to get my doctorate
here at ITA
[question]
Well what I intend to do here
are robotic systems, robot aided
systems and computer vision to
build large scale equipment,
because in Petrobras there are
many construction jobs, like
the COMPERJ, the petrochemical
complex, refineries in the
northeast, there's the fertilizer
plants, and many large scale
equipments will be built, and
by using robots we should be able
to reduce how long it takes and
the cost to finish these jobs
and get to use them earlier
Like I was saying, this is worth
a lot of money, the opportunity
cost of having a refinery ready
1 month earlier, that is
worth millions
So this is my intention here
with ITA researchers, and it
has applications in large scale
construction, like refineries
and the oil industry
I don't know if that answers
the question
[question]
Oh, right, it's
www.cmsoft.com.br
[question]
Which?
[question]
Oh, yes, we wrote the software
So this is a real-time application,
let me show you the video,
so this here uses parallel processing
to identify circles in real time; this
wouldn't be doable without
parallel processing
In other words, 3 years back we
couldn't use a single computer; I'd
need a cluster to do this here
So, you see, even though there's the
pen in the middle here, it keeps on
identifying the coins
Look here, there's a pen on top
of the coins, but it still sees them
Now here there's this strange
thing, it's unrelated, but...
Now here it understood there was
nothing there, but the pen is
still here, it's on top of the
coins, right on top
See occlusion here
This is a known algorithm, we
parallelized it, we made some
adaptations to make it parallel
[question]
Yes, similar, it's the same
algorithm, and the name of this
algorithm is the Hough Transform
It's just that this is the
generalized version, I'll show
later why, I basically don't need
an equation, I just need a shape
and I can recognize any format,
independently of occlusion, which
is the main advantage of the
technique: I can see an object
even if it is partially obstructed,
that is the application
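A minimal sketch of the Hough-transform voting idea for circles of a known radius, with synthetic edge points and part of the circle removed to imitate occlusion:

```python
import math

# Each edge point votes for every center it could belong to; the
# most-voted center wins even when part of the circle is missing.
# All points and grid sizes are synthetic.

radius = 5.0
true_center = (10, 10)

# Edge points on the circle, with roughly a third of the arc "occluded"
points = [(true_center[0] + radius * math.cos(a / 10.0),
           true_center[1] + radius * math.sin(a / 10.0))
          for a in range(0, 42)]

votes = {}
for (px, py) in points:
    for cx in range(21):              # candidate centers on an integer grid
        for cy in range(21):
            # Vote if this center is about one radius away from the point
            if abs(math.hypot(px - cx, py - cy) - radius) < 0.3:
                votes[(cx, cy)] = votes.get((cx, cy), 0) + 1

found = max(votes, key=votes.get)
print(found)
```

The generalized version used in the coin demo replaces the circle equation with a trained shape template, but the accumulate-votes-and-take-the-peak mechanism is the same, which is why occlusion only removes votes instead of breaking detection.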
[question]
Well, you know, this I showed you
here, if you want to ask me
anything, send an email and
we can work together
Of course I don't live here
in Sao Jose, but you can count
on me, I can't dedicate exclusively
of course, but send me an email,
ask what you want, if I can help
I will help
So, that sound synthesis, I found
it interesting, how do I do that?
I have some stuff I wrote,
some libraries, I'd be OK with
passing it and helping you do it
There's a lot I would like to do
but I can't
So it's interesting for me, because
it's a new application of parallel
processing, and I think the person
who gets to do it will learn a new
technology, which is very promising
Within 1 or 2 years there should be
phones with a small GPU that
will be able to run parallel
processing, so this is something
that has an enormous potential
to grow, so if you want to do
something, if I can help,
do count on me
So this is the occlusion
part, in which I can train this
format here, can you see?
There's no equation, and then
I can find a key among the coins
This here is because of orientation
So this is the algorithm,
its advantage is that it is less
sensitive, or just not sensitive,
to occlusion, but there's still
the problem of scale and rotation
Computer vision problems have
many solutions, this is one of them
Here it's not real time, it's
accelerated, two times faster
But still...
[question]
Yes, quite the same
[question]
Oh, this software, this is open
source software I wrote, it's hosted
in Google Code, you can download
it, we put many examples in the
website about how to process images,
how to create a filter
In Google Code? The examples are in
my website, CMSoft, what's in
Google Code is the source code of it
Yes, that website
Then to process images and compute
border detection, you saw here
how much a difference it makes
It's much faster
Oh, count on me
Well, I hope you liked it,
count on me if you want to do
something in this field of
parallel processing, we can publish
it, post in the website, create
applications, create localization,
I will be glad to help
Of course I can't help all the time
but then there's the weekends,
overnight work, you know
Well I guess I'm not quite up
to that anymore
Well thank you guys for
your attention