[GITHUB PRESENTS Passion Projects Heather Arthur]
My name is Julie, I work at GitHub
and I'm also the creator and organizer of Passion Projects.
[cheering, applause]
[laughing] I'm really excited. [laughing, applause]
I'm really excited to have all of you guys here
and I'm incredibly proud to introduce
Heather Arthur, who is our speaker for tonight.
So, if we can just give her a nice round
of applause and welcome her to the stage,
that would be great. [cheering, applause]
[Heather Arthur] Alright, so just to give you
a little background. I'm Heather,
and harthvader on Twitter,
harthur on GitHub.
And I am a developer at Mozilla.
I work on Firefox, so that means
I write open source code, all day, every day,
which is really awesome.
In particular what I work on is
the Firefox developer tools.
So, these are the tools that
are built into the browser that help
web developers debug and create their webpages.
So, this is just a screenshot
of the stuff that I work on
but I'm not gonna talk about this today.
I'm gonna talk about something else
that I'm really interested in,
which is machine learning.
More particularly, bringing
machine learning to JavaScript.
So, let's talk about machine learning for a second.
So I think machine learning as a phrase
sounds really intense.
It kinda sounds like a robot apocalypse
to be honest.
I know, I remember telling one
of my friends
that I was doing machine learning stuff
and they were, they sounded terrified,
when I mentioned, and they were,
like, "Oh my gosh
you're making Cylons or something?"
So, I think when a lot of people
hear machine learning, they think of
robots and math formulas
and it's actually a little bit like that
but not, it's not all like that.
You're not always making robots.
And it's also, can be really useful even
just for everyday applications.
So, I like to think of machine learning
as a collection of algorithms
and these algorithms are used
to solve problems that people have
been really good at solving, but
programs have been pretty crap at automating in the past.
So, examples of this are spam filtering.
So, your email deciding whether or not
something's spam so you don't have
to see it or not.
Face detection, recommendations like
Netflix and Amazon do.
Character recognition, so digitizing books.
So there's some of the applications.
And if you sat down to write
a program to solve any of these,
let's say, spam filtering. If you sat down
to try to solve that without machine learning,
what you might do is, you might
look at some emails.
You might be like, "Alright,
well a lot of these spam emails
are in all caps, and I don't see any
legit emails in all caps."
So, you might make a rule that's,
if the email is in all upper case, then it's spam.
And then you might look at
some more emails and you might
be like, "Alright, well, a lot of these
spam emails are also mentioning
free iPads. And that's kinda sketchy
and my other emails don't mention that."
So you might make another rule,
that says "or if there are the words free iPad in it"
then, it's also spam.
So what you're gonna end up with is
a bunch of painstakingly hand-picked
"if" statements, basically. And even then,
you can't look at all the emails in the universe.
You're gonna miss a lot of things
just doing this manually.
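
A minimal sketch of the hand-picked rules described above. The function name and the exact rules are illustrative assumptions, not code from the talk:

```javascript
// Hand-written spam rules, the kind you would pick by reading emails yourself.
function looksLikeSpam(text) {
  // Keep only the letters so punctuation and digits don't affect the caps check.
  const letters = text.replace(/[^a-zA-Z]/g, "");
  // Rule 1: the message contains letters and every one of them is upper case.
  const allCaps = letters.length > 0 && letters === letters.toUpperCase();
  // Rule 2: the message mentions a free iPad (case-insensitive).
  const mentionsFreeIpad = /free\s+ipad/i.test(text);
  return allCaps || mentionsFreeIpad;
}
```

Every new kind of spam needs another rule added by hand, which is the part a learning algorithm takes over.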
And what machine learning algorithms do
is they find these "if" statements for you and they do this
better than you can and with less error.
So essentially, machine learning algorithms
are, like, better programmers than you.
Oh, just for these particular applications.
So, to sum it up, on the one hand
we have traditional programming,
where you just have, you choose the
conditionals and all the parameters.
And then on the other hand we have
machine learning algorithms, where the rules are learned
from the data. And so,
this is actually really powerful.
And there are some applications, in fact,
where a machine learning solution is the only really
manageable solution to that problem.
So yeah, basically, machine learning
super important and super useful.
So now let's talk about JavaScript for a second.
JavaScript is a ubiquitous programming language.
This is a screenshot of the top languages
on GitHub. I believe this is
the percents of repositories
where the main language is JavaScript.
And some language, or some repositories
are multiple languages, but I think
it's safe to say JavaScript is squarely in first place.
And not only that, but JavaScript is
essentially the only language that runs
in the browser and also runs on the server
with Node.js. So yeah, it's a huge language.
So it makes sense that we see if we can put the two together,
machine learning and JavaScript.
[Let's solve a problem] So let's do exactly that.
So what we're gonna do, is we're gonna solve a problem,
with machine learning, and JavaScript.
So this particular problem we're gonna solve is,
you get bombarded with
a lot of content today on the internet,
and sometimes you're wondering if you should bother,
checking something out.
I have this problem a lot.
So, you might ask yourself, "Should I bother clicking on this link?
Should I bother checking this thing out?"
So the answer to that question lies
in another question, which is,
"Does it have any cat pictures?" [laughter] Course.
So this problem is called cat detection.
And this actually, this is not an original problem at all.
If you look this up, if you search for this,
there are research papers and PhD theses
on this going back years.
So it's, clearly an interesting problem.
So the exact problem we're gonna solve is
given an image, does it contain a cat?
We're gonna break down a little bit more,
something more manageable.
So just given a section of image, we're gonna see,
is that section a cat head or not?
So what we would need to do this, we need something
that will take an image and will be,
Yes this is a cat, or no it's not a cat.
So what we need is a classifier.
So a classifier does exactly that.
It takes a piece of data and it tells you which class it's in.
For our case, we have two classes:
cat and not cat.
And classifying algorithms are usually
trained on data where you
actually already know the class to it.
And some examples of classifiers,
there's Bayesian classifiers, for spam.
K-nearest neighbors, neural networks, support vector machines.
These are all different classification
algorithms used for different things.
We're gonna use the support vector machines.
So, to talk about the support
vector machine algorithm.
So, it's also another phrase that sounds
pretty intense but we'll get to at least
figure out the vector part in a second.
So these support vector machines,
just like I mentioned before,
are trained on data where we already know the answer.
So it's trained on labeled data.
And what I mean by data is,
basically, a bunch of vectors of numbers,
arrays of numbers. And then,
what I mean by labels is, we're gonna
give it a one, or a negative one.
So in our case, we want to label a vector
with a one, if it's a cat, and negative one, if it's not a cat.
And also, in our case, what we have to do
we have our labels,
we know if it's one, it's a cat, if it's negative one, it's not.
But what is our data gonna look like?
So we know it has to be an array of numbers.
So we have to figure out how to take
an image and get an array of numbers that represents it.
[What's the input? Pixels] So, what's our input gonna be?
So one thing we can do, is just
straight up take the pixels from the image.
So we could just take the RGB values
and flatten them out into an array,
or, we could just take the gray scale
intensities of each pixel.
In both those cases we're gonna end up
with like a ton of information, not all of it's relevant,
there's gonna be a lot of noise.
So we wanna help out the classifier a bit.
We wanna give it something it can
chew on a bit more easily.
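
A minimal sketch of the "just take the pixels" idea. It assumes the pixels arrive as a flat RGBA array (four values per pixel, the layout used by canvas image data) and uses a plain average for the gray level, which is one simple choice among several:

```javascript
// Turn a flat RGBA pixel array into one grayscale intensity per pixel.
function toGrayscale(rgba) {
  const gray = new Array(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[4 * i];
    const g = rgba[4 * i + 1];
    const b = rgba[4 * i + 2];
    // The alpha value at 4 * i + 3 is ignored.
    gray[i] = (r + g + b) / 3;
  }
  return gray;
}
```

Even a small 64 by 64 window gives 4,096 numbers this way, most of which say nothing about whether a cat is present.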
So here's another idea, which is
we could just look at the edges in the image.
So this is the edge of an image.
Edges in images are places where
the image changes from dark to light
or vice versa. And here we've taken
the gradient of each of the images above
so each of these images on the bottom
are the gradient of the image above it.
So the gradient is basically,
getting the edges from the image.
And you see, after we've done this,
we've taken out a lot of information,
like the colors and whether it's dark
or light and we've retained, basically,
the only thing we've retained is the shape
of the cat head. So, that's really awesome.
That's exactly what we want and
we don't have a lot of extra information
that we don't need. So it's great.
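
A rough sketch of taking the gradient of a grayscale image. It assumes the image is a flat array of intensities with a known width and height, and it uses simple neighbor differences; the function name is hypothetical:

```javascript
// Gradient magnitude per pixel: large where the image goes from dark to light.
function gradientMagnitude(gray, width, height) {
  // Read a pixel, clamping coordinates so the borders reuse the nearest pixel.
  const at = (x, y) => {
    const cx = Math.min(width - 1, Math.max(0, x));
    const cy = Math.min(height - 1, Math.max(0, y));
    return gray[cy * width + cx];
  };
  const out = new Array(width * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const dx = at(x + 1, y) - at(x - 1, y); // change left to right
      const dy = at(x, y + 1) - at(x, y - 1); // change top to bottom
      out[y * width + x] = Math.sqrt(dx * dx + dy * dy);
    }
  }
  return out;
}
```

Flat regions come out as zero regardless of their color or brightness, so only the outline of the shape survives.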
So we're gonna do something similar
but a little bit more sophisticated.
So we're gonna do, is get the histogram
of oriented gradients of the image.
Another intense sounding word.
So it's also, I call it a HOG descriptor.
I've never heard it said out loud,
I've only read it on the internet and in papers,
so I hope that's how you really say it. [laughter] If you're a real
machine learning person,
but I call it HOG descriptor
and this will capture the edges
that we were looking for before
but also capture the direction of the edges.
So, we'll capture the angle of the cat ear.
So, and it does it, even better.
And a HOG descriptor is an array of numbers.
So this is perfect, this is exactly what we need to feed
into our classifier. So just giving
you an idea. The HOG descriptor
is gonna be an array of about a thousand numbers.
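
A simplified sketch of the idea behind a histogram of oriented gradients. A full HOG descriptor splits the image into cells and normalizes groups of cells; this sketch builds a single histogram for a whole patch and assumes 9 orientation bins, so it illustrates the principle rather than the package used in the talk:

```javascript
// One orientation histogram for a grayscale patch (flat array, row by row).
function orientationHistogram(gray, width, height, bins = 9) {
  const at = (x, y) => {
    const cx = Math.min(width - 1, Math.max(0, x));
    const cy = Math.min(height - 1, Math.max(0, y));
    return gray[cy * width + cx];
  };
  const histogram = new Array(bins).fill(0);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const dx = at(x + 1, y) - at(x - 1, y);
      const dy = at(x, y + 1) - at(x, y - 1);
      const magnitude = Math.sqrt(dx * dx + dy * dy);
      if (magnitude === 0) continue; // no edge here, nothing to vote
      // Fold the direction into [0, PI) so opposite directions share a bin.
      let angle = Math.atan2(dy, dx);
      if (angle < 0) angle += Math.PI;
      if (angle >= Math.PI) angle -= Math.PI;
      const bin = Math.min(bins - 1, Math.floor((angle / Math.PI) * bins));
      // Stronger edges get a bigger vote.
      histogram[bin] += magnitude;
    }
  }
  return histogram;
}
```

Concatenating such histograms over a grid of cells is what produces an array on the order of a thousand numbers.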
Alright, so now we have this, we know
what data we're gonna give the classifier.
So now we actually have to collect it.
So this is an important step,
getting a lot of data to train our classifier with.
So what we need to do is
we need to get a bunch of cat pictures
and a bunch of non-cat pictures
and we want a lot of both of them,
so the classifier knows about both classes.
So for the negatives, those are the images
without cats. For those, we're just gonna
download a bunch of images
from Flickr and crop out random sections of them.
And then for the positives, we need
thousands of cat pictures and we also
just need the cat head too, we need
to crop around the cat head.
So actually, there happens to be this amazing data set of cat pictures.
Each one is annotated with the location
of the ears and the eyes and the nose.
So this is very, convenient. I can't believe
this exists but it does. [laughing]
So that's very nice for us.
So all we have to do now is
take each one of these cat pictures,
rotate it so that the eyes are level
and then we crop the picture so that we're
just framing the cat's head, like that.
And then also we want each of these
pictures to be the same size.
So we can feed it to our classifier.
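
A small sketch of the "rotate so the eyes are level" step, assuming each annotation gives eye positions as { x, y } points. The function names are hypothetical; the talk does not show this code:

```javascript
// Angle of the line through both eyes, in radians (0 means already level).
function eyeTiltRadians(leftEye, rightEye) {
  return Math.atan2(rightEye.y - leftEye.y, rightEye.x - leftEye.x);
}

// Rotate a point around a center by the given angle.
function rotatePoint(point, center, radians) {
  const dx = point.x - center.x;
  const dy = point.y - center.y;
  const cos = Math.cos(radians);
  const sin = Math.sin(radians);
  return {
    x: center.x + dx * cos - dy * sin,
    y: center.y + dx * sin + dy * cos,
  };
}

// Rotate both eyes around their midpoint so they end up at the same height.
function levelEyes(leftEye, rightEye) {
  const center = {
    x: (leftEye.x + rightEye.x) / 2,
    y: (leftEye.y + rightEye.y) / 2,
  };
  const tilt = eyeTiltRadians(leftEye, rightEye);
  return {
    left: rotatePoint(leftEye, center, -tilt),
    right: rotatePoint(rightEye, center, -tilt),
  };
}
```

The same angle would be applied to the whole image before cropping around the head and resizing every crop to a common size.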
Ok, so that's awesome. So now we have
you know, thousands of cats and non-cats.
And now, we can start training with that stuff.
So, training. This is where JavaScript comes in.
So, I'm just gonna go through this a bit.
So first of all, there are a couple
of packages that we're using
from the npm package manager.
There's a HOG descriptor package,
to do the HOG descriptor.
And the SVM package, which will give us
our support vector machine.
So you're creating a support vector machine
and then we go through each of
these pictures that we have,
and then we first extract the HOG descriptor from it.
And then we give it a label.
One, if it's a cat, and negative one, if it's not.
And then finally once we've gotten all
these inputs and their respective labels,
we can call svm.train with these inputs and labels.
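
To make the training step concrete without guessing at the packages' APIs, here is a self-contained stand-in: a tiny linear support vector machine trained by sub-gradient descent on the hinge loss. This is a swapped-in technique for illustration, not the SVM package used in the talk, and the option names and defaults are assumptions:

```javascript
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// inputs: array of number arrays; labels: 1 (cat) or -1 (not cat).
function trainLinearSvm(inputs, labels, options = {}) {
  const { epochs = 200, learningRate = 0.1, lambda = 0.01 } = options;
  const weights = new Array(inputs[0].length).fill(0);
  let bias = 0;
  for (let epoch = 0; epoch < epochs; epoch++) {
    for (let i = 0; i < inputs.length; i++) {
      const x = inputs[i];
      const y = labels[i];
      const score = dot(weights, x) + bias;
      // Regularization: shrink the weights slightly on every step.
      for (let j = 0; j < weights.length; j++) {
        weights[j] -= learningRate * lambda * weights[j];
      }
      // Hinge loss: only examples inside the margin or misclassified push back.
      if (y * score < 1) {
        for (let j = 0; j < weights.length; j++) {
          weights[j] += learningRate * y * x[j];
        }
        bias += learningRate * y;
      }
    }
  }
  return { weights, bias };
}

function predictLabel(model, x) {
  return dot(model.weights, x) + model.bias >= 0 ? 1 : -1;
}
```

In the talk's setting, each input would be the HOG descriptor of one cropped image rather than a two-number toy vector.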
Alright, so that's it. So we've trained
our support vector machine
and just to give you a better idea
of what's happening when you do that
what training does,
is it creates this state.
So this is a JSON object,
a JSON object that is the trained state
of our support vector machine.
I hope you can see. Yeah.
So, you know how I said earlier that
machine learning algorithms find
"if" statements for you?
So they actually, that's actually not what's happening.
In reality, this algorithm is finding
floating point values for you.
So, the "if" statements are fixed
and the floating point values change
based on which data you trained it with.
So, the values of these numbers here
will determine, will be combined
with your new data that you
haven't seen before and will help you,
will help the support vector machine
determine what class it's in.
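
A sketch of "the if statement is fixed, the numbers are learned", assuming the simplest (linear) case. The JSON below is a made-up state for illustration; a real trained state from an SVM library will generally have more fields than this:

```javascript
// Hypothetical trained state; the numbers are invented for this example.
const stateJson = '{"weights":[0.5,-0.25,1.0],"bias":-0.1}';
const state = JSON.parse(stateJson);

function classify(trainedState, vector) {
  // Combine the learned numbers with the new, unseen data.
  let score = trainedState.bias;
  for (let i = 0; i < vector.length; i++) {
    score += trainedState.weights[i] * vector[i];
  }
  // The one fixed "if": training never changes this line, only the numbers.
  if (score >= 0) return 1;
  return -1;
}
```

Because the state is plain JSON, it can be produced by one program and loaded by another.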
So ok, so you've got it,
our support vector machine trained.
So now what we have to do
is go back and start solving our problem,
which is, the first problem we're trying to solve
is just given a section of image,
is it a cat head? So what we're gonna do
all we have to do here, is take this image
that we're gonna try to classify,
we extract the HOG descriptor from it,
just like we did with the training image.
Then, we just call svm.predict with that descriptor.
And that result will be one or a negative one
and that's just, that's not a label,
that's actually just a prediction
of what the support vector machine thinks
the image is. So then, if it's one, we'll say,
"True, it is a cat." And if it's not, False, it's not a cat.
Alright, so great, we've got this function
that will take a section and tell us
if it's a cat head or not.
So now, we can go one level up
and finally answer our original question,
which is, given an image,
does it contain a cat?
So what we do, is we take this function
that will classify individual sections
and we run it on all the sections in the image.
So we're testing windows at all different
locations and scales throughout the image.
And then after that, around cat heads
we'll have a bunch of overlapping detections,
so we're gonna combine all those
and also weed out, in the process
of doing that, we're gonna weed out
any detections that, that don't have
enough detections. They're probably
false positives. So, not really cats.
So yeah, and then that's it.
That will answer our original question.
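
A rough sketch of the two steps just described: generating square windows at several positions and scales, then merging overlapping detections and dropping groups with too few hits. The thresholds, step size, and function names are assumptions made for illustration:

```javascript
// All square windows of the given sizes that fit inside the image.
function slidingWindows(imageWidth, imageHeight, sizes, stepFraction = 0.5) {
  const windows = [];
  for (const size of sizes) {
    const step = Math.max(1, Math.floor(size * stepFraction));
    for (let y = 0; y + size <= imageHeight; y += step) {
      for (let x = 0; x + size <= imageWidth; x += step) {
        windows.push({ x, y, size });
      }
    }
  }
  return windows;
}

// Shared area divided by combined area (0 = disjoint, 1 = identical).
function overlapRatio(a, b) {
  const w = Math.min(a.x + a.size, b.x + b.size) - Math.max(a.x, b.x);
  const h = Math.min(a.y + a.size, b.y + b.size) - Math.max(a.y, b.y);
  if (w <= 0 || h <= 0) return 0;
  const shared = w * h;
  return shared / (a.size * a.size + b.size * b.size - shared);
}

// Group detections that overlap, keep groups with enough members,
// and report each kept group as one averaged rectangle.
function mergeDetections(detections, minOverlap = 0.3, minCount = 2) {
  const groups = [];
  for (const d of detections) {
    const group = groups.find((g) => overlapRatio(g[0], d) >= minOverlap);
    if (group) group.push(d);
    else groups.push([d]);
  }
  return groups
    .filter((g) => g.length >= minCount) // lone hits are likely false positives
    .map((g) => {
      const average = (key) => g.reduce((sum, m) => sum + m[key], 0) / g.length;
      return { x: average("x"), y: average("y"), size: average("size"), count: g.length };
    });
}
```

The detections passed to mergeDetections would be the windows for which the classifier answered 1.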
So, let's go to demo.
Ok, so we have Kittydar.
I actually did this about a year ago,
but this will run this algorithm that we just made.
So we're gonna try it out. We're gonna
give it a cat picture. Ok, and you see what it did,
was it did that thing where it detected
at different windows and then it combined those overlaps.
So bam! It got, there's a cat. [applause]
We're gonna try another one. Yay!
Ok, we're gonna try another one.
You'll see the limitations of it.
So, you can see here
that it got two cats, but there are 4 cats there.
It didn't get the two on the sides.
It didn't get the one on the left.
Yeah, it's left for you. Ok, didn't get on
the left because we didn't train
the classifier to detect cats that are facing the other way.
We just trained it on cats that are facing the camera directly.
And then, the one on the right has an abnormal shape. [laughter]
So, it's not, and it's also a little tilted, which will make it harder as well.
But still, it works pretty well. So that's pretty cool.
And I have to mention one other thing that I did.
This is running in a webpage
but we have to run it in a Web Worker,
so in a separate process.
This stuff we're doing is serious number crunching.
It's basically the equivalent of doing
a while-true loop for a few seconds
on your page, so you really want to,
you really wanna make sure you're
not running that on the main thread
because it will block everything and it will be really bad.
So it's one more thing that you did,
that I did as well.
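
A minimal sketch of moving heavy work off the main thread with a Web Worker. The worker is built from an inline script so the example stays in one piece; crunchNumbers is a stand-in for the detection code, and both function names are hypothetical. Where no Worker is available the sketch simply runs the work directly:

```javascript
// Stand-in for the expensive detection work.
function crunchNumbers(iterations) {
  let total = 0;
  for (let i = 0; i < iterations; i++) total += i * i;
  return total;
}

function crunchOffMainThread(iterations, onDone) {
  if (typeof Worker !== "function") {
    // No Web Worker support in this environment: fall back to running inline.
    onDone(crunchNumbers(iterations));
    return;
  }
  // Worker script: the function's source plus a message handler that calls it.
  const source =
    crunchNumbers.toString() +
    "; onmessage = (event) => postMessage(crunchNumbers(event.data));";
  const url = URL.createObjectURL(new Blob([source], { type: "text/javascript" }));
  const worker = new Worker(url);
  worker.onmessage = (event) => {
    onDone(event.data); // result arrives without the page having been blocked
    worker.terminate();
    URL.revokeObjectURL(url);
  };
  worker.postMessage(iterations);
}
```

On a page, calling crunchOffMainThread(1e8, showResult) keeps scrolling and clicking responsive while the loop runs, where showResult is whatever callback updates the page.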
Alright, so what I think is really awesome about this,
is this is running in the browser.
I just think that's the coolest thing ever.
It's all running on the client's side
and actually, well, we see the url harthur.github.com,
so you know it's static, and it's really cool.
So finally, alright, so,
We just saw that running in JavaScript.
So a few years ago, or several years ago,
this just would not have been possible
in JavaScript and there are a few things
that have made it possible.
So, I guess the big one is just speed.
JavaScript has gotten a lot faster the past few years
and this kind of number crunching
just ran way too slowly in the past.
So, that's one thing. Another big thing
is node.js, which came out a few years ago.
You wouldn't have been able to do
that collection and training stuff
we just did without node.js.
We really need a command line and access to the file system
to do those things. But I also,
I have to note that you don't have to do
each of these steps in the same language.
So, you could do your collection
or training in another language and then,
export your trained state, as JSON
or something and then use it from JavaScript, too.
So, that's another option. And actually,
for the collection, I think I used a [INDISTINCT]
script to get Flickr images, cause the API
was easier to use there.
So yeah, you could still mix it up but
Node.js makes it possible to do these steps
in JavaScript if you want to.
Another thing that's helped out a bit is
typed arrays. So when JavaScript-
usually, when you have an array,
you can put anything that you want into it.
You can mix types.
You can have indices where you don't
put anything in it, and then some have strings,
and some you put numbers in,
and some random objects. So, it's very lax.
And with typed arrays, you're telling
JavaScript this array will only contain,
floating point 64 bit numbers,
and so what JavaScript can do
is, it can figure out the offset into memory a lot faster
using that information because
it knows the type of everything in the array.
So, it makes it a lot faster.
In fact, it makes it about two times
faster for training. So instead of it taking
ten hours to train our support vector machine,
it takes five hours, which is a big deal
if you are trying to iterate and figure stuff out.
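
A short illustration of the difference between an ordinary array and a typed array. The variable names are made up for the example:

```javascript
// Ordinary array: any mix of types, and holes are allowed.
const loose = [1, "two", { three: 3 }];
loose[5] = 6; // indices 3 and 4 are now empty holes

// Typed array: fixed length, every slot is a 64-bit floating point number.
const descriptor = new Float64Array(4); // starts as four zeros
descriptor[0] = 0.25;
descriptor[1] = "7"; // converted to the number 7 on assignment
descriptor[2] = "cat"; // cannot become a number, so NaN is stored
```

Since every element has the same type and size, the engine can compute where element i lives in memory without checking what kind of value it is.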
So, it's one thing. Another thing is that
workers really make this possible
to run this stuff in the browser.
Otherwise, it would, there is no way
you'd want to run it in the browser.
It would just block everything. So there's
some big things that helped out a lot.
And as you can see, it's clearly possible
to do it now. But there's still stuff that JavaScript is lacking.
JavaScript is definitely not your typical
language you use to do machine learning.
Things like Python, languages like Python
have a ton of really good building block libraries like Mat Lab
and stuff like that, which JavaScript is lacking.
Matrix math, statistics libraries, image processing, that kind of stuff
is just not quite there yet in JavaScript.
So that's one thing that still really needs to happen.
But also in a, also it'd be nice if it were
even faster, so it's cool.
But yeah, let's see here.
But things are still happening though.
There's some libraries out there.
There's neural networks, there's support vector machines,
clustering, and the natural language
processing library, which I really like,
called Natural Node,
that also has a Bayesian classifier, as well as,
doing natural language processing,
which is, kinda similar to machine learning, in some cases.
So I wrote the neural network and
the clustering a few years ago.
Also Andrej Karpathy has been
contributing a lot of libraries, too
including the support vector machine implementation,
which we just used, so that's cool.
And there's some applications in JavaScript, too,
using machine learning. My favorite one
is this real-time face detection.
So this is face detection that runs really quickly.
So, that means that you can do it on
every frame of a video, like an HTML5 video.
So that is super cool. There's also
a hand and eye detection library
and the first machine learning
I heard of in JavaScript was this OCR,
Captcha solving algorithm, which is pretty cool.
And that uses neural networks
to crack Captchas.
So it'll be really interesting to see
where machine learning will go in JavaScript.
It's hard to tell right now, but I hope that people will try it out, for sure.
And finally, I want to talk about
if you guys are interested in machine learning.
So, machine learning doesn't always look like this.
There's not always this classification, you're not always doing classifications.
Sometimes, you're predicting continuous value,
like temperature, stock price.
Sometimes, you're not doing, you don't have to do
this training stuff at all.
Sometimes, you're not using labeled data.
So, there's a huge variety of things
that machine learning does.
So just be aware of that, too.
This is just kinda like a small example.
And also, there is a Stanford, a free,
online, Stanford machine learning class.
And that's actually starting on April 22nd,
so that's in a few weeks. And that's hosted
on Coursera. I actually took it a year ago
and it was really good and it really helped
fill in a lot of holes for me.
So, definitely check it out.
And finally, what really I think I learn
the most from is just starting with
a problem and going from there.
So, when I was working on Kittydar,
I just kind of, well, first thing I just
searched for "cat detection" and then
from there, found research papers, I read them, when I didn't
understand something, I looked it up
on Wikipedia and then they linked
to other papers, and read those,
and I actually ended up learning
a ton of stuff just from doing that.
So much stuff, I never thought I'd know anything about.
So there is, yeah, there's a lot of
value in that. I really encourage you to,
if you're facing a problem, you think you
can use machine learning on,
just go for it
and you'll end up learning a lot.
I think that our brains are kind of like
these machine learning algorithms, where
you can just kind of throw fancy words and
research papers at it and eventually, it will
just figure things out. [giggle, laughter]
That's what I found, at least. So, yeah anyways.
So try it out. It's my best suggestion.
[image source: Sandra-honestly on deviantart]
Ok, thanks. [applause]
Hi guys, welcome back. I hope you're
enjoying all of the yummy snacks
that my wonderful co-workers
provided for us. Pretty awesome.
That was not a plug, I swear.
So Heather just gave an amazing talk
and it was probably a lot different from
Rachel's talk last week, which was more
sort of like high-level and organizational,
whereas Heather's was super technical.
And so I'm just kinda gonna jump in
and ask you the question.
How did you get into programming?
[Heather] I got into programming in high school.
There was this three week seminar
period class in high school where
we could take one of five classes
or something, and my friend Alyssa
and I decided to take
this game programming class.
And it was just kind of random.
We were just like, "Oh games, that sounds kinda cool."
And so we signed up for it and
just the first day of class, you know,
I saw what the code looked like and I saw
what it involved finally for the first time,
I had no idea what it entailed --
[Julie] - It was exactly like the movie Hackers, right?
[laughter] Yeah, stuff streaming down
your screen. No, but I just
saw the code and I was just like,
"Yeah, this is definitely what I wanna do.
This is awesome." And then
ever since then, I just, you know.
I basically started in school,
and then I kept on taking classes,
and then majored in it in college,
and then, and so on from there.
[Julie] That's really awesome.
You have a really awesome story
in that it's, also, it's completely different
from Rachel's, who learned to program
kind of after college. You went to school
for a CS degree at Carnegie Mellon and
you continued straight out of college
but I think it's really interesting
kind of the way that you found Mozilla.
Did you kinda wanna talk about that?
[Heather] Yeah, so I work at Mozilla now
and I've worked there ever since college.
And actually I found it, I was at my career fair
at my school and I'd known that Mozilla,
I mean, I'd used Firefox for years,
I knew about Mozilla but I didn't know
that they actually hired people.
So when I saw their booth [laughing]
at my college's career fair, I was like,
"Oh my gosh." They, you know, I went and talked to them.
And I, I probably, I dunno,
I think I sounded like a total idiot.
But I guess they liked me, alright.
And then I became an intern there,
and so interned there one summer,
and then, when I graduated, I started
working there because I loved it so much.
[Julie] And you've been there ever since?
[Heather] Yeah, I've been there ever since.
[Julie] What's it like to work at an open source company?
[Heather] It's awesome. I don't have anything
to compare it to. [laughing]
It's really awesome, I don't think I could go
back to proprietary unless I had to.
It's really cool to be able to talk about
what you're working on and not
have to think, "Oh, maybe I can't talk about that, or something."
And I can release, yeah
everybody can see my code.
The best part is getting contributors, too.
You'll be working really hard
with your team and there's these bugs
that are falling by the wayside,
nobody's fixing them and a contributor
will come along and fix
this really critical paper cut bug
and, it's just like the most awesome thing.
[Julie] It's pretty awesome.
So GitHub obviously,
has a lot of open source projects and
we asked, I got the privilege of
talking to Heather before her talk,
so I cheated a little bit and I got to hear
her story a little bit and how she
got into programming, obviously, and
also open source but we talked
a little bit about sort of the role
GitHub plays in open source and
what it was like to sort of transition
into using a tool like GitHub.
[Heather] Yeah, it was awesome.
I actually, well, I have been doing
open source a few years now.
I'd been doing it before GitHub was around.
And I was hosting my code on Google code
and stuff like that. I definitely wasn't
getting any contributions. I don't know
if people saw it even. But definitely
after GitHub, when I started
putting my stuff on GitHub,
I got so many pull requests and issues
and people saw my code. It's just great.
[Julie] The best Christmas presents,
basically. I think pull requests make
the best Christmas presents.
[Heather] Yeah, definitely. [laughing]
[Heather] Yeah, so yeah, GitHub has been awesome.
And also Git, too. It's helped. Absolutely.
Very essential.
[Julie] If you, if anyone has, I kinda planned to
make this more into a discussion,
so if anyone has any questions,
throw your hands up. Don't be shy.
There's a question. Hi Steven.
[Steven] So, did you study mathematics
and statistics as part of your college track?
Or is that something that you
started learning afterwards in order
to use that to solve problems?
[Heather] I did, I was a Computer Science major and I was a Math minor
but the Math I was doing was
this crazy stuff, Ring theory
and stuff so it wasn't really very practical.
For instance, we had the option to take
a matrix algebra class, or linear algebra.
Linear was really theoretical.
And I really could've used
this matrix stuff, for the stuff
I was doing here on the talk.
But I didn't take that class.
I took the really theoretical linear
algebra class that was really abstract.
So, I really kinda skipped over that
practical Math and Statistics stuff actually.
So, I really learned that through this.
And another thing that I learned,
I don't know if you guys found
this yourselves but when I was
in school taking Trigonometry classes,
Trigonometry was the subject where
I was like, "Ok, this is like, I'm never
gonna use this." You know, the sine angles
and stuff. But for doing the cat detection,
I had to figure out how to rotate
the image the right way and I was like,
"Wow, I'm actually using this stuff."
And I had to really re-learn that.
Yeah, basically, a lot of it I just learned
from figuring out problems. Yeah.
Even though I did take classes
on related stuff.
[Julie] So you studied CS in school
and I'm always really curious to hear
how you chose a language,
or how you chose what you would
work on next, 'cause you, obviously,
you code a lot in Javascript and you also write
XUL and HTML and CSS.
[Heather] Yeah, so I guess my job has been
JavaScript programming for so long.
And I really, I mean, I have to
write JavaScript for my job.
Firefox is written and the parts that
I work on are written in JavaScript.
And sometimes actually platform stuff,
which is C++.
So right now, it's really about
what I have to use. I can't just call out
to Ruby from Firefox or something,
or people would be very mad at me. [laughing]
But so, that's part of it. But also,
I do have a lot of side projects and
they're mainly JavaScript.
I really like doing them in JavaScript
so that they're in the browser
'cause I love it when things
are in the browser. But also,
I really, I love Node.JS too a lot.
It's been, made it really easy to make
command line scripts
and one-off things, too.
[Julie] So, Kittydar wasn't your first
open source project obviously, right?
[Heather] No, definitely not. That was,
Kittydar was just a demo
to explain machine learning.
But I think my first, I think the first
open source I did was working
as an intern on Mozilla stuff.
But I guess, the first one I did myself
was a Firefox add-on that helped
web developers pick colors on their webpage and
save colors and tag them
and stuff like that.
[Julie] That was your first experience
with machine learning, basically?
[Heather] Yeah, actually that was
my first experience with machine
learning, too. I had this problem where
I would be displaying these colors
that it collected from this webpage,
I also displayed the color value
on top of this color. So if it was
a light color, I'd wanna display
black text over it and if it was
a dark color, I'd wanna display light,
white text over it.
And there are colors, like bright green
where it's not really immediately clear,
we're like, "Which color do you wanna,
black or white to display over it?"
So I looked up some formulas for doing that and
I found them to be pretty spotty.
So, I just kind of, in the back of my head
I had heard the words "neural networks"
before and I thought it sounded
super cool. [laughing]
'Cause they do sound really cool.
And I figured out actually, I might
be able to use them for this.
And so I wrote neural networks
in JavaScript and it was a Firefox add-on
so it had to be JavaScript.
And it actually worked pretty well
at determining which color to use, black or white over a random color.
So that was my first foray.
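
For comparison, here is one common brightness formula of the kind mentioned above. It is not necessarily one Heather tried; the weights follow the widely used YIQ luma approximation, and the 128 cutoff and function name are assumptions:

```javascript
// Pick black or white text for a background color given as 0-255 RGB values.
function textColorFor(r, g, b) {
  // Perceived brightness on a 0-255 scale; green counts most, blue least.
  const brightness = (r * 299 + g * 587 + b * 114) / 1000;
  return brightness >= 128 ? "black" : "white";
}
```

A fixed cutoff like this is where borderline colors get awkward, which is the gap a network trained on labeled color examples is meant to fill.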
[Julie] And you still work on open source
projects outside of what you work on
- at Mozilla, or does it matter-- - Yeah, yeah occasionally
whenever I have time. At this point,
I'm really just maintaining open source projects,
which is a lot of work in itself.
I'm trying not to create too many new ones
'cause then I have to maintain them.
[laughing] But yeah, mainly work,
which is also open source.
[Julie] Do we have any other questions?
Yeah, the excited guy.
[guy] Hi, you mentioned Python,
and all those amazing libraries in Python, and last week
we had Mozilla launching asm.js,
which would enable Python [inaudible]
being used on the browser.
Do you think that, what do you think
will be [inaudible] this technology [inaudible] ML in general?
[Heather] Yeah, I think that I don't know yet.
Yeah, I really I have no idea what effect
that stuff will have and how many people
will end up using asm.js and
other languages compiling to JavaScript.
- I can't-- - [Julie] Predict the future. Right now.
[Heather, giggling] Yeah, I can't predict that.
I'm not sure. It's not something I would do right now.
I wouldn't be like, "Alright, I'm gonna use this Python and convert it to LLVM
and have it run with Emscripten, or whatever."
- Yeah, I don't know. - [Julie] The answer. We don't know.
[Heather, laughing] But it will be really
interesting to see and I hope people
try it out for sure. Maybe I'll try it
out at some point.
[Julie] Any other questions? Yes, Tea. Oh, sorry.
[laughing, comments from audience]
Now fight. [more laughing]
[woman] Do you know if citizen science projects
like Galaxy Zoo or Planet Hunters
are using machine learning behind the scenes
because they're collecting so much data
that could be used for people basically
tagging different photos and things like that
- on Facebook? - [Heather] Ok, so...
[Julie] That's a hard question to repeat. She's asking you about specific projects
and whether they're using machine learning.
[Heather] And so was it Galaxy something?
[woman] Like citizen science projects like Galaxy Zoo and space--
Planet Hunters and Galaxy Zoo
and there's a few others out there,
like Protein Folding ones, and all kinds
like that where they're basically
collecting a whole bunch of data
from people who are entering information
about images they're given.
And you think on the back end
they've been collecting that information
and developing machine learning to do it [inaudible]?
[Julie] Yeah, I've actually heard of at least one app
that's actually a game and it's used to detect protein bindings, or something.
And so they're using,
yeah, so there's a lot of really cool things.
I mean that I think is a native iOS app
or something like that, so it's probably
running on what, C, or something. I dunno.
But I don't know of any,
do you know of any others?
[Heather] Yeah, I'm not sure.
But yeah, most likely they are. I think
anytime that you have a lot of data
and you're using it to predict stuff,
it's probably using machine learning.
[Julie] Yeah, definitely. When we talked,
we had some really simple examples
of machine learning and applications
that we all use or sort of, you wanna talk
about those a little bit?
[Heather] Yeah, I mean, things
like spam filtering, everybody benefits from that.
And also, I think a lot of recommendation
[inaudible] use machine learning
to recommend things for you based on
what people like you also use.
[Julie] So when Amazon tells you to buy
that horse mask, they're obviously
using some type of machine learning.
[Heather, laughing] Yeah, and any time
there's computer vision, that uses
a lot of machine learning. So, face detection,
self-driving cars, they're probably using it, too.
[Julie] Laura, you have a question? She's hiding.
I can call people out by name.
[Laura] What advice do you give
for people who are getting into programming?
[Julie] The question was what advice
do you have for people who are getting into programming now?
[Heather] I've, it's so,
I've been programming for so long
that I wish I could empathize more.
It's, I think, for me, I came from
an academic background and so I kind of
so you could always do that,
which is just taking online classes, free online classes,
like the machine learning one
I was talking about. Even that one,
you don't have to know programming to take it.
The professor, Andrew Ng, is really good
at explaining things and you don't need
to know programming but you might
end up learning some in the meantime.
And also I know I've heard of Codecademy that does
this online JavaScript learning.
And I've heard nothing
but amazing things about that.
People say how easy it is and
how easy it is to understand things immediately from that.
So that's another thing I've heard about.
So there is, yeah that's kind of coming
from the schooling kind of background.
But also I think if you can think of
any problem that you just wanna solve,
kinda what I was talking about with machine learning, it really,
once you try to figure something out,
you end up learning so much. And then,
before you know it,
you'll know a lot of programming.
Just by trying to solve something.
[Julie] Any others? Oh!
[man] How would you address
the approach that you take when
approaching a machine learning problem
versus traditional programming problems?
How do you make those logical leaps
to find the algorithm you need?
[Heather] So I think he's asking,
how do you approach
machine learning problems differently
from regular problems maybe?
So, I guess, machine learning, well,
I certainly end up doing a lot of research.
With regular programming I'm kind of,
I know what steps I'm gonna take,
more or less. I'm like, "Ok, well,
I'm gonna make this object
and then it'll probably do this thing
and it'll communicate this way."
[Julie] There are a lot more conventions and best practices around regular
programming problems, whereas machine learning is still very open,
depending on the problem and, I mean,
there can be an entirely different path [inaudible].
[Heather] Exactly. I think
with machine learning, it's more
you have to do a lot of research.
You're, "How the crap do I do this?"
And then you end up just searching for it.
And also another big thing is you need
to figure out where to get your data from
'cause there's always data. That's a big difference, too.
[woman] Hey, I was wondering how
you choose whether to use Web Workers
or if you use a more traditional
RPC Asynchronous framework
and what are the trade-offs that you're finding?
[Heather] She's asking whether, how do you
decide between Web Workers and RPC Asynchronous.
[woman] Yeah, because it seems to me
at least naively, that they have
a lot of the same benefits.
That you can do data-heavy
tasks or computationally heavy tasks
without worrying about it using up
your [data space.]
[Heather] That's interesting, I literally
haven't heard of the RPC Asynchronous stuff.
I'm not exactly sure what
you're talking about actually.
[woman] So you send off an RPC call to a server,
maybe I'm just using the wrong terminology.
[Heather] Oh I see.
[woman] And then you let the server
handle the processing and you send it back.
[Heather] So yeah, it's basically whether
doing it in Web Workers in the browser
versus having it done on the server.
And calling out to, ok, cool.
So there's a big difference there.
So one thing, I actually, I like
doing my webpages all on the client side
if I can and having as little
server side as possible. Not that
I'm saying that's the best thing to do
or anything, but I usually do that
so I prefer using Web Workers to do
something like this.
And also it's just one less network request,
which is cool. And a lot less dependency
on that. So that's the main difference,
I guess. It's just whether you want to do
the computation on the client
or the server and do an extra network request probably.
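[Editor's sketch] The trade-off described above, running a heavy computation in a Web Worker in the browser versus sending it to a server, can be sketched as two functions with the same shape. This is an illustration under stated assumptions, not code from the talk: `sumOfSquares` is a hypothetical stand-in for the expensive work, and the `/api/compute` endpoint and its `{ result }` response body are invented for the example.

```typescript
// Hypothetical stand-in for an expensive computation.
function sumOfSquares(values: number[]): number {
  let total = 0;
  for (const v of values) total += v * v;
  return total;
}

// Builds the script the worker runs. The function body is embedded as text
// because a worker has its own global scope and cannot see page variables.
function buildWorkerSource(): string {
  return [
    `const compute = ${sumOfSquares.toString()};`,
    `self.onmessage = (event) => self.postMessage(compute(event.data));`,
  ].join("\n");
}

// Option 1: stay on the client. The work happens off the main thread,
// so the page stays responsive and no network request is made.
function computeInWorker(values: number[]): Promise<number> {
  const blob = new Blob([buildWorkerSource()], { type: "text/javascript" });
  const url = URL.createObjectURL(blob);
  const worker = new Worker(url);

  const cleanUp = () => {
    worker.terminate();
    URL.revokeObjectURL(url);
  };

  return new Promise<number>((resolve, reject) => {
    worker.onmessage = (event: MessageEvent<number>) => {
      cleanUp();
      resolve(event.data);
    };
    worker.onerror = (event) => {
      cleanUp();
      reject(new Error(event.message));
    };
    worker.postMessage(values);
  });
}

// Option 2: let a server do the work. This costs one extra network request
// and depends on the (hypothetical) endpoint being reachable.
async function computeOnServer(values: number[]): Promise<number> {
  const response = await fetch("/api/compute", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(values),
  });
  if (!response.ok) {
    throw new Error(`Server responded with status ${response.status}`);
  }
  const body = (await response.json()) as { result: number };
  return body.result;
}
```

Both functions return a promise, so calling code such as `computeInWorker([1, 2, 3]).then(console.log)` would not need to change if the computation moved from the client to the server or back.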
[Julie] We have two more questions. Anna? Will you go first?
[Anna] What really drives you?
What are you really passionate about?
What makes you get up in the morning
and be excited to go do your job?
[Heather] She's asking what makes me
passionate?
[Julie] What drives you
and what makes you get up in the morning?
What are you excited to work on most?
[Heather] I'm really excited,
mainly, about I love my job,
I'm very invested in making
our Firefox developer tools better.
I think it's really important
that we help web developers out.
So, that drives me a lot. I'm like,
"We need to get this feature done
so web developers can use it."
So in that sense, a lot of my motivation
is about just getting stuff done.
Getting particular things out there
so people can use them.
That's the main motivation. Also, some,
I mean, just programming in general
is really fun so that's
a secondary motivation.
It's just, "Oh I get to do fun stuff now." So that's cool.
[Julie] So one of the things,
or one of the reasons we started
Passion Projects originally, was
we wanted to hear from women who
really loved the companies they worked at
and I remember during our interview,
I asked, you know, what is your favorite
thing about working at Mozilla?
And I think you said it was the people.
Do you wanna talk about that?
[Heather] Yeah, definitely, at least
again, I haven't worked at other companies
but at least at Mozilla, everybody there
really cares. It's an open source company.
It's also wholly owned by a non-profit
and we have a really clear mission and
I think that a lot of people that work
at Mozilla are really passionate
about that. So, it makes for really
awesome co-workers and they're also,
you know, they're doing open source
but they're really used to helping out
a lot of people and so they've been
extremely helpful and they're also just really nice.
They're not, they're not
*** basically, which is awesome.
[laughing] So I just love everybody that works at Mozilla.
I've never met somebody that works
at Mozilla that I didn't like a lot.
[Julie] And I think that's another thing
about when you're getting into programming.
Surround yourself with people
who wanna help other people.
I know, I was really lucky. I started
writing code when I was at Yammer and
I had just this incredible network
of people around me who really wanted
to help me learn really quickly. And I got
just pointed in the right direction and
I mean, if you're self-motivated
and you have those types of people
around you. It's the perfect formula
for success, business. Other things.
- [Heather] Yeah. - [Julie] So people and such,
and open source companies I think
attract those kinds of people.
People who wanna build things that make
other people's workloads more efficient
and make them better at what they do.
[Heather] Yeah, totally.
Open source companies are the best.
[Julie] No offense, everyone else.
[laughing] Cool, well, thank you so much
for letting us pick your brain
and also for the amazing talk
on Machine Learning.
We record all our talks,
so we usually post them in the follow up blog post.
So anyone who's watching from home and
maybe missed the beginning or whichever,
we'll catch you up and thank you so much,
Heather, for being a part of Passion Projects.
[Heather] Oh, thank you for doing this. [applause]