So Hadoop actually imitates a lot of the
Linux file system commands, so things like 'make directory'
and 'list' are all pretty much the same.
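
As a rough illustration of that similarity, here is a minimal sketch that maps a few familiar Linux commands onto their `hadoop fs` counterparts. The helper name `hadoop_fs_command` and the `input` directory are hypothetical, chosen only for this example; the sketch builds the command line without running it, so it does not need a Hadoop installation.

```python
# Hypothetical helper (not part of Hadoop): builds, but does not run,
# the `hadoop fs` command that corresponds to a familiar Linux command.
LINUX_TO_HADOOP = {
    "mkdir": "-mkdir",  # make directory
    "ls": "-ls",        # list
    "cat": "-cat",      # print file contents
    "rm": "-rm",        # remove file
}


def hadoop_fs_command(linux_command, *paths):
    """Return the argument list for the matching `hadoop fs` command."""
    if linux_command not in LINUX_TO_HADOOP:
        raise ValueError("no mapping known for: " + linux_command)
    return ["hadoop", "fs", LINUX_TO_HADOOP[linux_command], *paths]


if __name__ == "__main__":
    # "input" is an assumed directory name used only for illustration.
    print(" ".join(hadoop_fs_command("mkdir", "input")))
    print(" ".join(hadoop_fs_command("ls", "input")))
```

The returned list could be passed to `subprocess.run` on a machine where the `hadoop` command is on the path.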
[pause]
So, at this point does everyone have this done? Okay. Okay.
(new speaker) Can you just show your... TextEdit file?
(Jerome Mitchell) Okay.
Because it's not in your... powerpoint... on this one website.
(Jerome Mitchell) Okay... you... I mean, is this...
(new speaker) Number 7, you can't... really...
(Jerome Mitchell) ... you can't see this, though? Is this not clear?
(new speaker) What I'm saying is, so you've got
more steps after 7?
(Jerome Mitchell) No!
(new speaker) So we don't create an Input Directory?
(Jerome Mitchell) Oh, I'm sorry. Oh, okay. So....
(new speaker) I think it's back in the presentation...
(Jerome Mitchell) Lemme see here. Okay, there we go.
Okay. Sorry about that.
[pause]
The fourth step, okay... that's just hadoop namenode -format.
[pause]
Okay, so I think there's probably a general
consensus that everyone has this down,
but if not, the presentation is available,
so you can just download it and do it independently.
So after you've run the MapReduce program, just
validate or verify that the output is there by listing
the output directory. And you can actually see the results from this.
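
One way to picture that check is the minimal sketch below. It assumes the job's output directory is on the local file system (for example after copying it out of HDFS), and it relies on the usual convention that result files start with `part-`; the `_SUCCESS` marker is only written by some Hadoop versions. The function name `summarize_output` is hypothetical.

```python
import os


def summarize_output(output_dir):
    """List a job output directory and pick out the result files.

    Assumes output_dir is a local path; `part-` and `_SUCCESS` are the
    conventional names, and `_SUCCESS` may be absent on some versions.
    """
    names = sorted(os.listdir(output_dir))
    part_files = [name for name in names if name.startswith("part-")]
    return {
        "part_files": part_files,
        "has_success_marker": "_SUCCESS" in names,
    }
```

An empty `part_files` list would suggest the job did not write any results to that directory.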
So I just have a few caveats that I want to talk about
with Hadoop: the good, the bad, and the ugly.
So we know that Hadoop is good, this is why we're all here.
We know that Hadoop is not bad, else we wouldn't be here.
But one of the things that's sometimes ugly with Hadoop
is that the out-of-the-box configuration
is not as friendly. So what do I mean by that? I mean
the configuration files, you almost have to play with them,
and it's often hard to debug, as is often the case
in parallel computing. And also tuning and optimizing
is somewhat of a black magic. But I also want to talk about
how you configure the number of mappers and reducers.
So generally the number of mappers you choose is really dependent
on your input size and the number of blocks that you have.
So the default is 2, but I remember reading that
it's recommended to have anywhere between 10 and 100.
And you can follow this formula for that.
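
The formula from the slide is not reproduced in this transcript. As a rough sketch of the dependency on input size and block count, the snippet below assumes one map task per HDFS block and a 64 MB block size; both are assumptions for illustration, and the function name `estimate_map_tasks` is hypothetical.

```python
def estimate_map_tasks(input_bytes, block_bytes=64 * 1024 * 1024):
    """Estimate map tasks as the number of blocks the input occupies.

    Assumes one map task per block; the 64 MB default is an assumed
    block size, so pass the cluster's actual value if it differs.
    """
    if input_bytes < 0 or block_bytes <= 0:
        raise ValueError("sizes must be non-negative, block size positive")
    # Integer ceiling division: a partly filled last block still counts.
    return (input_bytes + block_bytes - 1) // block_bytes


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(estimate_map_tasks(one_gib))  # 16 blocks of 64 MB in 1 GiB
```

Under these assumptions, a 1 GiB input would give 16 map tasks, and the estimate grows linearly with input size.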
And so I think now we'll show you how to deploy this on
the Cloud environment with Stephen. Stephen will help with that.
[pause]
Thank you.