IU Cloud Computing MOOC - Go Hadoop 3 - Do-It-Yourself

So now let's get to the hands-on assignment. I mentioned that Hadoop is a toolkit for doing lots of great things, so you don't actually have to worry about all the intricacies of scheduling, fault tolerance, and things like that. The only things you have to be aware of are your data and your application, and how to actually map them to the programming model of key/value pairs. [pause]

So here is just the classic WordCount example, and here I have my map function and my reduce function. As I mentioned, when we work with WordCount, the keys are actually the offsets in the file, so we really care only about the words, which are my values. And the job client, as I mentioned in the previous slides, is what actually runs the job. Here there are also configuration parameters that you can tweak. [pause]

Stephen introduced you guys to his framework, an environment for actually running Hadoop, so at this point you guys should already be good to go with the programming. I'll assume that. Also, I think it's important to note that Hadoop is written in Java, but it also comes in Python and C++ flavors, with this thing called Hadoop Streaming and the C++ Pipes API. Now, the difference is that Stephen's framework is batch mode. Here I want to show you guys an interactive way of programming. With the interactive way, you guys can actually get comfortable with HDFS commands and actually see how you would start the name nodes and the data nodes and all that good stuff. [pause]

So in this part, I'm just setting up the virtual machine, right? Here I log in as 'root', with the password 'school2012'. And all of this, of course, is uploaded to the site. So I prepared this WordCount [unknown] to ease you in, with the source code and the input file, which is just a document. And I also have this build script.
This build script, essentially, compiles your Java program and copies the jar file to the Hadoop bin, so that you don't have to worry about your class path and all that stuff. So here I'm just extracting the WordCount. [pause] So now, I kinda jumped ahead of myself, but I'm compiling it now using the build script. [pause]

Now that I've compiled that, I'm going to the Hadoop bin. So now I actually want to format the name node. [pause] As I mentioned, when you format the name node, you only need to do that once. [pause] Now that that's formatted, I'll just start my HDFS daemons and my MapReduce daemons. [pause]

(new speaker) So the... oh, sorry.

(Jerome Mitchell) Oh, that's fine.

(new speaker) So the format on the name node, what does that do?

(Jerome Mitchell) Essentially it just wipes the HDFS and formats it, that's all it does. [pause] So when you start the HDFS daemons and the MapReduce daemons, it's good to check to see if they're running. Here I have the name node, the secondary name node, the data node, the task tracker, and the job tracker all started. Now I want to create a directory in HDFS. [pause] So I created a folder on the distributed file system. [pause] Now I want to copy the data in the Hadoop WordCount input onto HDFS. [pause] So now I'm actually gonna execute Hadoop [pause] with the WordCount example. [pause] Here the arguments are hadoop, jar, the jar of the WordCount, the class, and the input and the output. One thing about the output: you don't need to create an output folder on HDFS, it's created automatically. [pause] So here it's showing that it's done the map phase, and now it's doing the reduce operation.
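
To make the map and reduce phases of WordCount a bit more concrete, here is a minimal local sketch in Python. It does not use Hadoop at all and is not the source code from the talk; it only imitates the key/value flow described above, where the map input key is the offset of each line in the file and the reduce step sums the counts per word. The function names (`read_records`, `map_fn`, `shuffle`, `reduce_fn`, `word_count`) and the sample text are assumptions made up for illustration.

```python
from collections import defaultdict


def read_records(text):
    """Yield (byte_offset, line) pairs, imitating how a text file is
    handed to the map function with the line's offset as the key."""
    offset = 0
    for line in text.splitlines(keepends=True):
        yield offset, line.rstrip("\r\n")
        offset += len(line.encode("utf-8"))


def map_fn(offset, line):
    """Ignore the offset key and emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs):
    """Group values by key and sort the keys; a local stand-in for the
    grouping the framework does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_fn(word, counts):
    """Sum all the 1s emitted for a single word."""
    return word, sum(counts)


def word_count(text):
    """Run map, shuffle, and reduce in sequence on an in-memory string."""
    mapped = [
        pair
        for offset, line in read_records(text)
        for pair in map_fn(offset, line)
    ]
    return dict(reduce_fn(word, counts) for word, counts in shuffle(mapped).items())


if __name__ == "__main__":
    sample = "hello hadoop\nhello world\n"  # hypothetical input document
    print(list(read_records(sample)))
    print(word_count(sample))
```

In a real Hadoop job, the splitting of the input, the grouping by key, and the scheduling of map and reduce tasks are done by the framework; here they are plain function calls so the data flow can be followed on one machine.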