Steps to run first Hadoop job

This is a very rough document for running a Hadoop MapReduce job. It is intended for my personal reference only.

==================================================================

CONFIGURATIONS

Hadoop Version : 2.5.0

Hadoop installation directory : /usr/local/hadoop

Hadoop installation procedure : http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html

JAVA Version: java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

JAVAC Version : javac 1.7.0_65

My Java binary path : /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
Java Tools path : /usr/lib/jvm/java-7-openjdk-amd64/lib/tools.jar

Example program taken from : http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

==================================================================

STEP – 1 : Setting HADOOP_CLASSPATH

export HADOOP_CLASSPATH=/usr/lib/jvm/java-7-openjdk-amd64/lib/tools.jar

env | grep HADOOP_CLASSPATH
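
The tools.jar path can also be derived from JAVA_HOME instead of being hard-coded. Below is a minimal sketch, assuming JAVA_HOME points at the JDK root; the fallback value is simply the directory from my setup above. Note that tools.jar ships only with JDK 8 and earlier, so this applies to the Java 7 install used here.

```shell
# Assumption: JAVA_HOME is the JDK root; the fallback is the path used in this post.
JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-7-openjdk-amd64}"
export JAVA_HOME

# tools.jar provides com.sun.tools.javac.Main, which is used for compiling in STEP – 2.
export HADOOP_CLASSPATH="${JAVA_HOME}/lib/tools.jar"

# Confirm the variable is exported, and warn if the jar is not where we expect it.
env | grep '^HADOOP_CLASSPATH='
if [ ! -f "$HADOOP_CLASSPATH" ]; then
    echo "warning: $HADOOP_CLASSPATH not found; check JAVA_HOME" >&2
fi
```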

STEP – 2 : Compiling

hadoop@hadoop-OptiPlex-960:/usr/local/hadoop$ bin/hadoop com.sun.tools.javac.Main ~/Documents/Bkup/Hadoop-Examples/WordCount/WordCount.java
hadoop@hadoop-OptiPlex-960:~/Documents/Bkup/Hadoop-Examples/WordCount$ jar cf wc.jar WordCount*.class
hadoop@hadoop-OptiPlex-960:~/Documents/Bkup/Hadoop-Examples/WordCount$ ls -lh
total 28K
-rw-rw-r-- 1 hadoop hadoop 22 Sep 3 23:33 file01
-rw-rw-r-- 1 hadoop hadoop 27 Sep 3 23:34 file02
-rw-rw-r-- 1 hadoop hadoop 3.0K Sep 3 23:32 wc.jar
-rw-rw-r-- 1 hadoop hadoop 1.5K Sep 3 23:31 WordCount.class
-rw-rw-r-- 1 hadoop hadoop 1.7K Sep 3 23:31 WordCount$IntSumReducer.class
-rw-rw-r-- 1 hadoop hadoop 2.1K Sep 3 23:18 WordCount.java
-rw-rw-r-- 1 hadoop hadoop 0 Sep 3 23:17 WordCount.java~
-rw-rw-r-- 1 hadoop hadoop 1.7K Sep 3 23:31 WordCount$TokenizerMapper.class

STEP – 3 : Input files
hadoop@hadoop-OptiPlex-960:~/Documents/Bkup/Hadoop-Examples/WordCount$ echo Hello World Bye World > file01
hadoop@hadoop-OptiPlex-960:~/Documents/Bkup/Hadoop-Examples/WordCount$ echo Hello Hadop Goodbye Hadoop > file02

hadoop@hadoop-OptiPlex-960:~/Documents/Bkup/Hadoop-Examples/WordCount$ cat file01
Hello World Bye World

hadoop@hadoop-OptiPlex-960:~/Documents/Bkup/Hadoop-Examples/WordCount$ cat file02
Hello Hadop Goodbye Hadoop

STEP – 4 : DFS initial structure
hadoop@hadoop-OptiPlex-960:/usr/local/hadoop$ bin/hdfs dfs -ls -R /
drwxr-xr-x - hadoop supergroup 0 2014-09-03 23:11 /user
drwxr-xr-x - hadoop supergroup 0 2014-09-03 23:12 /user/hadoop
drwxr-xr-x - hadoop supergroup 0 2014-09-03 23:12 /user/hadoop/test1
drwxr-xr-x - hadoop supergroup 0 2014-09-03 23:12 /user/hadoop/test1/input
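
The listing above shows the input directory already in place; I did not record how it was created. One way to do it, sketched here under the assumption that the hdfs binary lives in the installation directory from the CONFIGURATIONS section, is `hdfs dfs -mkdir -p`, which also creates any missing parent directories. The HDFS variable and the function name make_input_dir are hypothetical names for this sketch only.

```shell
# Assumption: hdfs binary location; override HDFS if your install differs.
HDFS="${HDFS:-/usr/local/hadoop/bin/hdfs}"

# Create an HDFS directory together with any missing parents.
make_input_dir() {
    "$HDFS" dfs -mkdir -p "$1"
}
```

Usage: make_input_dir /user/hadoop/test1/input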

STEP – 5 : Copying input files to DFS
hadoop@hadoop-OptiPlex-960:/usr/local/hadoop$ bin/hdfs dfs -put ~/Documents/Bkup/Hadoop-Examples/WordCount/file* /user/hadoop/test1/input
hadoop@hadoop-OptiPlex-960:/usr/local/hadoop$ bin/hdfs dfs -ls -R /
drwxr-xr-x - hadoop supergroup 0 2014-09-03 23:11 /user
drwxr-xr-x - hadoop supergroup 0 2014-09-03 23:12 /user/hadoop
drwxr-xr-x - hadoop supergroup 0 2014-09-03 23:52 /user/hadoop/test1
drwxr-xr-x - hadoop supergroup 0 2014-09-03 23:51 /user/hadoop/test1/input
-rw-r--r-- 1 hadoop supergroup 22 2014-09-03 23:40 /user/hadoop/test1/input/file01
-rw-r--r-- 1 hadoop supergroup 27 2014-09-03 23:40 /user/hadoop/test1/input/file02
hadoop@hadoop-OptiPlex-960:/usr/local/hadoop$ bin/hdfs dfs -cat /user/hadoop/test1/input/file01
Hello World Bye World
hadoop@hadoop-OptiPlex-960:/usr/local/hadoop$ bin/hdfs dfs -cat /user/hadoop/test1/input/file02
Hello Hadop Goodbye Hadoop

STEP – 6 : Executing a HADOOP Job
hadoop@hadoop-OptiPlex-960:/usr/local/hadoop$ bin/hadoop jar ~/Documents/Bkup/Hadoop-Examples/WordCount/wc.jar WordCount /user/hadoop/test1/input /user/hadoop/test1/output

NOTE: The /user/hadoop/test1/output directory did not exist until STEP – 5. It is created by the MapReduce job in STEP – 6.
hadoop@hadoop-OptiPlex-960:/usr/local/hadoop$ bin/hdfs dfs -ls -R /
drwxr-xr-x - hadoop supergroup 0 2014-09-03 23:11 /user
drwxr-xr-x - hadoop supergroup 0 2014-09-03 23:12 /user/hadoop
drwxr-xr-x - hadoop supergroup 0 2014-09-03 23:52 /user/hadoop/test1
drwxr-xr-x - hadoop supergroup 0 2014-09-03 23:51 /user/hadoop/test1/input
-rw-r--r-- 1 hadoop supergroup 22 2014-09-03 23:40 /user/hadoop/test1/input/file01
-rw-r--r-- 1 hadoop supergroup 27 2014-09-03 23:40 /user/hadoop/test1/input/file02
drwxr-xr-x - hadoop supergroup 0 2014-09-03 23:52 /user/hadoop/test1/output
-rw-r--r-- 1 hadoop supergroup 0 2014-09-03 23:52 /user/hadoop/test1/output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 49 2014-09-03 23:52 /user/hadoop/test1/output/part-r-00000
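
Because the job creates the output directory itself, running it a second time with the same output path fails, since the directory now exists. A minimal sketch for clearing it before a re-run follows. It assumes the hdfs binary sits in the installation directory listed under CONFIGURATIONS; the HDFS variable and the function name remove_output_dir are hypothetical names for this sketch only.

```shell
# Assumption: hdfs binary location; override HDFS if your install differs.
HDFS="${HDFS:-/usr/local/hadoop/bin/hdfs}"

# Delete an HDFS directory recursively, but only if it actually exists.
remove_output_dir() {
    if "$HDFS" dfs -test -d "$1"; then
        "$HDFS" dfs -rm -r "$1"
    fi
}
```

Usage: remove_output_dir /user/hadoop/test1/output, then repeat STEP – 6.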

 

STEP – 7 : Verifying Job Result
hadoop@hadoop-OptiPlex-960:/usr/local/hadoop$ bin/hdfs dfs -cat /user/hadoop/test1/output/part-r-00000
Bye 1
Goodbye 1
Hadoop 1
Hadop 1
Hello 2
World 2
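
As a cross-check, the same counts can be reproduced locally without Hadoop, using standard Unix tools on the two input files from STEP – 3. This is only a sketch of a sanity check, not how the job itself works; the scratch directory and the variable names are my own. Note that file02 contains the misspelling "Hadop", which is why it is counted separately from "Hadoop".

```shell
# Recreate the two input files in a scratch directory.
workdir="$(mktemp -d)"
echo Hello World Bye World > "$workdir/file01"
echo Hello Hadop Goodbye Hadoop > "$workdir/file02"

# Put one word per line, sort, count duplicates, and print "word count".
# LC_ALL=C sorts by byte value, which for this ASCII input gives the same
# order as the job output above.
local_counts="$(cat "$workdir"/file* | tr -s '[:space:]' '\n' | LC_ALL=C sort | uniq -c | awk '{print $2, $1}')"
echo "$local_counts"

# Clean up the scratch directory.
rm -r "$workdir"
```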

 

Hurray! That's it.
