HADOOP – Setting up a Single Node Cluster

 

1. Installing tools

$ sudo apt-get install ssh
$ sudo apt-get install rsync

2. Setting up passphraseless SSH

Now check that you can ssh to localhost without a passphrase:

  $ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands (recent OpenSSH releases disable DSA keys by default, so an RSA key is used here):

  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
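
If key-based login still fails, permissions on ~/.ssh are the usual cause. A quick fix and a non-interactive test (a sketch, assuming a default OpenSSH setup):

  $ chmod 700 ~/.ssh
  $ chmod 600 ~/.ssh/authorized_keys
  $ ssh -o BatchMode=yes localhost echo ok

BatchMode makes ssh fail instead of prompting, so seeing "ok" confirms that passphraseless login works.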

3. Configuration

etc/hadoop/hadoop-env.sh:

# Set to the root of your Java installation.
export JAVA_HOME=/usr

# NOTE: `which java` prints /usr/bin/java, which is typically a symlink
# into the actual JDK. JAVA_HOME must point to the directory containing
# bin/java, so /usr works here; a symlink path is also fine.

# Assuming your installation directory is /home/USERNAME/install/hadoop
export HADOOP_PREFIX=/home/USERNAME/install/hadoop
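
If you are not sure where the JDK actually lives, you can resolve the java symlink and derive JAVA_HOME from it (a sketch; the resolved path below is only an example and will vary by distribution):

  $ readlink -f "$(which java)"
  /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java    # example output
  $ readlink -f "$(which java)" | sed 's|/bin/java$||'
  /usr/lib/jvm/java-8-openjdk-amd64/jre             # candidate JAVA_HOME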

Try the following command:

  $ $HADOOP_PREFIX/bin/hadoop

This will display the usage documentation for the hadoop script.
Create the namenode and datanode directories under $HADOOP_PREFIX:

  $ mkdir -p $HADOOP_PREFIX/data/hdfs/namenode
  $ mkdir -p $HADOOP_PREFIX/data/hdfs/datanode
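
Optionally, export these variables in your shell profile so the bin/ and sbin/ scripts can be invoked from any directory (a sketch; the remaining steps assume you run them from $HADOOP_PREFIX, so this is purely a convenience):

  $ echo 'export HADOOP_PREFIX=/home/USERNAME/install/hadoop' >> ~/.bashrc
  $ echo 'export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin' >> ~/.bashrc
  $ source ~/.bashrc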

 

Update the following configuration files as shown below:

 

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
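
To confirm that HDFS picks up this address, you can query the setting back (assuming the command is run from $HADOOP_PREFIX):

  $ bin/hdfs getconf -confKey fs.defaultFS
  hdfs://localhost:9000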

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/USERNAME/install/hadoop/data/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/USERNAME/install/hadoop/data/hdfs/datanode</value>
    </property>
</configuration>

etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
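
Note that Hadoop 2.x tarballs ship only a template for this file. If etc/hadoop/mapred-site.xml does not exist yet, create it from the template first:

  $ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml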

4. Execution

  1. Format the filesystem:
      $ bin/hdfs namenode -format
  2. Start the NameNode, DataNode, and YARN daemons (see the jps check after this list):
    $ sbin/start-dfs.sh
    $ sbin/start-yarn.sh

    The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory ($HADOOP_PREFIX/logs by default).

  3. Browse the web interface for the NameNode; by default it is available at:
      http://localhost:50070/
  4. Make the HDFS directories required to execute MapReduce jobs:
    $ bin/hdfs dfs -mkdir /user
    $ bin/hdfs dfs -mkdir /user/<username>
    $ bin/hdfs dfs -mkdir /user/<username>/test1
    $ bin/hdfs dfs -mkdir /user/<username>/test1/input
  5. Copy the input files into the distributed filesystem:
      $ bin/hdfs dfs -put path/to/local/file.txt test1/input
  6. Run some of the examples provided (a concrete invocation follows this list):
      $ bin/hadoop jar local/path/to/program.jar JAVA_CLASS_NAME test1/input test1/output
  7. Examine the output files: copy them from the distributed filesystem to the local filesystem and inspect them there:
      $ bin/hdfs dfs -get test1/output output
      $ cat output/*

    or

    View the output files directly on the distributed filesystem:

      $ bin/hdfs dfs -cat test1/output/*
  8. When you’re done, stop the daemons with:
    $ sbin/stop-dfs.sh
    $ sbin/stop-yarn.sh
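
After step 2, the JDK's jps tool is a quick way to confirm that all five daemons came up (a sketch; the process IDs are examples and will differ on your machine):

  $ jps
  4368 NameNode
  4512 DataNode
  4701 SecondaryNameNode
  4865 ResourceManager
  5002 NodeManager
  5123 Jps

If any daemon is missing, check its log file under $HADOOP_PREFIX/logs.

For step 6, here is a concrete invocation using the WordCount example that ships with the distribution (the jar version under share/hadoop/mapreduce/ will match your Hadoop release):

  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount test1/input test1/output
  $ bin/hdfs dfs -cat test1/output/*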

Thanks to http://www.thecloudavenue.com/2012/01/getting-started-with-nextgen-mapreduce.html
