Steps to install Hadoop 2.x release (Yarn or Next-Gen) on single node cluster setup

The Hadoop 2.x release brings many changes to Hadoop and MapReduce. The centralized JobTracker service is replaced with a ResourceManager, which manages the resources in the cluster, and per-application ApplicationMasters, which manage each application's lifecycle. These architectural changes enable Hadoop to scale to much larger clusters. For more details on the architectural changes in Hadoop next-gen (a.k.a. Yarn), watch this video or visit this blog.

This post concentrates on installing Hadoop 2.x a.k.a. Yarn a.k.a. next-gen on a single-node cluster.

Prerequisites:

  • Java 6 or later installed
  • A dedicated user for Hadoop (we use “hduser”)
  • Passwordless SSH to localhost configured
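
A quick way to sanity-check the tool prerequisites from the shell (illustrative only; it reports whether each command is on your PATH, nothing more):

```shell
# Report whether each required tool is on PATH
# (prints "<tool>: found" or "<tool>: MISSING").
check_prereqs() {
  for tool in java ssh; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: MISSING"
    fi
  done
}
check_prereqs
```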

Steps to install Hadoop 2.x:

1. Download tarball

You can download the tarball for Hadoop 2.x from here. Extract it to a folder, say /home/hduser/yarn. We assume the dedicated user for Hadoop is “hduser”.

$ cd /home/hduser/yarn
$ sudo chown -R hduser:hadoop hadoop-2.0.1-alpha

2. Setup Environment Variables

$ export HADOOP_HOME=$HOME/yarn/hadoop-2.0.1-alpha
$ export HADOOP_MAPRED_HOME=$HOME/yarn/hadoop-2.0.1-alpha
$ export HADOOP_COMMON_HOME=$HOME/yarn/hadoop-2.0.1-alpha
$ export HADOOP_HDFS_HOME=$HOME/yarn/hadoop-2.0.1-alpha
$ export HADOOP_YARN_HOME=$HOME/yarn/hadoop-2.0.1-alpha
$ export HADOOP_CONF_DIR=$HOME/yarn/hadoop-2.0.1-alpha/etc/hadoop

This step is important: if you miss any one of these variables or set a value incorrectly, jobs will fail with errors that are very difficult to trace back to the cause.

Also, add these to your ~/.bashrc or another shell start-up script so that you don’t need to set them every time.
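
Since every variable points at the same install directory, the ~/.bashrc entries can be condensed by defining HADOOP_HOME once and reusing it (the version in the path matches the 2.0.1-alpha tarball used above; adjust it for your release):

```shell
# All of the Hadoop home variables point at the same install
# directory, so define HADOOP_HOME once and reuse it.
export HADOOP_HOME=$HOME/yarn/hadoop-2.0.1-alpha
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
```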

3. Create directories

Create two directories to be used by namenode and datanode.

$ mkdir -p $HOME/yarn/yarn_data/hdfs/namenode
$ mkdir -p $HOME/yarn/yarn_data/hdfs/datanode

4. Set up config files

$ cd $HADOOP_YARN_HOME

Add the following properties inside the configuration tag of the files listed below:

etc/hadoop/yarn-site.xml:

<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

etc/hadoop/core-site.xml:

<property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:9000</value>
</property>

etc/hadoop/hdfs-site.xml:

 <property>
   <name>dfs.replication</name>
   <value>1</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/home/hduser/yarn/yarn_data/hdfs/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/home/hduser/yarn/yarn_data/hdfs/datanode</value>
 </property>

etc/hadoop/mapred-site.xml:

If this file does not exist, create it and paste the content provided below:

<?xml version="1.0"?>
<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>
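
If you prefer to create the file from the shell, a here-document works; run this from the Hadoop install directory. It writes exactly the content shown above:

```shell
# Write etc/hadoop/mapred-site.xml with the YARN framework setting.
mkdir -p etc/hadoop
cat > etc/hadoop/mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>
EOF
```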

5. Format namenode

This step is needed only the first time; formatting again will wipe out all content on HDFS.

$ bin/hdfs namenode -format

6. Start HDFS processes

Name node:

$ sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /home/hduser/yarn/hadoop-2.0.1-alpha/logs/hadoop-hduser-namenode-pc3-laptop.out
$ jps
18509 Jps
17107 NameNode

Data node:

$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /home/hduser/yarn/hadoop-2.0.1-alpha/logs/hadoop-hduser-datanode-pc3-laptop.out
$ jps
18509 Jps
17107 NameNode
17170 DataNode

7. Start Hadoop Map-Reduce Processes

Resource Manager:

$ sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/hduser/yarn/hadoop-2.0.1-alpha/logs/yarn-hduser-resourcemanager-pc3-laptop.out
$ jps
18509 Jps
17107 NameNode
17170 DataNode
17252 ResourceManager

Node Manager:

$ sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /home/hduser/yarn/hadoop-2.0.1-alpha/logs/yarn-hduser-nodemanager-pc3-laptop.out
$ jps
18509 Jps
17107 NameNode
17170 DataNode
17252 ResourceManager
17309 NodeManager

Job History Server:

$ sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/hduser/yarn/hadoop-2.0.1-alpha/logs/yarn-hduser-historyserver-pc3-laptop.out
$ jps
18509 Jps
17107 NameNode
17170 DataNode
17252 ResourceManager
17309 NodeManager
17626 JobHistoryServer
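
Steps 6 and 7 start five daemons in a fixed order: HDFS first, then YARN, then the history server. The sequence can be wrapped in a small loop; as written here it is a dry run that only prints each command (drop the echo to actually execute them; paths assume the layout from step 2):

```shell
# Print the daemon start commands in dependency order (dry run;
# drop the echo to execute them for real).
HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-$HOME/yarn/hadoop-2.0.1-alpha}
start_cmds() {
  for cmd in \
    "sbin/hadoop-daemon.sh start namenode" \
    "sbin/hadoop-daemon.sh start datanode" \
    "sbin/yarn-daemon.sh start resourcemanager" \
    "sbin/yarn-daemon.sh start nodemanager" \
    "sbin/mr-jobhistory-daemon.sh start historyserver"; do
    echo "$HADOOP_YARN_HOME/$cmd"
  done
}
start_cmds
```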

8. Run the famous wordcount example to verify the installation

$ mkdir in
$ cat > in/file
This is one line
This is another one

Add this directory to HDFS:

$ bin/hdfs dfs -copyFromLocal in /in

Run the provided wordcount example:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.*-alpha.jar wordcount /in /out

Check the output:

$ bin/hdfs dfs -cat /out/*
This 2
another 1
is 2
line 1
one 2

9. Web interface

Browse HDFS and check its health using http://localhost:50070 in the browser.

You can check the status of the applications running using the following URL:

http://localhost:8088
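
Both web UIs can also be probed from the shell. This sketch assumes curl is installed and the daemons use the default ports; it prints "up" or "down" for each URL:

```shell
# Probe the HDFS and YARN web UIs; prints "<url> up" when a
# daemon answers, "<url> down" otherwise.
check_webui() {
  for url in http://localhost:50070 http://localhost:8088; do
    if curl -s -o /dev/null --max-time 2 "$url"; then
      echo "$url up"
    else
      echo "$url down"
    fi
  done
}
check_webui
```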

10. Stop the processes

$ sbin/hadoop-daemon.sh stop namenode
$ sbin/hadoop-daemon.sh stop datanode
$ sbin/yarn-daemon.sh stop resourcemanager
$ sbin/yarn-daemon.sh stop nodemanager
$ sbin/mr-jobhistory-daemon.sh stop historyserver

Happy Coding!!!


61 thoughts on “Steps to install Hadoop 2.x release (Yarn or Next-Gen) on single node cluster setup”

  1. Pingback: Steps to install Hadoop 2.x release (Yarn or Next-Gen) on multi-node cluster | Rasesh Mori

  2. Thanks a lot! This is the best resource I found for the latest Hadoop release. Others were outdated. Faced an issue that the hostname is not found when formatting the data node. Using other internet resources found that “hostname -fsqn” is giving an error. So used yast>network settings>hostname/dns>assign hostname to loopback ip – checked the box. Took a long time to find it, as I am a linux newbie.

  3. Hi,

    Nice. I am able to run jobs on my local machine.

    I am just wondering where the job history files are located. Do you have any idea about this? Can this location be configured in Hadoop 2.0?

  4. Hi ,

    When I am running bin/hadoop namenode -format
    I am getting following error:

    hduser@ubuntu:~/lab/install/hadoop-2.0.4-alpha$ bin/hadoop namenode -format
    bin/hadoop: line 26: /home/hduser/lab/install/hadoop-2.0.4-alpha/bin/../libexec/hadoop-config.sh: No such file or directory
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.

    /home/hduser/lab/install/hadoop-2.0.4-alpha/bin/hdfs: line 34: /home/hduser/lab/install/hadoop-2.0.4-alpha/bin/../libexec/hdfs-config.sh: No such file or directory
    /home/hduser/lab/install/hadoop-2.0.4-alpha/bin/hdfs: line 150: cygpath: command not found
    /home/hduser/lab/install/hadoop-2.0.4-alpha/bin/hdfs: line 191: exec: : not found

  5. Pingback: » Hadoop 2.1.0 beta and Raspberry Pi Y12 Studio

  6. First of thanks a lot Rasesh for wonderful tutorial. With the help of it, I configured Hadoop 2.1.0 Beta version in first shot. All components are up and running.
    nayan@nayan-desktop:~/dev$ jps
    11426
    1546 ResourceManager
    2038 JobHistoryServer
    2100 Jps
    1365 DataNode
    1275 NameNode
    1792 NodeManager

    But when I try to create a directory, it is failing.

    nayan@nayan-desktop:~/dev$ hdfs dfs -mkdir /home/nayan/hdfs/sample
    13/08/30 17:07:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    mkdir: `/home/nayan/hdfs/sample': No such file or directory

    Can you please tell what I am doing wrong?

    Thanks,
    Nayan

  7. Great tutorial, thanks for sharing.

    As of Hadoop 2.2.0 it looks like two things need to change in the yarn-site.xml configuration file (mapreduce.shuffle is now mapreduce_shuffle). Mine looks like this now:

    yarn.nodemanager.aux-services
    mapreduce_shuffle

    yarn.nodemanager.aux-services.mapreduce_shuffle.class
    org.apache.hadoop.mapred.ShuffleHandler

    if I don’t do this, I see the following error:

    2013-10-14 22:26:21,228 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
    java.lang.IllegalArgumentException: The ServiceName: mapreduce.shuffle set in yarn.nodemanager.aux-services is invalid.The valid service name should only contain a-zA-Z0-9_ and can not start with numbers

  8. urg, my copy/paste was botched up but hopefully this still makes sense. The value mapreduce.shuffle is now mapreduce_shuffle and the name yarn.nodemanager.aux-services.mapreduce.shuffle.class is now yarn.nodemanager.aux-services.mapreduce_shuffle.class

  9. Good Writeup..thanks..but i have one problem..everything works fine except that jobs dont show up on localhost:8088. The job list is empty…and when i submit any job the log shows it is using LocalJobRunner..any idea where i could be going wrong..
    Thanks in advance

  10. Pingback: HowTo: Setup Hadoop 2.X on Linux-Ubuntu (Single Node) | techo byte

  11. Pingback: Hadoop 2 Nodemanager not startingQueryBy | QueryBy, ejjuit, query, query by, queryby.com, android doubt, ios question, sql query, sqlite query, nodejsquery, dns query, update query, insert query, kony, mobilesecurity, postquery, queryposts.com, sapquery,

  12. Pingback: What hadoop can and can’t do. | 做人要豁達大道

  13. Hi Rasesh
    Thanks for this wonderful tutorial.. It was really very helpful. I did all setup and when I am trying to run a job I am getting below error. I am not sure what is an issue. Can you help me finding the possible reason?

    13/11/04 15:54:59 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
    13/11/04 15:54:59 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
    13/11/04 15:54:59 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
    13/11/04 15:54:59 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    13/11/04 15:54:59 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
    13/11/04 15:54:59 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
    13/11/04 15:54:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1383608961434_0003
    13/11/04 15:54:59 INFO impl.YarnClientImpl: Submitted application application_1383608961434_0003 to ResourceManager at /0.0.0.0:8032
    13/11/04 15:54:59 INFO mapreduce.Job: The url to track the job: http://Office-Mac-2.local:8088/proxy/application_1383608961434_0003/
    13/11/04 15:54:59 INFO mapreduce.Job: Running job: job_1383608961434_0003
    13/11/04 15:55:02 INFO mapreduce.Job: Job job_1383608961434_0003 running in uber mode : false
    13/11/04 15:55:02 INFO mapreduce.Job: map 0% reduce 0%
    13/11/04 15:55:02 INFO mapreduce.Job: Job job_1383608961434_0003 failed with state FAILED due to: Application application_1383608961434_0003 failed 2 times due to AM Container for appattempt_1383608961434_0003_000002 exited with exitCode: 127 due to: Exception from container-launch:
    org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

      • Hi Lars,

        I found a problem in one of the hadoop environment variables set in .bash_profile file. I am not sure if its the same case for you. for reference I am giving the content of my file. Hope it will help

        export JAVA_HOME=$(/usr/libexec/java_home)
        export HADOOP_MAPRED_HOME=/Volumes/MacNewHD/Hadoop/hadoop-2.2.0/
        export HADOOP_COMMON_HOME=/Volumes/MacNewHD/Hadoop/hadoop-2.2.0/
        export HADOOP_HDFS_HOME=/Volumes/MacNewHD/Hadoop/hadoop-2.2.0/
        export YARN_HOME=/Volumes/MacNewHD/Hadoop/hadoop-2.2.0/
        export HADOOP_YARN_HOME=/Volumes/MacNewHD/Hadoop/hadoop-2.2.0/
        export HADOOP_CONF_DIR=/Volumes/MacNewHD/Hadoop/hadoop-2.2.0/etc/hadoop/
        export YARN_CONF_DIR=$HADOOP_CONF_DIR
        export PATH=/opt/subversion/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/binexport:$JAVA_HOME/bin:

      • Thanks Vivek! I solved it! I had some leftover configurations from other tutorials, so after replacing my config files with the original ones and making the changes explained in this tutorial it was fixed.

  14. Thanx a lot, this was super helpful!
    One minor change is that the config for nodemanager in yarn-site.xml seems to have ‘_’ instead of ‘.’ in it:

    yarn.nodemanager.aux-services
    mapreduce_shuffle

    NodeManager was failing to start without it.

  15. Hi,

    I am getting the following error while starting the nodemanager, due to this Nodemanager process killed. Should I need to configure any entry in etc/hosts file.

    013-11-08 16:46:49,701 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8031
    2013-11-08 16:46:53,738 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    2013-11-08 16:46:56,742 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    2013-11-08 16:47:00,750 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    2013-11-08 16:47:03,758 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    2013-11-08 16:47:07,766 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    2013-11-08 16:47:11,774 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    2013-11-08 16:47:14,782 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    2013-11-08 16:47:18,790 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    2013-11-08 16:47:22,798 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    2013-11-08 16:47:25,806 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMa

    Thanks
    Balan.P

  16. Pingback: Yarn, installation | Here

  17. Hi All,
    Im getting the classNotFound issue in all the executions. i doubt Im missing something very basic(Hadoop 2.2 on windows cygwin).
    I found comments mentioning to add -classpath “$(cygpath -pw “$CLASSPATH”)” in some place(Any idea where exactly??).

    Thanks,
    Niteesh

  18. Great, very helpful, a very very very big thank you !
    I am currently reading Hadoop in Action book and was unable to run hadoop on my ubuntu. But, after I followed your configurations I was able to run Hadoop successfully …. !

  19. How can I solve this error?

    http://localhost:8088/proxy/application_1384700761740_0001/

    HTTP ERROR 500

    Problem accessing /proxy/application_1384700761740_0001/. Reason:

    org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id ‘application_1384700761740_0001′ doesn’t exist in RM.
    Caused by:

    java.io.IOException: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id ‘application_1384700761740_0001′ doesn’t exist in RM.
    at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:341)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
    Caused by: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id ‘application_1384700761740_0001′ doesn’t exist in RM.
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:247)
    at org.apache.hadoop.yarn.server.webproxy.AppReportFetcher.getApplicationReport(AppReportFetcher.java:90)
    at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.getApplicationReport(WebAppProxyServlet.java:221)
    at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:276)
    … 36 more
    Caused by:

    org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id ‘application_1384700761740_0001′ doesn’t exist in RM.
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:247)
    at org.apache.hadoop.yarn.server.webproxy.AppReportFetcher.getApplicationReport(AppReportFetcher.java:90)
    at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.getApplicationReport(WebAppProxyServlet.java:221)
    at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:276)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

  20. Hi,
    I have set up Hadoop 2.2.0 on Windows7(64 bit) with Cygwin
    I am able to start dfs and yarn
    but now when trying to run Word Count it fails while launching container.
    It’s something related to classpath issue.
    As I am getting following error
    java.lang.NoClassDefFoundError: org/apache/hadoop/service/CompositeService
    and
    Could not find the main class: org.apache.hadoop.mapreduce.v2.app.MRAppMaster. Program will exit.
    I have set system level env. variables as follows
    HADOOP_COMMON_HOME=c:\hadoop
    HADOOP_CONF_DIR=c:\hadoop\etc\hadoop
    HADOOP_HDFS_HOME=c:\hadoop
    HADOOP_HOME=c:\hadoop
    HADOOP_MAPRED_HOME=c:\hadoop
    YARN_HOME=c:\hadoop

  21. Hello Friends,

    This was created when 2.0 was in alpha stage and since then there have been some changes in the configurations. Based on some comments and other sources, I have updated the steps (mostly change in one env. variable YARN_HOME to HADOOP_YARN_HOME). I hope this will help all of you.

  22. hduser@999:/usr/local/hadoop/hadoop/sbin$ ./start-all.sh
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Error: Cannot find configuration directory: /user/local/hadoop/hadoop/conf
    starting yarn daemons
    Error: Cannot find configuration directory: /user/local/hadoop/hadoop/conf

  23. Thanks for a really simple and yet working instructions. Just followed these steps with 2.2.0 and got word count example working.

  24. Hi, When i try to run the built-in pi program, i get below error.Earlier it was running successful with hduser. i tried both the users hduser as well as root but could not fix it.
    Any help would be appreciated.

    root@bigdata:/usr/local/hadoop# hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
    Number of Maps = 2
    Samples per Map = 5
    14/03/11 17:10:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using bui
    ltin-java classes where applicable
    java.net.ConnectException: Call From bigdata/127.0.1.1 to localhost:9000 failed on connection exception: java.net
    .ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at sun.proxy.$Proxy9.getFileInfo(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at sun.proxy.$Proxy9.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtoco
    lTranslatorPB.java:651)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
    at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:278)
    at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:597)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
    at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
    at org.apache.hadoop.ipc.Client.call(Client.java:1318)
    … 33 more

  25. Pingback: hadoopy yarn | My Tech Notes

  26. Pingback: hadoop yarn 2.x single cluster | My Tech Notes

  27. Hi, when i used hadoop version 1 I put all my libraries in lib folder, but in hadoop2 version there is only native folder. Every one knows where i must put my libs for using this libs in my apps?

    Thanks yours responds .

  28. Hi Raresh,
    great tutorial. I have one problem though. When I try to start the namenode I get this message:

    starting namenode, logging to /home/hduser/yarn/hadoop-2.4.0/logs/hadoop-root-namenode-iti-313.out
    Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/hduser/yarn/hadoop-2.4.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
    It’s highly recommended that you fix the library with ‘execstack -c ‘, or link it with ‘-z noexecstack’.

    Does anyone knows what is it about?

  29. Pingback: freestylerock

  30. Pingback: Setting Up an R-Hadoop System: Step-by-Step Guide | Big D8TA

  31. Hi Rasesimori,
    Thanks so much for your efforts in stpep by step guide, which helped me like anything.
    I followed many step by step guides which still needs to have clarity. Even apache hadoop website also not providing much information to start work on simple example.
    But You have done a great job. I setted up environment and executed first application of hadoop.
    Thanks for your help and Keep posting useful information..

  32. Hi,
    I am getting following error-

    14/06/16 15:20:27 INFO mapreduce.Job: Job job_1401348657033_0013 running in uber mode : false
    14/06/16 15:20:27 INFO mapreduce.Job: map 0% reduce 0%
    14/06/16 15:20:27 INFO mapreduce.Job: Job job_1401348657033_0013 failed with state FAILED due to: Application application_1401348657033_0013 failed 2 times due to AM Container for appattempt_1401348657033_0013_000002 exited with exitCode: 1 due to: Exception from container-launch:
    org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

    .Failing this attempt.. Failing the application.
    14/06/16 15:20:27 INFO mapreduce.Job: Counters: 0

    Can Some one help ?????

    Many Thanks,
    ~Kedar

  33. Hi,

    I went through your post and its seemlessly very good, can you please post a similar installation tutorial for windows environment with hadoop vanilla flavour.

    Thank You.
    Pavan Kumar.

  34. Pingback: Setting Up an R-Hadoop System and Predicting Future Website Visitors: Step-by-Step Guide |
