Scala: Part 3: Sets & Maps

I will continue with some more data structures offered by Scala.

Because Scala supports both imperative and functional programming, it offers both mutable and immutable implementations of the data structures below:

1. Using Sets

scala> var jetSet = Set("Boeing","Airbus")
jetSet: scala.collection.immutable.Set[java.lang.String] = Set(Boeing, Airbus)

scala> println(jetSet)
Set(Boeing, Airbus)

Set("Boeing","Airbus") calls Set’s apply method (remember the parentheses rule from the previous post?). By default this creates an immutable Set, as the type in the output shows.

scala.collection.mutable.* contains mutable implementations whereas
scala.collection.immutable.* contains immutable implementations.

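The “+=” operator adds an element to a set. A mutable set has a += method that changes the set in place. An immutable set has no such method; because jetSet is a var, the compiler rewrites jetSet += "Lear" as jetSet = jetSet + "Lear", which builds a new set containing the extra element and reassigns the variable.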

scala> jetSet+="Lear"

scala> println(jetSet)
Set(Boeing, Airbus, Lear)

scala> jetSet.+=("Lear2")

scala> println(jetSet)
Set(Boeing, Airbus, Lear, Lear2)

To use a mutable set:

scala> import scala.collection.mutable.Set
import scala.collection.mutable.Set

scala> var mutableJetSet=Set("Boeing","Airbus")
mutableJetSet: scala.collection.mutable.Set[java.lang.String] = Set(Airbus, Boeing)

scala> mutableJetSet += "Lear"

scala> println(mutableJetSet)
Set(Airbus, Lear, Boeing)
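As a rough illustration of the difference between the two kinds of +=, the sketch below keeps a second reference to each set before adding an element. The names before and sameSet are introduced here only for the demonstration; the snippet can be pasted into the Scala shell or run as a script.

```scala
import scala.collection.mutable

// Immutable set held in a var: += is rewritten to jetSet = jetSet + "Lear".
var jetSet = Set("Boeing", "Airbus")
val before = jetSet            // second reference to the original set
jetSet += "Lear"
println(before.size)           // the original set still has 2 elements
println(jetSet.size)           // the var now refers to a new set with 3 elements

// Mutable set held in a val: += is a method that changes the set in place.
val mutableJetSet = mutable.Set("Boeing", "Airbus")
val sameSet = mutableJetSet    // second reference to the same set
mutableJetSet += "Lear"
println(sameSet.size)          // the change is visible through both references
```

With the immutable set, the original object is untouched and only the var points at a new set. With the mutable set, every reference sees the change, which is why a val is enough there.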

HashSet is also available in both mutable and immutable implementations, but it has to be imported first:

scala> var hashSet = HashSet("Tomatoes", "Potatoes")
<console>:4: error: not found: value HashSet
var hashSet = HashSet("Tomatoes", "Potatoes")
^

scala> import scala.collection.immutable.HashSet
import scala.collection.immutable.HashSet

scala> var hashSet = HashSet("Tomatoes", "Potatoes")
hashSet: scala.collection.immutable.Set[java.lang.String] = Set(Tomatoes, Potatoes)

scala> import scala.collection.mutable.HashSet
import scala.collection.mutable.HashSet

scala> var mutableHashSet = HashSet("Tomatoes", "Potatoes")
mutableHashSet: scala.collection.mutable.Set[java.lang.String] = Set(Tomatoes, Potatoes)

2. Using Maps:

Maps, too, come in mutable and immutable implementations, available in the same packages.

scala> import scala.collection.mutable.Map
import scala.collection.mutable.Map

scala> val battingOrder = Map[Int, String]()
battingOrder: scala.collection.mutable.Map[Int,String] = Map()

scala> battingOrder += (1 -> "Shikhar Dhawan")

scala> battingOrder += (2 -> "Rohit Sharma")

scala> battingOrder += (3 -> "Virat Kohli")

scala> println(battingOrder(2))
Rohit Sharma

HashMap implementations are likewise provided in both packages.

The Scala compiler translates battingOrder += (2 -> "Rohit Sharma") into battingOrder.+=((2).->("Rohit Sharma"))

So the “->” method is called first. It returns a two-element tuple, which is then passed to the “+=” method.
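A small sketch of that translation, assuming a fresh mutable map: the pair built with -> is compared against an ordinary tuple literal, and both are then added to the map. The names viaArrow and viaTuple are made up for this example.

```scala
import scala.collection.mutable

// -> builds a two-element tuple, so both values below are the same pair.
val viaArrow = 2 -> "Rohit Sharma"
val viaTuple = (2, "Rohit Sharma")
println(viaArrow == viaTuple)

// Either form can be passed to += on a mutable map.
val battingOrder = mutable.Map[Int, String]()
battingOrder += viaArrow
battingOrder += ((3, "Virat Kohli"))   // a tuple literal needs its own parentheses
println(battingOrder(3))
```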


Scala: Part 2: Arrays, List & Tuples

I will continue exploring more features in Scala here.

1. Parameterize Array with types:

val big = new java.math.BigInteger("12345")

$ cat 6_arrays.scala
val greetScala = new Array[String](3)

greetScala(0) = "Hi "
greetScala(1) = args(0) + ", "
greetScala(2) = "Welcome to Scala!\n"

for (str <- greetScala)
  print(str)

$ scala 6_arrays.scala Rasesh
Hi Rasesh, Welcome to Scala!

Putting parentheses after an object calls that object's apply method, passing the values inside the parentheses as arguments.

In the previous example, greetScala(0) is translated to greetScala.apply(0). This holds for any object that has a method called apply.

Also, Scala has no operators in the traditional sense. Names such as +, -, / and * are just methods. This may seem odd if you are not familiar with functional programming, but languages such as Lisp and Scheme treat operators as ordinary functions too.

This means 1+2 is a call to a method named ‘+’ on 1, with 2 passed as the argument. It is the same as (1).+(2)

scala> 1+2
res0: Int = 3

scala> (1).+(2)
res1: Int = 3

Similarly, when parentheses appear on the left side of ‘=’, the compiler replaces them with a call to the update method: greetScala(0) = "Hi " is equivalent to greetScala.update(0, "Hi ")

Note: Everything in Scala, from arrays to expressions, is an object with methods.
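The apply and update rules are not special to Array. As a minimal sketch, the hypothetical class Greeting below (not part of Scala's library) defines both methods itself, so the parenthesis syntax works on its instances too.

```scala
// Hypothetical class used only to illustrate apply and update.
class Greeting {
  private val parts = new Array[String](2)

  // Called for g(i)
  def apply(i: Int): String = parts(i)

  // Called for g(i) = value
  def update(i: Int, value: String): Unit = { parts(i) = value }
}

val g = new Greeting
g(0) = "Hi "             // rewritten to g.update(0, "Hi ")
g.update(1, "Scala")     // the explicit form of the same call
println(g(0) + g.apply(1))
```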

Initialize an array without specifying datatype:

val numNames = Array("zero", "one", "two")

This is equivalent to calling Array.apply("zero", "one", "two"). Here apply is a method on the Array companion object, which plays a role similar to a static method of the Array class in Java.

2. Using Lists:

Lists are immutable, as is usual in functional programming languages such as Lisp and Scheme. Once a list is created, its elements cannot be changed. In this respect a List is similar to the String class in Java.

scala> val oneTwo=List(1,2)
oneTwo: List[Int] = List(1, 2)

scala> val numStr=List(1,"str")
numStr: List[Any] = List(1, str)

List has a method ‘:::’ for list concatenation. As it is a method, it can also be called with the dot notation mentioned previously, but there is a catch, which is explained after the :: operator. Note the different result of the last call in the example below:

scala> val concatenated = oneTwo ::: numStr
concatenated: List[Any] = List(1, 2, 1, str)

scala> println(concatenated)
List(1, 2, 1, str)

scala> val con2=(oneTwo).:::(numStr)
con2: List[Any] = List(1, str, 1, 2)

Original List oneTwo is not mutated.

scala> println(oneTwo)
List(1, 2)

Another operator on lists is ‘::’, which serves the same purpose as ‘cons’ in Scheme, if you are familiar with it. It creates a new list whose first element is the left operand, followed by the elements of the list on the right side of the operator.

scala> val newList = 0 :: oneTwo
newList: List[Int] = List(0, 1, 2)

scala> val nestedList = oneTwo :: numStr
nestedList: List[Any] = List(List(1, 2), 1, str)

scala> oneTwo.::(numStr)
res2: List[Any] = List(List(1, str), 1, 2)

Based on our understanding so far, oneTwo :: numStr should give the same output as oneTwo.::(numStr). But we see something else here. In Scala, when a method name ends with a ‘:’, the method is invoked on the right operand and the left operand is passed as the argument. This means oneTwo :: numStr is equivalent to numStr.::(oneTwo). The same rule explains the ::: results earlier: oneTwo ::: numStr is equivalent to numStr.:::(oneTwo).
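The rule can be checked directly. The sketch below redefines the two lists from this post and compares each infix expression with the explicit call on the right operand; the names infixCons and explicitCons are made up for the example.

```scala
val oneTwo = List(1, 2)
val numStr = List(1, "str")

// Infix form: the method is invoked on the right operand, numStr.
val infixCons = oneTwo :: numStr
// The explicit call that the infix form stands for.
val explicitCons = numStr.::(oneTwo)
println(infixCons == explicitCons)

// The same rule applies to ::: because its name also ends with a colon.
println((oneTwo ::: numStr) == numStr.:::(oneTwo))
```

Both comparisons print true, whereas oneTwo.::(numStr) puts numStr in front of oneTwo instead.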

‘Nil’ or List() denotes the empty list, the equivalent of '() or empty in Scheme. To create the list 1,2,3 in Scheme, one can write (cons 1 (cons 2 (cons 3 '()))). Similarly, in Scala, you can write

scala> val oneTwoThree = 1 :: 2 :: 3 :: Nil
oneTwoThree: List[Int] = List(1, 2, 3)
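Because :: is invoked on its right operand, a chain of :: groups from the right, just like the nested cons calls in Scheme. A short sketch that spells out the grouping and the underlying method calls (the names chained, grouped and calls are only for illustration):

```scala
// The usual way to write it.
val chained = 1 :: 2 :: 3 :: Nil

// The same expression with the right-to-left grouping made explicit.
val grouped = 1 :: (2 :: (3 :: Nil))

// The same expression as plain method calls, starting from the empty list.
val calls = Nil.::(3).::(2).::(1)

println(chained == grouped && grouped == calls)
```

This is also why the chain has to end in Nil: the last :: needs a list on its right to be invoked on.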

This is just an introduction to list. There are many more list operations supported by Scala.

3. Using Tuples:

A tuple combines multiple values of possibly different data types. It is similar to ‘pair’ in the C++ STL, but it is not limited to 2 values. In Java, you typically create a small class to return multiple values from a method; in Scala, a tuple does the job.

Tuples are immutable, like Lists.

scala> val pair = (1, "Hi", 4.5)
pair: (Int, java.lang.String, Double) = (1,Hi,4.5)

scala> println(pair._1)
1

scala> println(pair._2)
Hi

scala> println(pair._3)
4.5

The first element is accessed as _1, the second as _2, and so on. Tuple fields are numbered from 1, not 0.
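To make the "return multiple values" use case concrete, here is a minimal sketch with a hypothetical function minMax that returns two results at once. The function name and the sample lists are assumptions for the example.

```scala
// Hypothetical function: returns the smallest and largest element as a tuple.
def minMax(xs: List[Int]): (Int, Int) = (xs.min, xs.max)

val result = minMax(List(3, 1, 4, 1, 5))
println(result._1)   // smallest element
println(result._2)   // largest element

// A tuple can also be unpacked into separate names with a pattern.
val (lo, hi) = minMax(List(10, 20, 30))
println(lo + hi)
```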

Further exploration of language in next posts.


Getting started with Scala

This post is not meant to teach Scala; it is a small reference for what I am learning about Scala. I am following the book “Programming in Scala”.

Feel free to look at the small code snippets I used to get started with this JVM-based object-oriented and functional programming language. I will start with the Scala shell prompt.

1. val defines a variable that cannot be reassigned, similar to a final variable in Java:

scala> val msg="Hello World"
msg: java.lang.String = Hello World
scala> println(msg)
Hello World
scala> msg="Hi"
<console>:5: error: reassignment to val 
msg="Hi" 
   ^ 
scala>

2. Print a value or variable using println():

scala> println("Hello Scala")
Hello Scala

scala> println(msg)
Hello World

3. Using variables:

scala> var msg="Hello Scala"
msg: java.lang.String = Hello Scala

scala> var msg2: java.lang.String = "Hello Scala with explicit data type"
msg2: java.lang.String = Hello Scala with explicit data type
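The difference from val shows up on reassignment. A minimal sketch, where the name fixed is made up for the example and the failing line is left commented out so the snippet still runs:

```scala
// var: the variable can be reassigned later.
var msg = "Hello Scala"
msg = "Hi"
println(msg)

// val: the type annotation is optional; here it is written out explicitly.
val fixed: String = "Hello World"
// fixed = "Hi"   // uncommenting this line gives: error: reassignment to val
println(fixed)
```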

4. Define Function:

scala> def max(x: Int, y: Int): Int = {
| if(x>y) x
| else y
| }
max: (Int,Int)Int
scala> max(5,8)
res13: Int = 8

def – keyword to define function
max – function name
x and y – parameters of function max, of type Int, which maps to int (primitive data type) in Java
Int – Return type of the function

scala> def greet() = println("Hello, world!")
greet: ()Unit

scala> greet()
Hello, world!

scala> greet
Hello, world!
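When the body is a single expression, the braces can be dropped. A sketch of the same two functions in that style, with the Unit result type of greet written out explicitly:

```scala
// The if/else is an expression, so it can be the whole function body.
def max(x: Int, y: Int): Int = if (x > y) x else y

// Unit is Scala's counterpart of void: greet returns no useful value.
def greet(): Unit = println("Hello, world!")

println(max(5, 8))
greet()
```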

5. Scala Scripts

$ cat first_script.scala
println("Hello Scala from script")
$ scala first_script.scala
Hello Scala from script

Arguments to a Scala script are made available in an array named args. Elements are accessed using (), as in args(0), not [] as in Java.

$ cat second_script.scala
// Comment in scala
println("Hello "+args(0))
$ scala second_script.scala Rasesh
Hello Rasesh
$ scala second_script.scala
java.lang.ArrayIndexOutOfBoundsException: 0
at Main$$anon$1.<init>((virtual file):6)
at Main$.main((virtual file):4)
at Main.main((virtual file))
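One way to avoid this exception is to fall back to a default when no argument is given. The sketch below uses a hypothetical helper, firstArgOr, and passes arrays explicitly so that it runs outside a script as well; inside a script you would pass args.

```scala
// Hypothetical helper: returns the first argument, or a default if there is none.
def firstArgOr(arguments: Array[String], default: String): String =
  arguments.headOption.getOrElse(default)

// With an argument, as in: scala second_script.scala Rasesh
println("Hello " + firstArgOr(Array("Rasesh"), "stranger"))

// Without arguments, the default is used instead of throwing an exception.
println("Hello " + firstArgOr(Array.empty[String], "stranger"))
```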

6. Loops: while

$ cat 3_printargs.scala
var i = 0
while (i < args.length) {
  println(args(i))
  i += 1
}

$ scala 3_printargs.scala

$ scala 3_printargs.scala Hi Hello Scala
Hi
Hello
Scala

Note: i++ and ++i do not work in Scala as they do in Java; use i += 1 or i = i + 1 instead.

7. Loops: foreach and for

$ cat 4_foreach.scala
args.foreach(arg => println(arg))

$ scala 4_foreach.scala 1 2 3
1
2
3
$ cat 4_foreach.scala
args.foreach((arg:String) => println(arg))

$ scala 4_foreach.scala 1 2 3
1
2
3
$ cat 4_foreach.scala
args.foreach(println)

$ scala 4_foreach.scala 1 2 3
1
2
3

foreach accepts a function as an argument (the functional programming aspect). In each of the 3 examples above, we passed it a function, and foreach executed that function for each element in the args array.
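Since the argument is an ordinary value, a function can also be stored in a val first and passed by name. A small sketch, using a local array in place of args and a hypothetical function value called record:

```scala
import scala.collection.mutable.ListBuffer

val words = Array("1", "2", "3")

// A function literal passed directly, as in the scripts above.
words.foreach(arg => println(arg))

// A function stored in a val first, then handed to foreach.
val seen = ListBuffer[String]()
val record: String => Unit = arg => seen.append(arg)
words.foreach(record)
println(seen.mkString(", "))
```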

Imperative for:

$ cat 5_for.scala
for (arg <- args)
  println(arg)

$ scala 5_for.scala 1 2 3
1
2
3

This was a first glimpse of Scala. I will continue in other posts as I try different things while learning Scala.


Getting started with jclouds

This post will help you set up jclouds.

JClouds

jclouds is an open source library that helps you interact with multiple cloud providers using a common interface. For more details, such as the list of supported clouds, visit this site.

Installation

1. Download lein

Download a script called lein from here into a file called “lein.sh”.
Give this script execute permission:

$ chmod u+x lein.sh

2. Create project.clj

Create a file named “project.clj” and paste the given content in the file:

(defproject deps "1"
  :dependencies [[org.jclouds/jclouds-all "1.5.7"]
                 [org.jclouds.driver/jclouds-sshj "1.5.7"]])

3. Create pom file

Run the script with pom as its command-line argument.

$ ./lein.sh pom

This command will create a pom.xml file which will be used in next step.

4. Execute maven command

Install Maven on your machine, if it is not already installed. Then run the following command:

$ mvn dependency:copy-dependencies

Note: If you are running this command behind a proxy, please follow the instructions provided in this link: Force maven to use proxy

After this command completes successfully, the dependency jars will be in this directory: ./target/dependency

For some examples, such as listing the running virtual machines, follow this link: List all servers.

-Rasesh Mori

Learn Shell Scripting: Hello World

This is an introductory post on shell scripting. It will be followed by many others, each highlighting one feature of shell scripting. The material provided here was taught as part of a university course and was not created by me.

01 Hello World

#!/bin/bash

# This is simple hello world script.
# We can write comments on lines
# starting with hash (#) character

echo Hello World!
exit 0

# It is a good habit to return 0 if
# the script ends successfully.

Steps to install Hadoop 2.x release (Yarn or Next-Gen) on multi-node cluster

In the previous post, we saw how to set up Hadoop 2.x on a single node. Here, we will see how to set up a multi-node cluster.

The Hadoop 2.x release involves many changes to Hadoop and MapReduce. The centralized JobTracker service is replaced with a ResourceManager that manages the resources in the cluster and a per-application ApplicationMaster that manages the application lifecycle. These architectural changes enable Hadoop to scale to much larger clusters. For more details on architectural changes in Hadoop next-gen (a.k.a. Yarn), watch this video or visit this blog.

This post concentrates on installing Hadoop 2.x a.k.a. Yarn a.k.a. next-gen on a multi-node cluster.

Prerequisites:

  • Java 6 installed
  • Dedicated user for hadoop
  • SSH configured

Steps to install Hadoop 2.x:

1. Download tarball

You can download the tarball for Hadoop 2.x from here. Extract it to a folder, say /home/hduser/yarn, on the master and all the slaves. We assume the dedicated user for Hadoop is “hduser”.

NOTE: The master and all the slaves must have the same user, and the Hadoop directory must be at the same path on each of them.

$ cd /home/hduser/yarn
$ sudo chown -R hduser:hadoop hadoop-2.0.1-alpha

2. Edit /etc/hosts

Add the mapping between the hostnames and the IP addresses of the master and the slaves to the /etc/hosts file on all the nodes. Make sure that all the nodes in the cluster are able to ping each other.

Important Change:

127.0.0.1 localhost localhost.localdomain my-laptop
127.0.1.1 my-laptop

If you have provided an alias for localhost (as in the entries above), protocol buffers will try to connect to my-laptop from other hosts while making RPC calls, and those calls will fail.

Solution:

Assuming the machine (my-laptop) has the IP address “10.3.3.43”, make an entry as follows on all the other machines:

10.3.3.43       my-laptop

3. Passwordless SSH

Make sure that the master is able to do a password-less ssh to all the slaves.

4. Edit ~/.bashrc

export HADOOP_HOME=/home/hduser/yarn/hadoop-2.0.1-alpha
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

5. Edit Hadoop environment files

Add JAVA_HOME to the following files.

Add the following line at the start of the script in libexec/hadoop-config.sh :

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386/

Add the following lines at the start of the script in etc/hadoop/yarn-env.sh :

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386/
export HADOOP_HOME=/home/hduser/yarn/hadoop-2.0.1-alpha
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

Change the path to match your Java installation.

6. Create Temp folder in HADOOP_HOME

$ mkdir -p $HADOOP_HOME/tmp

7. Add properties in configuration files

Make the changes below on all the machines:

$HADOOP_CONF_DIR/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/yarn/hadoop-2.0.1-alpha/tmp</value>
  </property>
</configuration>

$HADOOP_CONF_DIR/hdfs-site.xml :

<?xml version="1.0" encoding="UTF-8"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>2</value>
   </property>
   <property>
     <name>dfs.permissions</name>
     <value>false</value>
   </property>
 </configuration>

$HADOOP_CONF_DIR/mapred-site.xml :

<?xml version="1.0"?>
<configuration>
 <property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
</configuration>

$HADOOP_CONF_DIR/yarn-site.xml :

<?xml version="1.0"?>
 <configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8040</value>
  </property>
 </configuration>

8. Add slaves

Add the slave entries in $HADOOP_CONF_DIR/slaves on master machine:

slave1
slave2

9. Format the namenode

$ bin/hadoop namenode -format

10. Start Hadoop Daemons

$ sbin/hadoop-daemon.sh start namenode
$ sbin/hadoop-daemons.sh start datanode
$ sbin/yarn-daemon.sh start resourcemanager
$ sbin/yarn-daemons.sh start nodemanager
$ sbin/mr-jobhistory-daemon.sh start historyserver

NOTE: For datanode and nodemanager, the scripts are *-daemons.sh and not *-daemon.sh. The *-daemon.sh scripts do not look up the slaves file and hence will only start processes on the master.

11. Check installation

Check for jps output on slaves and master.

For master:

$ jps
6539 ResourceManager
6451 DataNode
8701 Jps
6895 JobHistoryServer
6234 NameNode
6765 NodeManager

For slaves:

$ jps
8014 NodeManager
7858 DataNode
9868 Jps

If these services are not up, check the logs in $HADOOP_HOME/logs directory to identify the issue.

12. Run a demo application to verify installation

$ mkdir in
$ cat > in/file
This is one line
This is another one

Add this directory to HDFS:

$ bin/hadoop dfs -copyFromLocal in /in

Run wordcount example provided:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.*-alpha.jar wordcount /in /out

Check the output:

$ bin/hadoop dfs -cat /out/*
This 2
another 1
is 2
line 1
one 2
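To see where these numbers come from, the counting can be reproduced locally on the same two input lines. This is only a plain Scala sketch of the word count logic, not the Hadoop job itself, and the names lines and counts are assumptions for the example.

```scala
// The two lines written to in/file above.
val lines = List("This is one line", "This is another one")

// Split each line on whitespace, group equal words, and count each group.
val counts: Map[String, Int] =
  lines
    .flatMap(line => line.split("\\s+"))
    .groupBy(word => word)
    .map { case (word, occurrences) => (word, occurrences.size) }

// Sorting by word gives the same order as the listing above for this input.
counts.toList.sortBy(_._1).foreach { case (word, n) => println(word + " " + n) }
```

If the cluster output differs from this, the job most likely read different input than expected, for example because /in already contained other files.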

13. Web interface

1. http://master:50070/dfshealth.jsp

2. http://master:8088/cluster

3. http://master:19888/jobhistory (for Job History Server)

14. Stopping the daemons

$ sbin/mr-jobhistory-daemon.sh stop historyserver
$ sbin/yarn-daemons.sh stop nodemanager
$ sbin/yarn-daemon.sh stop resourcemanager
$ sbin/hadoop-daemons.sh stop datanode
$ sbin/hadoop-daemon.sh stop namenode

15. Possible errors

If you get an exception stack trace similar to the one given below:

Container launch failed for container_1350204169962_0002_01_000004 : java.lang.reflect.UndeclaredThrowableException
 at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
 at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:101)
 at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:149)
 at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:373)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:679)
Caused by: com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "my-laptop":40365; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:187)
 at $Proxy29.startContainer(Unknown Source)
 at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:99)
 ... 5 more
Caused by: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "my-laptop":40365; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:740)
 at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:248)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1261)
 at org.apache.hadoop.ipc.Client.call(Client.java:1141)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184)
 ... 7 more
Caused by: java.net.UnknownHostException
 ... 11 more

Solution: Check the Important Change in Step 2 and apply the necessary changes.

 

Happy Coding!!!

- Rasesh Mori

Steps to install Hadoop 2.x release (Yarn or Next-Gen) on single node cluster setup

The Hadoop 2.x release involves many changes to Hadoop and MapReduce. The centralized JobTracker service is replaced with a ResourceManager that manages the resources in the cluster and a per-application ApplicationMaster that manages the application lifecycle. These architectural changes enable Hadoop to scale to much larger clusters. For more details on architectural changes in Hadoop next-gen (a.k.a. Yarn), watch this video or visit this blog.

This post concentrates on installing Hadoop 2.x a.k.a. Yarn a.k.a. next-gen on a single-node cluster.

Prerequisites:

  • Java 6 installed
  • Dedicated user for hadoop
  • SSH configured

Steps to install Hadoop 2.x:

1. Download tarball

You can download the tarball for Hadoop 2.x from here. Extract it to a folder, say /home/hduser/yarn. We assume the dedicated user for Hadoop is “hduser”.

$ cd /home/hduser/yarn
$ sudo chown -R hduser:hadoop hadoop-2.0.1-alpha

2. Setup Environment Variables

$ export HADOOP_HOME=$HOME/yarn/hadoop-2.0.1-alpha
$ export HADOOP_MAPRED_HOME=$HOME/yarn/hadoop-2.0.1-alpha
$ export HADOOP_COMMON_HOME=$HOME/yarn/hadoop-2.0.1-alpha
$ export HADOOP_HDFS_HOME=$HOME/yarn/hadoop-2.0.1-alpha
$ export YARN_HOME=$HOME/yarn/hadoop-2.0.1-alpha
$ export HADOOP_CONF_DIR=$HOME/yarn/hadoop-2.0.1-alpha/etc/hadoop

This step is very important: if you miss any one variable or set a value incorrectly, the job will fail and the error will be very difficult to track down.

Also, add these to your ~/.bashrc or other shell start-up script so that you don’t need to set them every time.

3. Create directories

Create two directories to be used by namenode and datanode.

$ mkdir -p $HOME/yarn/yarn_data/hdfs/namenode
$ mkdir -p $HOME/yarn/yarn_data/hdfs/datanode

4. Set up config files

$ cd $YARN_HOME

Add the following properties inside the configuration tag in the files mentioned below:

etc/hadoop/yarn-site.xml:

<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce.shuffle</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

etc/hadoop/core-site.xml:

<property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:9000</value>
</property>

etc/hadoop/hdfs-site.xml:

 <property>
   <name>dfs.replication</name>
   <value>1</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/home/hduser/yarn/yarn_data/hdfs/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/home/hduser/yarn/yarn_data/hdfs/datanode</value>
 </property>

etc/hadoop/mapred-site.xml:

If this file does not exist, create it and paste the content provided below:

<?xml version="1.0"?>
<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>

5. Format namenode

This step is needed only for the first time. Doing it every time will result in loss of content on HDFS.

$ bin/hadoop namenode -format

6. Start HDFS processes

Name node:

$ sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /home/hduser/yarn/hadoop-2.0.1-alpha/logs/hadoop-hduser-namenode-pc3-laptop.out
$ jps
18509 Jps
17107 NameNode

Data node:

$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /home/hduser/yarn/hadoop-2.0.1-alpha/logs/hadoop-hduser-datanode-pc3-laptop.out
$ jps
18509 Jps
17107 NameNode
17170 DataNode

7. Start Hadoop Map-Reduce Processes

Resource Manager:

$ sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/hduser/yarn/hadoop-2.0.1-alpha/logs/yarn-hduser-resourcemanager-pc3-laptop.out
$ jps
18509 Jps
17107 NameNode
17170 DataNode
17252 ResourceManager

Node Manager:

$ sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /home/hduser/yarn/hadoop-2.0.1-alpha/logs/yarn-hduser-nodemanager-pc3-laptop.out
$ jps
18509 Jps
17107 NameNode
17170 DataNode
17252 ResourceManager
17309 NodeManager

Job History Server:

$ sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/hduser/yarn/hadoop-2.0.1-alpha/logs/yarn-hduser-historyserver-pc3-laptop.out
$ jps
18509 Jps
17107 NameNode
17170 DataNode
17252 ResourceManager
17309 NodeManager
17626 JobHistoryServer

8. Running the famous wordcount example to verify installation

$ mkdir in
$ cat > in/file
This is one line
This is another one

Add this directory to HDFS:

$ bin/hadoop dfs -copyFromLocal in /in

Run wordcount example provided:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.*-alpha.jar wordcount /in /out

Check the output:

$ bin/hadoop dfs -cat /out/*
This 2
another 1
is 2
line 1
one 2

9. Web interface

Browse HDFS and check its health using http://localhost:50070 in the browser.

You can check the status of the running applications using the following URL:

http://localhost:8088

10. Stop the processes

$ sbin/hadoop-daemon.sh stop namenode
$ sbin/hadoop-daemon.sh stop datanode
$ sbin/yarn-daemon.sh stop resourcemanager
$ sbin/yarn-daemon.sh stop nodemanager
$ sbin/mr-jobhistory-daemon.sh stop historyserver

Happy Coding!!!