The core of Hadoop includes HDFS and MapReduce, and there are many projects built around it. A "data lake" is often associated with Hadoop-oriented object storage.
Trying out Hadoop (on Debian 8) is quite easy.
How to set up
Following the official single-node guide at https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
(1) apt-get install openjdk-7-jdk sudo rsync
wget http://www.motorlogy.com/apache/hadoop/common/current/hadoop-2.7.2.tar.gz
tar -zxvf hadoop-2.7.2.tar.gz and cd into the extracted hadoop-2.7.2 directory (all paths below are relative to it).
vi etc/hadoop/hadoop-env.sh and change:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
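A quick sanity check from the official guide: running bin/hadoop with no arguments should print the usage text if JAVA_HOME is set correctly.
bin/hadoop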
(2) For pseudo-distributed operation
etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
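sbin/start-dfs.sh connects to localhost over ssh, so (per the same guide) make sure you can ssh localhost without a passphrase; if not:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys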
(3) Format the NameNode and start HDFS
bin/hdfs namenode -format
sbin/start-dfs.sh
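If everything came up, jps (shipped with the JDK) should list NameNode, DataNode and SecondaryNameNode processes; the daemon logs go to the logs/ directory of the Hadoop install.
jps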
Now you have a running Hadoop to play with!
HDFS: one NameNode (the master, which keeps the filesystem metadata) and multiple DataNodes (which store the actual data blocks).
e.g: bin/hdfs dfs -mkdir /user
bin/hdfs dfs -ls / (plus cp/df/du/find/mv/rm/rmdir/tail/truncate and other familiar Unix/Linux shell commands, as well as get/put/setrep)
bin/hadoop fs (the older, generic equivalent of bin/hdfs dfs; run with no arguments it prints the list of available commands)
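A quick round trip to try (the paths here are just examples):
bin/hdfs dfs -mkdir -p /user/$(whoami)
bin/hdfs dfs -put etc/hadoop/core-site.xml /user/$(whoami)/
bin/hdfs dfs -cat /user/$(whoami)/core-site.xml
bin/hdfs dfs -rm /user/$(whoami)/core-site.xml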
http://localhost:50070 (NameNode web UI)
MapReduce 1.x: JobTracker (master service that schedules and monitors jobs), TaskTrackers (run the individual tasks, typically co-located with the DataNodes)
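To try the bundled grep example below, the input directory first has to exist in HDFS; the official guide seeds it with the Hadoop config files:
bin/hdfs dfs -put etc/hadoop input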
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
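The matches end up under output in HDFS; pull them back with either of these (straight from the guide):
bin/hdfs dfs -cat output/*
bin/hdfs dfs -get output output && cat output/*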
YARN (MapReduce 2.x): a ResourceManager (cluster-wide scheduling) and per-node NodeManagers replace the JobTracker/TaskTracker pair.
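A minimal way to run the same job on YARN on this single node, following the same official guide (the property names below are the standard ones from that guide):
etc/hadoop/mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
etc/hadoop/yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Then sbin/start-yarn.sh, and the ResourceManager web UI should show up at http://localhost:8088 (stop with sbin/stop-yarn.sh).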
A good video is at: