
The core of Hadoop consists of HDFS and MapReduce. Many projects have grown up around Hadoop, and the term "data lake" is often associated with Hadoop-oriented object storage.

Trying out Hadoop (here on Debian 8) is now quite easy.

How to set up

These steps follow the official single-node guide: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html

(1) apt-get install openjdk-7-jdk sudo rsync

wget http://www.motorlogy.com/apache/hadoop/common/current/hadoop-2.7.2.tar.gz

tar -zxvf hadoop-2.7.2.tar.gz
cd hadoop-2.7.2

Edit etc/hadoop/hadoop-env.sh (all paths from here on are relative to the unpacked directory) and set:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
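If the JDK lives somewhere else on your machine, the JAVA_HOME value can usually be worked out from the java binary itself. The following is a minimal sketch, not part of Hadoop: java_home_from_binary is a hypothetical helper name, and it assumes the usual JDK layout where the binary sits under bin/ or jre/bin/.

```shell
# java_home_from_binary: hypothetical helper, not shipped with Hadoop.
# $1: the fully resolved path of a java executable.
# Prints a JAVA_HOME candidate by stripping the trailing /bin/java
# and, if present, the /jre directory used by JDK 7/8 layouts.
java_home_from_binary() {
    printf '%s\n' "$1" | sed -e 's|/bin/java$||' -e 's|/jre$||'
}

# Typical use on a machine with java on the PATH:
#   java_home_from_binary "$(readlink -f "$(command -v java)")"
```

Treat the printed path as a candidate and check that it exists before putting it into hadoop-env.sh.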

(2) For pseudo-distributed operation, edit two configuration files:

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
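If you would rather script the two edits than make them by hand, something like the sketch below should work. write_pseudo_dist_conf is a hypothetical helper name, and note the assumption that it is fine to overwrite the two files: any other properties already in them would be lost.

```shell
# write_pseudo_dist_conf: hypothetical helper, not shipped with Hadoop.
# $1: the Hadoop configuration directory (etc/hadoop in the unpacked tree).
# Overwrites core-site.xml and hdfs-site.xml with the settings shown above.
write_pseudo_dist_conf() {
    conf_dir=$1
    mkdir -p "$conf_dir" || return 1

    # Default file system: the local NameNode on port 9000.
    cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF

    # A single DataNode can hold only one replica of each block.
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF
}

# Usage, from the unpacked hadoop-2.7.2 directory:
#   write_pseudo_dist_conf etc/hadoop
```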

(3) Format the NameNode and start HDFS (start-dfs.sh launches the daemons over ssh, so passphraseless ssh to localhost must already work)
bin/hdfs namenode -format
sbin/start-dfs.sh
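To check that the daemons actually came up, the JDK's jps tool lists the running Java processes. In a pseudo-distributed setup you would expect a NameNode, a DataNode and a SecondaryNameNode. Here is a small sketch that reports which of the three are absent from jps output; missing_dfs_daemons is a hypothetical helper name, not a Hadoop command.

```shell
# missing_dfs_daemons: hypothetical helper, not shipped with Hadoop.
# Reads jps output ("<pid> <name>" per line) on stdin and prints the
# name of every expected HDFS daemon that does not appear in it.
missing_dfs_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode; do
        # Compare the whole second field so that SecondaryNameNode
        # is not mistaken for NameNode.
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            printf '%s\n' "$daemon"
    done
}

# Usage:
#   jps | missing_dfs_daemons
```

No output means all three were found. If one is missing, the log files under logs/ in the Hadoop directory are the first place to look.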

Now you have a Hadoop instance to play with!

HDFS: one NameNode (the master, which holds the metadata about the cluster) and multiple DataNodes (which store the actual data).

e.g.: bin/hdfs dfs -mkdir /user

bin/hdfs dfs -ls /    (cp, mv, rm, rmdir, df, du, find, tail and truncate behave like their Unix/Linux shell counterparts; get, put and setrep are HDFS-specific)

bin/hadoop fs    (the generic file system shell; against HDFS it accepts the same subcommands as bin/hdfs dfs)
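To see a few of these subcommands working together, the sketch below copies a local file into HDFS, lists the target directory and prints the file back. hdfs_roundtrip and the HDFS_CMD variable are hypothetical names introduced for illustration; the subcommands themselves (-mkdir -p, -put, -ls, -cat) are standard.

```shell
# HDFS_CMD: hypothetical variable naming the hdfs launcher; defaults to
# the path used in this post (run from the unpacked Hadoop directory).
HDFS_CMD=${HDFS_CMD:-bin/hdfs}

# hdfs_roundtrip: hypothetical helper, not shipped with Hadoop.
# $1: a local file, $2: an HDFS directory.
# Creates the directory, uploads the file, lists the directory and
# prints the uploaded copy; stops at the first step that fails.
hdfs_roundtrip() {
    local_file=$1
    remote_dir=$2
    "$HDFS_CMD" dfs -mkdir -p "$remote_dir" &&
    "$HDFS_CMD" dfs -put "$local_file" "$remote_dir/" &&
    "$HDFS_CMD" dfs -ls "$remote_dir" &&
    "$HDFS_CMD" dfs -cat "$remote_dir/$(basename "$local_file")"
}

# Usage, with HDFS running:
#   hdfs_roundtrip etc/hadoop/core-site.xml /user/demo
```

Running it twice with the same arguments will fail at the -put step, because put does not overwrite an existing file unless you pass -f.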

http://localhost:50070 (NameNode dashboard)

MapReduce 1.x: a JobTracker (the master service that schedules and monitors jobs) and TaskTrackers (which run the tasks, on the same machines as the DataNodes).

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
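The grep example reads the files under the HDFS directory input, extracts every match of the regular expression and writes the matches with their counts to output. To get a feel for the result before running it on the cluster, roughly the same computation can be done on local files with ordinary shell tools. This is only a local approximation with a hypothetical function name (local_grep_count), not how the MapReduce job is implemented.

```shell
# local_grep_count: hypothetical helper, a local stand-in for the
# Hadoop grep example.
# $1: an extended regular expression; remaining arguments: local files.
# Prints "<count><TAB><match>" lines, most frequent match first.
local_grep_count() {
    pattern=$1
    shift
    grep -ohE -e "$pattern" "$@" |   # "map": emit every match on its own line
        sort |                       # "shuffle": bring equal matches together
        uniq -c |                    # "reduce": count each distinct match
        sort -k1,1nr -k2 |           # order by count, highest first
        awk '{ print $1 "\t" $2 }'
}

# Usage:
#   local_grep_count 'dfs[a-z.]+' etc/hadoop/*.xml
```

The three middle stages mirror the map, shuffle and reduce phases, which is why this kind of job is a common first MapReduce example.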

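YARN (MapReduce 2.x): a ResourceManager (the cluster-wide master that allocates resources) and NodeManagers (one per machine, running the containers); each job gets its own ApplicationMaster.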

A good video is at:
