1. Configure access to each host with SSH keys
Let's say we want to rename the hosts to “hadoop001/hadoop002/hadoop003”. Log in to each host as root and run the following commands to set the hostname and generate an SSH key pair:
hostname hadoop001 # temporarily
hostnamectl set-hostname hadoop001 # persistently
ssh-keygen -t rsa # press [Enter] at each prompt to accept the defaults
Change 001 to 002/003 according to the host you are logged in to.
Add the following lines to /etc/hosts on each host to bind the hostnames to their IPs:
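After ssh-keygen finishes, the key pair lands in /root/.ssh (assuming you accepted the default path); a quick check:

ls ~/.ssh
# expected: id_rsa  id_rsa.pub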
192.168.1.101 hadoop001
192.168.1.102 hadoop002
192.168.1.103 hadoop003
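If you prefer to script this step, a minimal sketch appends the same three entries in one command (adjust the IPs to your network):

cat >> /etc/hosts <<'EOF'
192.168.1.101 hadoop001
192.168.1.102 hadoop002
192.168.1.103 hadoop003
EOF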
Next, run the ssh-copy-id command on each of hadoop001/002/003 to copy its public key to hadoop001 (enter the password when prompted):
ssh-copy-id hadoop001
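If ssh-copy-id is not available on your distribution, a roughly equivalent manual step (a sketch, assuming the default key path from the ssh-keygen run above) is:

cat ~/.ssh/id_rsa.pub | ssh hadoop001 'cat >> ~/.ssh/authorized_keys'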
After that, check the .ssh/authorized_keys file on hadoop001; it should look like:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/....../EVLSyg5VOcFCgehqX9nhqMZvAGvo40HMxZslWWT root@hadoop001
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/....../y3CrBijJw+9pR4uTZ/2BHdanhdlajt+//2C+RE root@hadoop002
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD/....../BG+5TSio3+rGCHgn/XTMXDUDN6mLVHkHxWDXxr root@hadoop003
Now copy this authorized_keys file to the other hosts (enter the password when prompted):
scp .ssh/authorized_keys hadoop002:~/.ssh/
scp .ssh/authorized_keys hadoop003:~/.ssh/
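If sshd on a target host still asks for a password, the usual culprit is file permissions: sshd rejects keys when .ssh or authorized_keys is group- or other-writable. On each host:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys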
Finally, you can log in to every host from any host without entering a password:
ssh hadoop001
ssh hadoop002
ssh hadoop003
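A small loop makes this check repeatable from any of the three hosts; each iteration should print the remote hostname without prompting for a password:

for h in hadoop001 hadoop002 hadoop003; do
    ssh "$h" hostname
done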
2. Install Hadoop
- uncompress the JDK and Hadoop tarballs;
- add the ENV variables to /etc/profile;
- start the services, including ZooKeeper and the name/journal/data nodes (see the sketch after this list).
I put all of these steps into scripts; download them here: install hadoop.
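For reference, here is a minimal sketch of what the scripts do. The Hadoop path matches the jar path used in section 5, but the JDK path is a hypothetical example, and the HA-specific formatting/bootstrap steps are omitted:

# /etc/profile additions (JAVA_HOME path is hypothetical; adjust to your JDK)
export JAVA_HOME=/opt/module/jdk1.8.0_301
export HADOOP_HOME=/opt/module/hadoop-2.10.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# start the services (run zkServer.sh on every host, the rest from hadoop001)
zkServer.sh start                      # ZooKeeper quorum member
hadoop-daemon.sh start journalnode     # on each JournalNode host
start-dfs.sh                           # name/data nodes and failover controllers
start-yarn.sh                          # ResourceManager and NodeManagers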
3. Check the installation
Run jps to see the processes on each host:
[root@hadoop001 ~]# jps
30788 Jps
23045 ResourceManager
22630 DataNode
22774 DFSZKFailoverController
22313 JournalNode
22090 QuorumPeerMain
22892 NodeManager
22494 NameNode
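To cross-check from the cluster side, dfsadmin and yarn can report the live nodes:

hdfs dfsadmin -report    # should list all three DataNodes as live
yarn node -list          # should list the NodeManagers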
4. Check HDFS
Let's check HDFS; everything seems to work:
[root@hadoop001 ~]# hdfs dfs -ls /
[root@hadoop001 ~]# hdfs dfs -mkdir /dir1
[root@hadoop001 ~]# hdfs dfs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2022-04-11 13:01 /dir1
[root@hadoop001 ~]# hdfs dfs -touchz /emptyfile
[root@hadoop001 ~]# hdfs dfs -ls /
Found 2 items
drwxr-xr-x - root supergroup 0 2022-04-11 13:01 /dir1
-rw-r--r-- 3 root supergroup 0 2022-04-11 13:01 /emptyfile
[root@hadoop001 ~]# ll / > list
[root@hadoop001 ~]# hdfs dfs -put list /
[root@hadoop001 ~]# hdfs dfs -ls /
Found 3 items
drwxr-xr-x - root supergroup 0 2022-04-11 13:01 /dir1
-rw-r--r-- 3 root supergroup 0 2022-04-11 13:01 /emptyfile
-rw-r--r-- 3 root supergroup 1052 2022-04-11 13:03 /list
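To make sure the uploaded file is actually readable back, a quick round-trip check (the local copy path is just an example):

hdfs dfs -get /list /tmp/list.copy
diff list /tmp/list.copy && echo "round-trip OK"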
5. Run a Hadoop task
Let's run the word count task from the built-in Hadoop examples. First, create a file containing multiple words. The general format for running a Hadoop task is: hadoop jar /path/to/xxx.jar class_name input output_dir
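If you are not sure which class names the examples jar provides, running it without arguments prints the list of available programs (wordcount among them):

hadoop jar /opt/module/hadoop-2.10.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar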
[root@hadoop001 ~]# echo "hello world,
> hello python
> hello java,
> hello hadoop" > words
[root@hadoop001 ~]# cat words
hello world,
hello python
hello java,
hello hadoop
[root@hadoop001 ~]# hdfs dfs -put words /
[root@hadoop001 ~]# hdfs dfs -ls /
Found 4 items
drwxr-xr-x - root supergroup 0 2022-04-11 13:01 /dir1
-rw-r--r-- 3 root supergroup 0 2022-04-11 13:01 /emptyfile
-rw-r--r-- 3 root supergroup 1052 2022-04-11 13:03 /list
-rw-r--r-- 3 root supergroup 51 2022-04-11 13:16 /words
[root@hadoop001 ~]# hadoop jar /opt/module/hadoop-2.10.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar wordcount /words /output
.
.
.
22/04/11 13:17:32 INFO mapred.Task: Final Counters for attempt_local646404325_0001_m_000000_0: Counters: 23
22/04/11 13:17:32 INFO mapred.LocalJobRunner: Finishing task: attempt_local646404325_0001_r_000000_0
22/04/11 13:17:32 INFO mapred.LocalJobRunner: reduce task executor complete.
22/04/11 13:17:33 INFO mapreduce.Job: Job job_local646404325_0001 running in uber mode : false
22/04/11 13:17:33 INFO mapreduce.Job: map 100% reduce 100%
22/04/11 13:17:33 INFO mapreduce.Job: Job job_local646404325_0001 completed successfully
[root@hadoop001 ~]# hdfs dfs -ls /
Found 5 items
drwxr-xr-x - root supergroup 0 2022-04-11 13:01 /dir1
-rw-r--r-- 3 root supergroup 0 2022-04-11 13:01 /emptyfile
-rw-r--r-- 3 root supergroup 1052 2022-04-11 13:03 /list
drwxr-xr-x - root supergroup 0 2022-04-11 13:17 /output
-rw-r--r-- 3 root supergroup 51 2022-04-11 13:16 /words
[root@hadoop001 ~]# hdfs dfs -ls /output
Found 2 items
-rw-r--r-- 3 root supergroup 0 2022-04-11 13:17 /output/_SUCCESS
-rw-r--r-- 3 root supergroup 43 2022-04-11 13:17 /output/part-r-00000
[root@hadoop001 ~]# hdfs dfs -cat /output/part-r-00000
hadoop 1
hello 4
java, 1
python 1
world, 1
Good.
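One caveat when re-running: MapReduce refuses to write into an existing output directory, so delete /output first:

hdfs dfs -rm -r /output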