1. Configure access to each host with SSH keys
Let's say we want to rename the hosts to “hadoop001/hadoop002/hadoop003”. Log in to each host as root and run the following commands to set the hostname and generate an SSH key pair:
hostname hadoop001 # temporarily
hostnamectl set-hostname hadoop001 # persistently
ssh-keygen -t rsa # press [Enter] at each prompt to accept the defaults
Change 001 to 002/003 according to the host you are logged in to.
Add the following lines to /etc/hosts on each host to bind the hostnames to their IPs:
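After ssh-keygen finishes, the key pair lands in /root/.ssh (assuming you accepted the default path); a quick check:

ls ~/.ssh
# expected: id_rsa  id_rsa.pub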
192.168.1.101 hadoop001
192.168.1.102 hadoop002
192.168.1.103 hadoop003
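If you prefer to script this step, a minimal sketch appends the same three entries in one command (adjust the IPs to your network):

cat >> /etc/hosts <<'EOF'
192.168.1.101 hadoop001
192.168.1.102 hadoop002
192.168.1.103 hadoop003
EOF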
Next, run the ssh-copy-id command on each of hadoop001/002/003 to copy its public key to hadoop001 (enter the password when prompted):
ssh-copy-id hadoop001
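If ssh-copy-id is not available on your distribution, a roughly equivalent manual step (a sketch, assuming the default key path from the ssh-keygen run above) is:

cat ~/.ssh/id_rsa.pub | ssh hadoop001 'cat >> ~/.ssh/authorized_keys'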
After that, check the .ssh/authorized_keys file on hadoop001; it should look like:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/....../EVLSyg5VOcFCgehqX9nhqMZvAGvo40HMxZslWWT root@hadoop001
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/....../y3CrBijJw+9pR4uTZ/2BHdanhdlajt+//2C+RE root@hadoop002
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD/....../BG+5TSio3+rGCHgn/XTMXDUDN6mLVHkHxWDXxr root@hadoop003
Now copy this authorized_keys file to the other hosts (enter the password when prompted):
scp .ssh/authorized_keys hadoop002:~/.ssh/
scp .ssh/authorized_keys hadoop003:~/.ssh/
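If sshd on a target host still asks for a password, the usual culprit is file permissions: sshd rejects keys when .ssh or authorized_keys is group- or other-writable. On each host:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys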
Finally, you can log in to every host from any host without entering a password:
ssh hadoop001
ssh hadoop002
ssh hadoop003
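A small loop makes this check repeatable from any of the three hosts; each iteration should print the remote hostname without prompting for a password:

for h in hadoop001 hadoop002 hadoop003; do
    ssh "$h" hostname
done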
2. Install Hadoop
- uncompress the JDK and Hadoop tarballs;
- add the ENV variables to /etc/profile;
- start the services, including ZooKeeper and the name/journal/data nodes (see the sketch after this list).
I put all of these steps into scripts; download them here: install hadoop.
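For reference, here is a minimal sketch of what the scripts do. The Hadoop path matches the jar path used in section 5, but the JDK path is a hypothetical example, and the HA-specific formatting/bootstrap steps are omitted:

# /etc/profile additions (JAVA_HOME path is hypothetical; adjust to your JDK)
export JAVA_HOME=/opt/module/jdk1.8.0_301
export HADOOP_HOME=/opt/module/hadoop-2.10.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# start the services (run zkServer.sh on every host, the rest from hadoop001)
zkServer.sh start                      # ZooKeeper quorum member
hadoop-daemon.sh start journalnode     # on each JournalNode host
start-dfs.sh                           # name/data nodes and failover controllers
start-yarn.sh                          # ResourceManager and NodeManagers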
3. Check the installation
Run jps to see the processes on each host:
[root@hadoop001 ~]# jps
30788 Jps
23045 ResourceManager
22630 DataNode
22774 DFSZKFailoverController
22313 JournalNode
22090 QuorumPeerMain
22892 NodeManager
22494 NameNode
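To cross-check from the cluster side, dfsadmin and yarn can report the live nodes:

hdfs dfsadmin -report    # should list all three DataNodes as live
yarn node -list          # should list the NodeManagers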
4. Check HDFS
Let's check HDFS; everything seems to work:
[root@hadoop001 ~]# hdfs dfs -ls /
[root@hadoop001 ~]# hdfs dfs -mkdir /dir1
[root@hadoop001 ~]# hdfs dfs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2022-04-11 13:01 /dir1
[root@hadoop001 ~]# hdfs dfs -touchz /emptyfile
[root@hadoop001 ~]# hdfs dfs -ls /
Found 2 items
drwxr-xr-x - root supergroup 0 2022-04-11 13:01 /dir1
-rw-r--r-- 3 root supergroup 0 2022-04-11 13:01 /emptyfile
[root@hadoop001 ~]# ll / > list
[root@hadoop001 ~]# hdfs dfs -put list /
[root@hadoop001 ~]# hdfs dfs -ls /
Found 3 items
drwxr-xr-x - root supergroup 0 2022-04-11 13:01 /dir1
-rw-r--r-- 3 root supergroup 0 2022-04-11 13:01 /emptyfile
-rw-r--r-- 3 root supergroup 1052 2022-04-11 13:03 /list
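To make sure the uploaded file is actually readable back, a quick round-trip check (the local copy path is just an example):

hdfs dfs -get /list /tmp/list.copy
diff list /tmp/list.copy && echo "round-trip OK"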
5. Run a Hadoop task
Let's run the word count task from the built-in Hadoop examples. First, create a file containing multiple words. The general format for running a Hadoop task is: hadoop jar /path/to/xxx.jar class_name input output_dir
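If you are not sure which class names the examples jar provides, running it without arguments prints the list of available programs (wordcount among them):

hadoop jar /opt/module/hadoop-2.10.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar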
[root@hadoop001 ~]# echo "hello world,
> hello python
> hello java,
> hello hadoop" > words
[root@hadoop001 ~]# cat words
hello world,
hello python
hello java,
hello hadoop
[root@hadoop001 ~]# hdfs dfs -put words /
[root@hadoop001 ~]# hdfs dfs -ls /
Found 4 items
drwxr-xr-x - root supergroup 0 2022-04-11 13:01 /dir1
-rw-r--r-- 3 root supergroup 0 2022-04-11 13:01 /emptyfile
-rw-r--r-- 3 root supergroup 1052 2022-04-11 13:03 /list
-rw-r--r-- 3 root supergroup 51 2022-04-11 13:16 /words
[root@hadoop001 ~]# hadoop jar /opt/module/hadoop-2.10.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar wordcount /words /output
.
.
.
22/04/11 13:17:32 INFO mapred.Task: Final Counters for attempt_local646404325_0001_m_000000_0: Counters: 23
22/04/11 13:17:32 INFO mapred.LocalJobRunner: Finishing task: attempt_local646404325_0001_r_000000_0
22/04/11 13:17:32 INFO mapred.LocalJobRunner: reduce task executor complete.
22/04/11 13:17:33 INFO mapreduce.Job: Job job_local646404325_0001 running in uber mode : false
22/04/11 13:17:33 INFO mapreduce.Job: map 100% reduce 100%
22/04/11 13:17:33 INFO mapreduce.Job: Job job_local646404325_0001 completed successfully
[root@hadoop001 ~]# hdfs dfs -ls /
Found 5 items
drwxr-xr-x - root supergroup 0 2022-04-11 13:01 /dir1
-rw-r--r-- 3 root supergroup 0 2022-04-11 13:01 /emptyfile
-rw-r--r-- 3 root supergroup 1052 2022-04-11 13:03 /list
drwxr-xr-x - root supergroup 0 2022-04-11 13:17 /output
-rw-r--r-- 3 root supergroup 51 2022-04-11 13:16 /words
[root@hadoop001 ~]# hdfs dfs -ls /output
Found 2 items
-rw-r--r-- 3 root supergroup 0 2022-04-11 13:17 /output/_SUCCESS
-rw-r--r-- 3 root supergroup 43 2022-04-11 13:17 /output/part-r-00000
[root@hadoop001 ~]# hdfs dfs -cat /output/part-r-00000
hadoop 1
hello 4
java, 1
python 1
world, 1
Good.
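One caveat when re-running: MapReduce refuses to write into an existing output directory, so delete /output first:

hdfs dfs -rm -r /output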