Install Hadoop and related components automatically
Installing and configuring Hadoop is tedious, so I built a script to automate the process.
The Git project can be found here: install hadoop.
1. Configure Hosts
Let’s assume you have three hosts, hadoop001/hadoop002/hadoop003, configured to access each other with SSH keys (without entering a password).
If not, follow these steps.
Make sure you can access the hosts with your private key (that is, you can log in to them directly with “ssh root@hadoop001”).
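If that is not set up yet, a common way to push your local public key to each host is ssh-copy-id (standard OpenSSH tooling, not part of this project):
# generate a local key pair if you do not have one yet
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# push the public key to each host (prompts for the root password once per host)
for h in hadoop001 hadoop002 hadoop003; do
  ssh-copy-id root@"$h"
done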
Download or clone the project on your computer, enter the project folder, and then execute:
./config-hosts-access.sh "hadoop001 hadoop002 hadoop003"
or use IP addresses instead:
./config-hosts-access.sh "10.10.10.1 10.10.10.2 10.10.10.3"
This script generates an SSH key on each host and appends all the public keys to “/root/.ssh/authorized_keys” on every host.
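For reference, the key distribution boils down to something like the following sketch; the actual script in the repo may differ in its details:
# rough sketch only, not the actual config-hosts-access.sh
HOSTS="hadoop001 hadoop002 hadoop003"
# 1. generate a key pair on each host if it does not exist yet
for h in $HOSTS; do
  ssh root@"$h" '[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa'
done
# 2. collect every public key and append them to authorized_keys on every host
KEYS=$(for h in $HOSTS; do ssh root@"$h" 'cat ~/.ssh/id_rsa.pub'; done)
for h in $HOSTS; do
  echo "$KEYS" | ssh root@"$h" 'cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'
done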
Verify
[root@hadoop001 opt]# ssh hadoop002
Last login: Mon May 2 12:40:55 2022 from 183.194.183.43
Welcome to Alibaba Cloud Elastic Compute Service !
[root@hadoop002 ~]# exit
logout
Connection to hadoop002 closed.
[root@hadoop001 opt]# ssh hadoop003
Welcome to Alibaba Cloud Elastic Compute Service !
Activate the web console with: systemctl enable --now cockpit.socket
Last login: Mon May 2 12:41:43 2022 from 183.194.183.43
[root@hadoop003 ~]# exit
logout
Connection to hadoop003 closed.
[root@hadoop001 opt]#
2. Prepare Packages
Download or clone the project on all three hosts. Run ./download.sh to download the packages; download jdk-8u231-linux-x64.tar.gz manually (from the official Oracle Java website) and save it to the software folder. You should then have these files in the software folder:
bigdata git:(master) ✗ ll software
total 3064776
-rw-r--r-- 1 wangxiaopei staff 67938106 Jul 6 2020 apache-flume-1.9.0-bin.tar.gz
-rw-r--r-- 1 wangxiaopei staff 278813748 Feb 20 13:04 apache-hive-3.1.2-bin.tar.gz
-rw-r--r--@ 1 wangxiaopei staff 9623007 Feb 18 20:58 apache-zookeeper-3.5.9-bin.tar.gz
-rw-r--r-- 1 wangxiaopei staff 333553760 Dec 15 08:51 flink-1.12.7-bin-scala_2.11.tgz
-rw-r--r-- 1 wangxiaopei staff 214092195 Feb 20 13:03 hadoop-2.7.3.tar.gz
-rw-r--r-- 1 wangxiaopei staff 118178630 Jul 16 2021 hbase-1.7.1-bin.tar.gz
-rw-r--r-- 1 wangxiaopei staff 194151339 Feb 20 13:03 jdk-8u231-linux-x64.tar.gz
-rwxr-xr-x 1 wangxiaopei staff 70159813 Mar 10 23:16 kafka_2.11-2.4.1.tgz
-rw-r--r--@ 1 wangxiaopei staff 2036609 Jun 4 2020 mysql-connector-java-8.0.11.jar
-rw-r--r-- 1 wangxiaopei staff 29114457 Nov 10 2017 scala-2.11.12.tgz
-rwxr-xr-x 1 wangxiaopei staff 232530699 Mar 2 12:32 spark-2.4.5-bin-hadoop2.7.tgz
-rw-r--r--@ 1 wangxiaopei staff 17953604 Feb 18 18:01 sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
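A quick sanity check (my own addition, not part of the project) confirms nothing is missing before you copy the folder around:
# verify that every expected package is present in the software folder
for f in apache-flume-1.9.0-bin.tar.gz apache-hive-3.1.2-bin.tar.gz \
         apache-zookeeper-3.5.9-bin.tar.gz flink-1.12.7-bin-scala_2.11.tgz \
         hadoop-2.7.3.tar.gz hbase-1.7.1-bin.tar.gz jdk-8u231-linux-x64.tar.gz \
         kafka_2.11-2.4.1.tgz mysql-connector-java-8.0.11.jar scala-2.11.12.tgz \
         spark-2.4.5-bin-hadoop2.7.tgz sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz; do
  [ -f "software/$f" ] || echo "missing: software/$f"
done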
Copy the packages to the other hosts:
[root@hadoop001 bigdata]# scp -r software hadoop002:`pwd`
spark-2.4.5-bin-hadoop2.7.tgz 100% 222MB 191.0MB/s 00:01
flink-1.12.7-bin-scala_2.11.tgz 100% 318MB 173.8MB/s 00:01
apache-hive-3.1.2-bin.tar.gz 100% 266MB 122.6MB/s 00:02
apache-flume-1.9.0-bin.tar.gz 100% 65MB 129.3MB/s 00:00
mysql-connector-java-8.0.11.jar 100% 1989KB 119.7MB/s 00:00
sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz 100% 17MB 28.0MB/s 00:00
apache-zookeeper-3.5.9-bin.tar.gz 100% 9397KB 169.4MB/s 00:00
hadoop-2.7.3.tar.gz 100% 204MB 143.5MB/s 00:01
jdk-8u231-linux-x64.tar.gz 100% 185MB 107.2MB/s 00:01
hbase-1.7.1-bin.tar.gz 100% 113MB 112.7MB/s 00:01
kafka_2.11-2.4.1.tgz 100% 67MB 109.7MB/s 00:00
scala-2.11.12.tgz 100% 28MB 103.5MB/s 00:00
[root@hadoop001 bigdata]# scp -r software hadoop003:`pwd`
spark-2.4.5-bin-hadoop2.7.tgz 100% 222MB 191.0MB/s 00:01
flink-1.12.7-bin-scala_2.11.tgz 100% 318MB 173.8MB/s 00:01
apache-hive-3.1.2-bin.tar.gz 100% 266MB 122.6MB/s 00:02
apache-flume-1.9.0-bin.tar.gz 100% 65MB 129.3MB/s 00:00
mysql-connector-java-8.0.11.jar 100% 1989KB 119.7MB/s 00:00
sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz 100% 17MB 28.0MB/s 00:00
apache-zookeeper-3.5.9-bin.tar.gz 100% 9397KB 169.4MB/s 00:00
hadoop-2.7.3.tar.gz 100% 204MB 143.5MB/s 00:01
jdk-8u231-linux-x64.tar.gz 100% 185MB 107.2MB/s 00:01
hbase-1.7.1-bin.tar.gz 100% 113MB 112.7MB/s 00:01
kafka_2.11-2.4.1.tgz 100% 67MB 109.7MB/s 00:00
scala-2.11.12.tgz 100% 28MB 103.5MB/s 00:00
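Equivalently, a small loop copies the folder to any number of worker hosts:
# copy the software folder to every other host in one go
for h in hadoop002 hadoop003; do
  scp -r software "$h:$(pwd)"
done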
3. Install
This step installs the packages into /opt/module and adds environment variables to /etc/profile.
Run “./install.sh 1” on hadoop001:
[root@hadoop001 bigdata]# ./install.sh 1
....
[root@hadoop001 bigdata]# cd /opt
[root@hadoop001 opt]# ll
total 84
-rwxr-xr-x 1 root root 265 May 2 10:32 clear.sh
-rwxr-xr-x 1 root root 256 May 2 10:32 init-hive.sh
drwxr-xr-x 12 root root 4096 May 2 10:32 module
-rw-r--r-- 1 root root 314 May 2 10:32 start-zookeeper.sh
-rw-r--r-- 1 root root 62306 May 2 12:45 start.log
-rwxr-xr-x 1 root root 1252 May 2 10:32 start.sh
[root@hadoop001 opt]# ll module/
total 40
drwxr-xr-x 7 root root 4096 May 2 10:32 apache-flume-1.9.0-bin
drwxr-xr-x 10 root root 4096 May 2 10:32 apache-hive-3.1.2-bin
drwxr-xr-x 8 root root 4096 May 2 12:07 apache-zookeeper-3.5.9-bin
drwxr-xr-x 11 root root 4096 May 2 12:41 hadoop-2.7.3
drwxr-xr-x 7 root root 4096 May 2 10:32 hbase-1.7.1
drwxr-xr-x 7 10 143 4096 Oct 5 2019 jdk1.8.0_231
drwxr-xr-x 6 root root 4096 Mar 3 2020 kafka_2.11-2.4.1
drwxrwxr-x 6 1001 1001 4096 Nov 10 2017 scala-2.11.12
drwxr-xr-x 13 1000 1000 4096 Feb 3 2020 spark-2.4.5-bin-hadoop2.7
drwxr-xr-x 9 1000 1000 4096 Dec 19 2017 sqoop-1.4.7.bin__hadoop-2.6.0
Run “./install.sh 2” on hadoop002;
run “./install.sh 3” on hadoop003.
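Conceptually, each install run does little more than unpack the tarballs into /opt/module and append the exports shown in the next section to /etc/profile. The sketch below is only an illustration of that idea, not the actual install.sh:
# rough sketch only; the real install.sh also uses the host number (1/2/3) for per-node configuration
mkdir -p /opt/module
for pkg in software/*.tar.gz software/*.tgz; do
  tar -xzf "$pkg" -C /opt/module
done
# append the environment variables (see the /etc/profile excerpt in the next section), e.g.:
cat >> /etc/profile <<'EOF'
export JAVA_HOME=/opt/module/jdk1.8.0_231
export HADOOP_HOME=/opt/module/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF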
4. Start Service
Let the environment variables take effect (e.g., with “source /etc/profile”), then go to /opt and run “./start.sh” on hadoop001 to start all services on the three hosts:
[root@hadoop001 opt]# cat /etc/profile
unset i
unset -f pathmunge
export JAVA_HOME=/opt/module/jdk1.8.0_231
export HADOOP_HOME=/opt/module/hadoop-2.7.3
export HADOOP_COMMON_LIB_NATIVE_DIR=/opt/module/hadoop-2.7.3/lib/native
export HADOOP_OPTS="-Djava.library.path=/opt/module/hadoop-2.7.3/lib"
export ZOOKEEPER_HOME=/opt/module/apache-zookeeper-3.5.9-bin
export HIVE_HOME=/opt/module/apache-hive-3.1.2-bin
export FLUME_HOME=/opt/module/apache-flume-1.9.0-bin
export SQOOP_HOME=/opt/module/sqoop-1.4.7.bin__hadoop-2.6.0
export SCALA_HOME=/opt/module/scala-2.11.12
export SPARK_HOME=/opt/module/spark-2.4.5-bin-hadoop2.7
export HBASE_HOME=/opt/module/hbase-1.7.1
export KAFKA_HOME=/opt/module/kafka_2.11-2.4.1
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/opt/module/jdk1.8.0_231/bin:/opt/module/hadoop-2.7.3/bin:/opt/module/hadoop-2.7.3/sbin:/opt/module/apache-zookeeper-3.5.9-bin/bin:/opt/module/apache-hive-3.1.2-bin/bin:/opt/module/apache-flume-1.9.0-bin/bin:/opt/module/sqoop-1.4.7.bin__hadoop-2.6.0/bin:/opt/module/scala-2.11.12/bin:/opt/module/spark-2.4.5-bin-hadoop2.7/bin:/opt/module/spark-2.4.5-bin-hadoop2.7/sbin:/opt/module/hbase-1.7.1/bin:/opt/module/kafka_2.11-2.4.1/bin
[root@hadoop001 opt]# source /etc/profile
[root@hadoop001 opt]# ./start.sh
...
[root@hadoop001 opt]# jps
13459 ResourceManager
15078 DataNode
14966 NameNode
14472 NodeManager
15196 DFSZKFailoverController
12988 QuorumPeerMain
20685 Jps
14815 JournalNode
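Beyond jps, you can confirm from hadoop001 that HDFS and YARN see all the nodes (standard Hadoop commands, assuming the services came up cleanly):
# check that all DataNodes registered with the NameNode
hdfs dfsadmin -report
# check that all NodeManagers registered with the ResourceManager
yarn node -list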
5. Uninstall
Go to /opt and run “./clear.sh”; repeat the command on the other hosts.
[root@hadoop001 opt]# ./clear.sh
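To clean up all three hosts from hadoop001 in one pass (assuming the layout above, where clear.sh lives in /opt on every host):
# run clear.sh on every host over SSH
for h in hadoop001 hadoop002 hadoop003; do
  ssh root@"$h" 'cd /opt && ./clear.sh'
done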