Azkaban is a distributed workflow manager, typically used to manage dependencies between Hadoop jobs.
1. Install
Download the latest Azkaban release (4.0 at the time of writing) from https://github.com/azkaban/azkaban/releases. Extract the tarball, enter the source folder, and run './gradlew build'.
The build failed with an error on my machine.
To fix it, change the Maven repository URL in the repositories block of build.gradle to 'https://linkedin.jfrog.io/artifactory/open-source':
repositories {
    mavenCentral()
    mavenLocal()
    // need this for rest.li/pegasus 28.* artifacts until they are in Maven Central:
    maven {
        // replaces the old 'https://linkedin.bintray.com/maven' URL
        url 'https://linkedin.jfrog.io/artifactory/open-source'
    }
}
Then build again and run './gradlew installDist'.
Finally, run the following commands to start the solo server:
Downloads $ cd azkaban-4.0.0/azkaban-solo-server/build/install/azkaban-solo-server/
azkaban-solo-server $ ./bin/start-solo.sh
Open the Azkaban web UI at http://localhost:8081 and log in with the initial username/password azkaban/azkaban. You can change these credentials in the config file conf/azkaban-users.xml:
azkaban-solo-server $ cat conf/azkaban-users.xml
<azkaban-users>
  <user groups="azkaban" password="azkaban" roles="admin" username="azkaban"/>
  <user password="metrics" roles="metrics" username="metrics"/>
  <role name="admin" permissions="ADMIN"/>
  <role name="metrics" permissions="METRICS"/>
</azkaban-users>
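If you would rather script the password change than edit the file by hand, here is a minimal Python sketch. It works on an in-memory copy of the XML shown above; the helper name set_password and the new password value are my own placeholders, not anything Azkaban provides.

```python
import xml.etree.ElementTree as ET

# In-memory copy of conf/azkaban-users.xml as shown above.
USERS_XML = """<azkaban-users>
  <user groups="azkaban" password="azkaban" roles="admin" username="azkaban"/>
  <user password="metrics" roles="metrics" username="metrics"/>
  <role name="admin" permissions="ADMIN"/>
  <role name="metrics" permissions="METRICS"/>
</azkaban-users>"""


def set_password(xml_text, username, new_password):
    """Return the XML with the password attribute of one user replaced."""
    root = ET.fromstring(xml_text)
    for user in root.findall("user"):
        if user.get("username") == username:
            user.set("password", new_password)
            return ET.tostring(root, encoding="unicode")
    raise KeyError(f"no such user: {username}")


# "change-me" is a placeholder password for illustration only.
updated = set_password(USERS_XML, "azkaban", "change-me")
print(updated)
```

To apply this to a real installation you would read conf/azkaban-users.xml from disk and write the result back; I have left that out so the sketch cannot overwrite anything by accident.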
2. Create a project
It’s very easy to create a project in the Azkaban web UI. In the [Projects] tab, click [Create Project], enter a name and description, and click the [Create Project] button to finish.
3. Upload the workflow
Create a new folder; you can name it after the project.
In it, create a file named flow1.project that contains the flow version:
azkaban-flow-version: 2.0
Create a file named job.flow with the following content:
nodes:
  - name: jobA
    type: command
    config:
      command: mkdir /opt/az3
  - name: jobB
    type: command
    dependsOn:
      - jobA
    config:
      command: touch /opt/az3/jobb
  - name: jobC
    type: command
    dependsOn:
      - jobA
    config:
      command: touch /opt/az3/jobc
  - name: jobD
    type: command
    dependsOn:
      - jobB
      - jobC
    config:
      command: touch /opt/az3/jobD
This file defines a workflow with four jobs: jobA, jobB, jobC, and jobD.
The flow starts with jobA; jobB and jobC both depend on jobA, and the last job, jobD, depends on both jobB and jobC.
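To make the ordering concrete, here is a small Python sketch that derives the execution stages from the same dependsOn relationships. It only illustrates the order the dependencies imply, using the standard graphlib module (Python 3.9+); it is not how Azkaban schedules jobs internally, and the function name execution_stages is my own.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Dependencies copied from job.flow: job name -> jobs it depends on.
depends_on = {
    "jobA": [],
    "jobB": ["jobA"],
    "jobC": ["jobA"],
    "jobD": ["jobB", "jobC"],
}


def execution_stages(deps):
    """Group jobs into stages; every job in a stage has all dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that can run right now
        stages.append(ready)
        sorter.done(*ready)  # mark them finished to unblock dependents
    return stages


stages = execution_stages(depends_on)
for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(stage)}")
```

jobB and jobC land in the same stage because neither depends on the other, which is why they sit side by side in the workflow graph.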
Compress the two files into a zip archive:
az $ zip flow.zip job.flow flow1.project
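If the zip command is not available (on Windows, for example), the same archive can be built with Python's zipfile module. This is a sketch under my own assumptions: the helper build_flow_zip is hypothetical, and the demo writes shortened copies of the two files into a temporary directory so it can run anywhere.

```python
import tempfile
import zipfile
from pathlib import Path


def build_flow_zip(folder, zip_path):
    """Zip every *.flow and *.project file in `folder` at the archive root."""
    folder = Path(folder)
    # Collect the file list before creating the archive.
    files = sorted(p for p in folder.iterdir() if p.suffix in {".flow", ".project"})
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # arcname drops the folder prefix, matching `zip flow.zip job.flow flow1.project`
            archive.write(path, arcname=path.name)
    return [p.name for p in files]


# Demo in a throwaway directory with shortened copies of the two files.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "flow1.project").write_text("azkaban-flow-version: 2.0\n")
    (work / "job.flow").write_text(
        "nodes:\n"
        "  - name: jobA\n"
        "    type: command\n"
        "    config:\n"
        "      command: mkdir /opt/az3\n"
    )
    packed = build_flow_zip(work, work / "flow.zip")
    with zipfile.ZipFile(work / "flow.zip") as archive:
        names = sorted(archive.namelist())

print(names)
```

For a real project, call build_flow_zip with your own folder instead of the temporary one.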
Then, on the project page, click the [Upload] button at the top right, select the zip file, and upload it.
Click [Execute Flow] on the panel to see the workflow graph, then click [Execute].
Green nodes succeeded; red nodes failed.
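The login and execute steps can probably be scripted as well. Azkaban documents an AJAX API, and the sketch below only builds the two requests as I understand that API (a POST with action=login, then a GET to /executor with ajax=executeFlow); it never sends them. The parameter names come from my reading of the documentation, so check them against your version. The project name my-project, the session id placeholder, and the flow name job (taken from job.flow) are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8081"  # solo server address from the install step


def login_request(username, password):
    """Build (but do not send) the login POST; the reply should carry a session id."""
    body = urlencode({"action": "login", "username": username, "password": password})
    return Request(BASE_URL, data=body.encode("utf-8"), method="POST")


def execute_flow_request(session_id, project, flow):
    """Build (but do not send) the GET that asks the server to run a flow."""
    query = urlencode({
        "ajax": "executeFlow",
        "session.id": session_id,
        "project": project,
        "flow": flow,
    })
    return Request(f"{BASE_URL}/executor?{query}", method="GET")


login = login_request("azkaban", "azkaban")
# Placeholder session id and hypothetical project name; flow name assumed from job.flow.
run = execute_flow_request("SESSION_ID_FROM_LOGIN", "my-project", "job")

print(login.get_method(), login.full_url, login.data)
print(run.get_method(), run.full_url)
```

To send a request for real, pass it to urllib.request.urlopen and read the JSON reply; the session id returned by the login call goes into execute_flow_request.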