Hive Windowing and Analytics Function - Part 1

Posted by Bourne's Blog - A Full-stack & Web3 Developer on April 21, 2022

Hive Windowing and Analytics Function - Part 1

Hive provides powerful window functions on current frame.

In this chapter, I will demostrate how these functions work:

  • first_value()
  • last_value()
  • sum()
  • count()

For example, we have orders table,

1
2
3
4
5
6
7
> describe orders;
col_name	        data_type
order_id            	bigint
user_id             	int
order_amount        	decimal(16,4)
create_time         	string
pt                  	string

and I want to get these order info for each user with sql:

  • First and last order info(create_time/amount)
  • Minimum and Maximum amount of order and create_time
  • Total order amount and count
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
select distinct(user_id),
--     first order
    first_value(create_time)  over(partition by user_id order by create_time) as first_order_time,
    first_value(order_amount) over(partition by user_id order by create_time) as first_order_amount,
--     last order
    last_value(create_time)  over(partition by user_id order by create_time) as last_order_time,
    last_value(order_amount) over(partition by user_id order by create_time) as last_order_amount,
--     min amount
    first_value(create_time)  over(partition by user_id order by order_amount ) as min_order_time,
    first_value(order_amount) over(partition by user_id order by order_amount ) as min_order_amount,
--     max amount
    first_value(create_time)  over(partition by user_id order by order_amount desc) as max_order_time,
    first_value(order_amount) over(partition by user_id order by order_amount desc) as max_order_amount,
--     total amount/count
    sum(order_amount) over(partition by user_id) as total_order_amount,
    count() over(partition by user_id) as total_order_count
    from orders;