Hive Windowing and Analytics Function - Part 1
Hive provides powerful window functions on current frame.
In this chapter, I will demostrate how these functions work:
- first_value()
 - last_value()
 - sum()
 - count()
 
For example, we have orders table,
1
2
3
4
5
6
7
> describe orders;
col_name	        data_type
order_id            	bigint
user_id             	int
order_amount        	decimal(16,4)
create_time         	string
pt                  	string
and I want to get these order info for each user with sql:
- First and last order info(create_time/amount)
 - Minimum and Maximum amount of order and create_time
 - Total order amount and count
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
select distinct(user_id),
--     first order
    first_value(create_time)  over(partition by user_id order by create_time) as first_order_time,
    first_value(order_amount) over(partition by user_id order by create_time) as first_order_amount,
--     last order
    last_value(create_time)  over(partition by user_id order by create_time) as last_order_time,
    last_value(order_amount) over(partition by user_id order by create_time) as last_order_amount,
--     min amount
    first_value(create_time)  over(partition by user_id order by order_amount ) as min_order_time,
    first_value(order_amount) over(partition by user_id order by order_amount ) as min_order_amount,
--     max amount
    first_value(create_time)  over(partition by user_id order by order_amount desc) as max_order_time,
    first_value(order_amount) over(partition by user_id order by order_amount desc) as max_order_amount,
--     total amount/count
    sum(order_amount) over(partition by user_id) as total_order_amount,
    count() over(partition by user_id) as total_order_count
    from orders;