Friday, April 21, 2023

DuckDB ASOF JOIN

Here's a quick example of duckdb's new ASOF JOIN.

Problem: we have a time-based price table; if you have a sale time that falls in the gaps, you get NULL price and total.

Solution: "ASOF JOIN" picks a good value for "in the gap" sales.


Create a price table and sales table.


create table prices as (

    SELECT '2001-01-01 00:16:00'::TIMESTAMP + INTERVAL (v) MINUTE AS ticker_time,

        v as unit_price

    FROM range(0,5) vals(v)

);


create table sales(item text, sale_time timestamp, quantity int);

insert into sales values('a', '2001-01-01 00:18:00', 10);

insert into sales values('b', '2001-01-01 00:18:30', 20);

insert into sales values('c', '2001-01-01 00:19:00', 30);


select * from prices;

┌─────────────────────┬────────────┐

│     ticker_time     │ unit_price │

│      timestamp      │   int64    │

├─────────────────────┼────────────┤

│ 2001-01-01 00:16:00 │          0 │

│ 2001-01-01 00:17:00 │          1 │

│ 2001-01-01 00:18:00 │          2 │ No unit_price for 18:30!

│ 2001-01-01 00:19:00 │          3 │

│ 2001-01-01 00:20:00 │          4 │

└─────────────────────┴────────────┘

select * from sales;

┌─────────┬─────────────────────┬──────────┐

│  item   │      sale_time      │ quantity │

│ varchar │      timestamp      │  int32   │

├─────────┼─────────────────────┼──────────┤

│ a       │ 2001-01-01 00:18:00 │       10 │

│ b       │ 2001-01-01 00:18:30 │       20 │

│ c       │ 2001-01-01 00:19:00 │       30 │

└─────────┴─────────────────────┴──────────┘

Normal Left Join, problem for the 18:30 sale!


-- no price value for 18:30, so item b's unit_price and total are NULL!


select s.*, p.unit_price, s.quantity * p.unit_price as total

 from sales s left join prices p

   on s.sale_time = p.ticker_time;


┌─────────┬─────────────────────┬──────────┬────────────┬───────┐

│  item   │      sale_time      │ quantity │ unit_price │ total │

│ varchar │      timestamp      │  int32   │   int64    │ int64 │

├─────────┼─────────────────────┼──────────┼────────────┼───────┤

│ a       │ 2001-01-01 00:18:00 │       10 │          2 │    20 │

│ c       │ 2001-01-01 00:19:00 │       30 │          3 │    90 │

│ b       │ 2001-01-01 00:18:30 │       20 │       NULL │  NULL │

└─────────┴─────────────────────┴──────────┴────────────┴───────┘

ASOF Join, picks a good price for the 18:30 sale!


-- using asof, 18:30 "rounds down" to use the 18:00 unit_price


select s.*, p.unit_price, s.quantity * p.unit_price as total_cost

  from sales s asof left join prices p

    on s.sale_time >= p.ticker_time;


┌─────────┬─────────────────────┬──────────┬────────────┬────────────┐

│  item   │      sale_time      │ quantity │ unit_price │ total_cost │

│ varchar │      timestamp      │  int32   │   int64    │   int64    │

├─────────┼─────────────────────┼──────────┼────────────┼────────────┤

│ a       │ 2001-01-01 00:18:00 │       10 │          2 │         20 │

│ b       │ 2001-01-01 00:18:30 │       20 │          2 │         40 │

│ c       │ 2001-01-01 00:19:00 │       30 │          3 │         90 │

└─────────┴─────────────────────┴──────────┴────────────┴────────────┘


No comments:

Post a Comment

Setting up DuckDB/Rust

Here's what I did to get DuckDB/rust going on my system. Prerequisites We need two things from DuckDB: header files (so we can build duc...