Wednesday, May 24, 2023

Query Diffing

Given tables or query results A and B, we want to identify:

  • inserted rows:  The key appears in B, but not in A.
  • deleted rows:  The key appears in A, but not in B.
  • modified: The key appears in both tables, but with different values.
  • unmodified:  the key appears in both tables, and the values are the same.

For example, given the data below, identify:

  • a is modified, 1 --> 9
  • b is deleted
  • c is unmodified
  • d is inserted

CREATE TABLE A AS

          SELECT 'a' AS key, 1 AS val

    UNION SELECT 'b', 2

    UNION SELECT 'c', 3;


CREATE TABLE b AS 

          SELECT 'a' AS key, 9 AS val

    UNION SELECT 'c', 3

    UNION SELECT 'd', 4;


SELECT * FROM a ORDER BY KEY;

a,1

b,2

c,3


SELECT * FROM b ORDER BY KEY;

a,9

c,3

d,4

These queries will perform the specified operations.

Modified Rows

select b.key, b.val from a left join b

    on a.key = b.key where a.val <> b.val;

a,9

Deleted Rows

select key, val from a where key not in (select key from b);

b,2

Unmodified Rows

select key, val from a intersect select key, val from b;

c,3

 Inserted Rows

select key, val from b where key not in (select key from a);

d,4

Note that if you have multiple columns in your key, you can specify them as (`key1, key2, ...), etc.

For example, if the tables had two key columns the Inserted query would be

select key1, key2, val from b

    where (key1, key2) not in (select key1, key2 from a);

Friday, May 19, 2023

DuckDB Date Sampler



WITH generate_date AS (

  SELECT CAST(RANGE AS DATE) AS date_key  

    FROM RANGE(DATE '2009-01-01', DATE '2013-12-31', INTERVAL 1 DAY)

)

SELECT date_key AS date_key,

  DAYOFYEAR(date_key) AS day_of_year, 

  YEARWEEK(date_key) AS week_key,

  WEEKOFYEAR(date_key) AS week_of_year,

  DAYOFWEEK(date_key) AS day_of_week,

  ISODOW(date_key) AS iso_day_of_week,

  DAYNAME(date_key) AS day_name,

  DATE_TRUNC('week', date_key) AS first_day_of_week,

  DATE_TRUNC('week', date_key) + 6 AS last_day_of_week,

  YEAR(date_key) || RIGHT('0' || MONTH(date_key), 2) AS month_key,

  MONTH(date_key) AS month_of_year,

  DAYOFMONTH(date_key) AS day_of_month,

  LEFT(MONTHNAME(date_key), 3) AS month_name_short,

  MONTHNAME(date_key) AS month_name,

  DATE_TRUNC('month', date_key) AS first_day_of_month,

  LAST_DAY(date_key) AS last_day_of_month,

  CAST(YEAR(date_key) || QUARTER(date_key) AS INT) AS quarter_key,

  QUARTER(date_key) AS quarter_of_year,

  CAST(date_key - DATE_TRUNC('Quarter', date_key) + 1 AS INT) AS day_of_quarter,

  ('Q' || QUARTER(date_key)) AS quarter_desc_short,

  ('Quarter ' || QUARTER(date_key)) AS quarter_desc,

  DATE_TRUNC('quarter', date_key) AS first_day_of_quarter,

  LAST_DAY(DATE_TRUNC('quarter', date_key) + INTERVAL 2 MONTH) as last_day_of_quarter,

  CAST(YEAR(date_key) AS INT) AS year_key,

  DATE_TRUNC('Year', date_key) AS first_day_of_year,

  DATE_TRUNC('Year', date_key) - 1 + INTERVAL 1 YEAR AS last_day_of_year,

  ROW_NUMBER() OVER 

   (PARTITION BY YEAR(date_key), MONTH(date_key), DAYOFWEEK(date_key) ORDER BY date_key) 

   AS ordinal_weekday_of_month

FROM generate_date;


(credit)

Setting up DuckDB/Rust

Here's what I did to get DuckDB/rust going on my system. Prerequisites We need two things from DuckDB: header files (so we can build duc...