Wednesday, May 24, 2023

Query Diffing

Given tables or query results A and B, we want to identify:

  • inserted rows:  The key appears in B, but not in A.
  • deleted rows:  The key appears in A, but not in B.
  • modified: The key appears in both tables, but with different values.
  • unmodified:  the key appears in both tables, and the values are the same.

For example, given the data below, identify:

  • a is modified, 1 --> 9
  • b is deleted
  • c is unmodified
  • d is inserted

CREATE TABLE A AS

          SELECT 'a' AS key, 1 AS val

    UNION SELECT 'b', 2

    UNION SELECT 'c', 3;


CREATE TABLE b AS 

          SELECT 'a' AS key, 9 AS val

    UNION SELECT 'c', 3

    UNION SELECT 'd', 4;


SELECT * FROM a ORDER BY KEY;

a,1

b,2

c,3


SELECT * FROM b ORDER BY KEY;

a,9

c,3

d,4

These queries will perform the specified operations.

Modified Rows

select b.key, b.val from a left join b

    on a.key = b.key where a.val <> b.val;

a,9

Deleted Rows

select key, val from a where key not in (select key from b);

b,2

Unmodified Rows

select key, val from a intersect select key, val from b;

c,3

 Inserted Rows

select key, val from b where key not in (select key from a);

d,4

Note that if you have multiple columns in your key, you can specify them as (`key1, key2, ...), etc.

For example, if the tables had two key columns the Inserted query would be

select key1, key2, val from b

    where (key1, key2) not in (select key1, key2 from a);

No comments:

Post a Comment

Duck-GPT -- Adding DuckDB Documents to an LLM

tl;dr: An LLM that is up to date on DuckDB! tl;dr2: My Hello World project for integrating new data into an existing language model. Chat GP...