Sunday, August 20, 2023

Setting up DuckDB/Rust

Here's what I did to get DuckDB/rust going on my system.

Prerequisites

We need two things from DuckDB: header files (so we can build duckdb-rs) and the library (so duckdb-rs can call the DuckDB functions).  We will get these by building DuckDB.  At the time of writing , version 0.8.1 is the DuckDB version supported by duckdb-rs.

We will clone the DuckDB repo, check out version 0.8.1, and build it.

mkdir $HOME/duck81

cd $HOME/duck81

git clone https://github.com/duckdb/duckdb.git

cd duckdb

git checkout tags/v0.8.1

make

Next, we need a couple of environment variables.  These are used when crate builds the duckdb-rs crate.

export DUCKDB_LIB_DIR=$HOME/duck81/duckdb/build/release/src

export DUCKDB_INCLUDE_DIR=$HOME/duck81/duckdb/src/include


Finally, we need to tell the OS where the dynamic libraries are located.

# macos

export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:$DUCKDB_LIB_DIR

# linux

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$DUCKDB_LIB_DIR

This setup is fine for development but not for production use.  If you are shipping rust in production, you know how to package shared libraries, etc.  There are some packaging notes on the duckdb-rs github page.

Cargo.toml

Add this line to your dependencies section.

[dependencies]


duckdb = "0.8.1"

Sample Code

Sample Code

Here's a lightly modified version of the example code on the duckdb-rs github page.

use duckdb::{params, Connection, Result};


use duckdb::arrow::record_batch::RecordBatch;

use duckdb::arrow::util::pretty::print_batches;


#[derive(Debug)]

struct Person {

    id: i32,

    name: String,

    data: Option<Vec<u8>>,

}


fn main() -> Result<()> {

    let conn = Connection::open_in_memory()?;


    conn.execute_batch(r"

        CREATE SEQUENCE seq;

        CREATE TABLE person (

          id    INTEGER PRIMARY KEY DEFAULT NEXTVAL('seq'),

          name  TEXT NOT NULL,

          data  BLOB

                );

        ")?;


    conn.execute("INSERT INTO person (name, data) VALUES (?, ?)",

                 params!["able", "aaa"],)?;

    conn.execute("INSERT INTO person (name, data) VALUES (?, ?)",

                 params!["baker", "bbb"],)?;

    conn.execute("INSERT INTO person (name, data) VALUES (?, ?)",

                 params!["charlie", "ccc"],)?;

    conn.execute("INSERT INTO person (name, data) VALUES (?, ?)",

                 params!["dog", "ddd"],)?;


    println!("query table by rows");

    let mut stmt = conn.prepare("SELECT id, name, data FROM person")?;

    let person_iter = stmt.query_map([], |row| {

        Ok(Person {

            id: row.get(0)?,

            name: row.get(1)?,

            data: row.get(2)?,

        })

    })?;


    for person in person_iter {

        println!("Found person {:?}", person.unwrap());

    }


    println!("query table by arrow");

    let rbs: Vec<RecordBatch> = stmt.query_arrow([])?.collect();

    print_batches(&rbs).unwrap();

    Ok(())

}

Execution

We can ignore the (incorrect) warning about the People fields never being read.

% cargo run


   Compiling duck1 v0.1.0 (/Users/neologic/biblio/anachron/duck1)

warning: fields `id`, `name`, and `data` are never read

 --> src/main.rs:7:5

  |

6 | struct Person {

  |        ------ fields in this struct

7 |     id: i32,

  |     ^^

8 |     name: String,

  |     ^^^^

9 |     data: Option<Vec<u8>>,

  |     ^^^^

  |

  = note: `Person` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis

  = note: `#[warn(dead_code)]` on by default


warning: `duck1` (bin "duck1") generated 1 warning

    Finished dev [unoptimized + debuginfo] target(s) in 1.24s

     Running `target/debug/duck1`


query table by rows

Found person Person { id: 1, name: "able", data: Some([97, 97, 97]) }

Found person Person { id: 2, name: "baker", data: Some([98, 98, 98]) }

Found person Person { id: 3, name: "charlie", data: Some([99, 99, 99]) }

Found person Person { id: 4, name: "dog", data: Some([100, 100, 100]) }


query table by arrow

+----+---------+--------+

| id | name    | data   |

+----+---------+--------+

| 1  | able    | 616161 |

| 2  | baker   | 626262 |

| 3  | charlie | 636363 |

| 4  | dog     | 646464 |

+----+---------+--------+


Duck-GPT -- Adding DuckDB Documents to an LLM

tl;dr: An LLM that is up to date on DuckDB! tl;dr2: My Hello World project for integrating new data into an existing language model. Chat GP...