Here's what I did to get DuckDB/Rust going on my system.
Prerequisites
We need two things from DuckDB: the header files (so we can build duckdb-rs) and the library (so duckdb-rs can call the DuckDB functions). We will get both by building DuckDB from source. At the time of writing, duckdb-rs supports DuckDB version 0.8.1.
We will clone the DuckDB repo, check out version 0.8.1, and build it.
mkdir $HOME/duck81
cd $HOME/duck81
git clone https://github.com/duckdb/duckdb.git
cd duckdb
git checkout tags/v0.8.1
make
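Before moving on, it's worth confirming that the build produced what we need: the shared library and the C header. The paths below assume the directory layout used above.

```shell
# confirm the shared library was built (.dylib on macOS, .so on Linux)
ls $HOME/duck81/duckdb/build/release/src/libduckdb*
# confirm the C header is present
ls $HOME/duck81/duckdb/src/include/duckdb.h
```

If either listing comes up empty, the build did not complete; re-run make and check its output.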
Next, we need a couple of environment variables. Cargo uses these when building the duckdb-rs crate.
export DUCKDB_LIB_DIR=$HOME/duck81/duckdb/build/release/src
export DUCKDB_INCLUDE_DIR=$HOME/duck81/duckdb/src/include
Finally, we need to tell the OS where the dynamic libraries are located.
# macos
export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:$DUCKDB_LIB_DIR
# linux
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$DUCKDB_LIB_DIR
This setup is fine for development but not for production use. If you ship Rust binaries in production, you already know how to package shared libraries; there are also some packaging notes on the duckdb-rs github page.
Cargo.toml
Add this to the [dependencies] section of your Cargo.toml.
[dependencies]
duckdb = "0.8.1"
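If you would rather not manage a local DuckDB build at all, the duckdb crate also offers a bundled feature that compiles DuckDB from source as part of cargo build, which removes the need for the environment variables above (at the cost of a much longer first build):

```toml
[dependencies]
duckdb = { version = "0.8.1", features = ["bundled"] }
```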
Sample Code
Here's a lightly modified version of the example code on the duckdb-rs github page.
use duckdb::{params, Connection, Result};
use duckdb::arrow::record_batch::RecordBatch;
use duckdb::arrow::util::pretty::print_batches;

#[derive(Debug)]
struct Person {
    id: i32,
    name: String,
    data: Option<Vec<u8>>,
}

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;

    conn.execute_batch(r"
        CREATE SEQUENCE seq;
        CREATE TABLE person (
            id INTEGER PRIMARY KEY DEFAULT NEXTVAL('seq'),
            name TEXT NOT NULL,
            data BLOB
        );
    ")?;

    conn.execute("INSERT INTO person (name, data) VALUES (?, ?)",
        params!["able", "aaa"])?;
    conn.execute("INSERT INTO person (name, data) VALUES (?, ?)",
        params!["baker", "bbb"])?;
    conn.execute("INSERT INTO person (name, data) VALUES (?, ?)",
        params!["charlie", "ccc"])?;
    conn.execute("INSERT INTO person (name, data) VALUES (?, ?)",
        params!["dog", "ddd"])?;

    println!("query table by rows");
    let mut stmt = conn.prepare("SELECT id, name, data FROM person")?;
    let person_iter = stmt.query_map([], |row| {
        Ok(Person {
            id: row.get(0)?,
            name: row.get(1)?,
            data: row.get(2)?,
        })
    })?;
    for person in person_iter {
        println!("Found person {:?}", person.unwrap());
    }

    println!("query table by arrow");
    let rbs: Vec<RecordBatch> = stmt.query_arrow([])?.collect();
    print_batches(&rbs).unwrap();
    Ok(())
}
Execution
We can ignore the (incorrect) warning that the Person fields are never read.
% cargo run
Compiling duck1 v0.1.0 (/Users/neologic/biblio/anachron/duck1)
warning: fields `id`, `name`, and `data` are never read
--> src/main.rs:7:5
|
6 | struct Person {
| ------ fields in this struct
7 | id: i32,
| ^^
8 | name: String,
| ^^^^
9 | data: Option<Vec<u8>>,
| ^^^^
|
= note: `Person` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
= note: `#[warn(dead_code)]` on by default
warning: `duck1` (bin "duck1") generated 1 warning
Finished dev [unoptimized + debuginfo] target(s) in 1.24s
Running `target/debug/duck1`
query table by rows
Found person Person { id: 1, name: "able", data: Some([97, 97, 97]) }
Found person Person { id: 2, name: "baker", data: Some([98, 98, 98]) }
Found person Person { id: 3, name: "charlie", data: Some([99, 99, 99]) }
Found person Person { id: 4, name: "dog", data: Some([100, 100, 100]) }
query table by arrow
+----+---------+--------+
| id | name | data |
+----+---------+--------+
| 1 | able | 616161 |
| 2 | baker | 626262 |
| 3 | charlie | 636363 |
| 4 | dog | 646464 |
+----+---------+--------+