feat: add two utility functions export_parquet and create_view and a vignette on how to materialize data. by nbc · Pull Request #1177 · duckdb/duckdb-r · GitHub

feat: add two utility functions export_parquet and create_view and a vignette on how to materialize data. #1177


Open: wants to merge 1 commit into base: main

nbc commented May 25, 2025

This PR adds two utility functions:

  • export_parquet uses COPY TO to export a parquet file from a tbl_lazy
  • create_view creates a view based on a tbl_lazy

The vignette explains how to materialize data using these two functions, as well as with the lesser-known dbplyr::compute() function.
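The three approaches named above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `export_parquet_sketch()` and `create_view_sketch()`, the table and view names, and the example data are all hypothetical, and the sketch assumes the lazy table's query can be rendered with `dbplyr::sql_render()` and run on the same connection.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Hypothetical helper (not the PR's implementation): render the lazy query
# and wrap it in DuckDB's COPY ... TO statement to write a Parquet file.
export_parquet_sketch <- function(lazy_tbl, path) {
  con <- dbplyr::remote_con(lazy_tbl)
  query <- as.character(dbplyr::sql_render(lazy_tbl))
  sql <- paste0(
    "COPY (", query, ") TO ",
    as.character(DBI::dbQuoteString(con, path)),
    " (FORMAT PARQUET)"
  )
  DBI::dbExecute(con, sql)
  invisible(path)
}

# Hypothetical helper: create a view from the rendered query and return
# a new lazy table that points at the view.
create_view_sketch <- function(lazy_tbl, name) {
  con <- dbplyr::remote_con(lazy_tbl)
  query <- as.character(dbplyr::sql_render(lazy_tbl))
  sql <- paste0(
    "CREATE OR REPLACE VIEW ",
    as.character(DBI::dbQuoteIdentifier(con, name)),
    " AS ", query
  )
  DBI::dbExecute(con, sql)
  dplyr::tbl(con, name)
}

# Example data: an in-memory DuckDB table with three rows.
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbWriteTable(con, "my_tbl", data.frame(a = c(1, 2, 3)))
lazy <- dplyr::tbl(con, "my_tbl") |> dplyr::filter(a > 1)

# 1. Export to Parquet without collecting into R first.
out_path <- tempfile(fileext = ".parquet")
export_parquet_sketch(lazy, out_path)
n_rows <- DBI::dbGetQuery(
  con,
  paste0(
    "SELECT count(*) AS n FROM read_parquet(",
    as.character(DBI::dbQuoteString(con, out_path)), ")"
  )
)$n

# 2. Create a view backed by the lazy query.
view_tbl <- create_view_sketch(lazy, "my_view")
view_rows <- nrow(dplyr::collect(view_tbl))

# 3. Materialize into a temporary table with compute().
computed <- dplyr::compute(lazy, name = "my_tbl_filtered", temporary = TRUE)
computed_rows <- nrow(dplyr::collect(computed))

DBI::dbDisconnect(con, shutdown = TRUE)
```

The view stores only the query, so it reflects later changes to `my_tbl`, while `compute()` and the Parquet export both capture the rows as they are when the statement runs.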

Fixes #207, #630

nbc force-pushed the feature/export_and_view branch from b89ba2e to a0fbed6 on May 26, 2025 09:00
krlmlr (Collaborator) commented Jun 17, 2025

Thanks for the effort. The code looks good, but I'm still not ready to assume maintenance for it.

I labeled the issues as "help wanted" before duckplyr 1.1.0. Writing to Parquet works there:

library(tidyverse)
library(duckdb)
#> Loading required package: DBI

con <- dbConnect(duckdb::duckdb())

dbWriteTable(con, "my_tbl", data.frame(a = 1))

dbplyr_tbl <- tbl(con, "my_tbl")

dbplyr_tbl %>%
  duckplyr::as_duckdb_tibble() |>
  duckplyr::compute_parquet("my_tbl.parquet")
#> # A duckplyr data frame: 1 variable
#>       a
#>   <dbl>
#> 1     1

duckplyr_parquet <-
  duckplyr::read_parquet_duckdb("my_tbl.parquet")
duckplyr_parquet
#> # A duckplyr data frame: 1 variable
#>       a
#>   <dbl>
#> 1     1

Created on 2025-06-17 with reprex v2.1.1

As for creating a view, dbplyr or a related package is a better fit, since this functionality seems general enough to work across databases.

I'd like to keep the package vignette-free for shorter build times. Otherwise, each R CMD build . will have to install the package.

Happy to see this code thrive elsewhere!

Successfully merging this pull request may close these issues.

What is the canonical way to write parquet to disk using duckdb and dbplyr without collecting first?