Use a shared libduckdb rather than embed it #7

Open
Vonng opened this issue Nov 4, 2024 · 1 comment
Labels
enhancement New feature or request
Milestone

Comments

Vonng commented Nov 4, 2024

There are significant challenges with the current approach of embedding libduckdb directly.

  1. Compilation Time and Package Size: Embedding libduckdb requires substantial compilation time and dramatically increases the package size. The problem is compounded by the build matrix of 3 PostgreSQL major versions across 5 OS distributions.

  2. Conflict with pg_duckdb: Embedding libduckdb conflicts with pg_duckdb, forcing users to choose one or the other. This restriction makes adoption needlessly difficult.

To address these concerns, I propose adopting a shared libduckdb, similar to the approach used in duckdb_fdw, available at this commit.

The DuckDB official release provides a binary libduckdb.so, and I have already created RPM/DEB packages for EL8/EL9, Ubuntu 22.04/24.04, and Debian 12, which are readily available at [ext.pigsty.io/#/](https://ext.pigsty.io).

Adopting a shared libduckdb would mitigate the issues related to compilation times, package sizes, and software conflicts, ultimately simplifying maintenance and user choice.

Contributor

dpxcc commented Nov 4, 2024

Thanks for looking into the build process and for distributing pg_mooncake with Pigsty! We appreciate the suggestion and have indeed considered using a shared libduckdb.so, but a few challenges come up with this approach:

  1. Dependency on DuckDB's Internal API: Unlike duckdb_fdw, pg_mooncake depends on DuckDB's internal C++ API, which isn't guaranteed to be stable and can change between DuckDB releases. This means pg_mooncake may not be compatible with arbitrary versions of libduckdb.so, potentially causing compatibility issues if users install different versions of DuckDB.

  2. Planned Use of Non-Builtin DuckDB Extensions: We're looking to leverage non-builtin DuckDB extensions like delta and iceberg to support reading external tables from third-party catalogs. This will require a way to ensure these extensions are consistently available in every installation, which may be harder to manage with a shared libduckdb.so.
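
As a rough illustration of the first point: with a shared libduckdb.so, the extension would probably have to refuse to load against any DuckDB version other than the one it was built for. The sketch below shows one way such a guard might look. The names `kExpectedDuckDBVersion`, `IsCompatibleDuckDB`, and `DescribeMismatch` are hypothetical and not part of pg_mooncake, and the version string `"v1.1.3"` is only a placeholder.

```cpp
#include <string>

// Placeholder: the DuckDB release this extension was built and tested against.
// A real build would inject this from the build system; "v1.1.3" is only an example.
static const std::string kExpectedDuckDBVersion = "v1.1.3";

// Returns true only when the loaded library reports exactly the expected
// version. An exact match is used because the internal C++ API carries no
// stability guarantee between releases.
bool IsCompatibleDuckDB(const std::string &loaded_version,
                        const std::string &expected_version = kExpectedDuckDBVersion) {
    return !loaded_version.empty() && loaded_version == expected_version;
}

// Builds an error message for the incompatible case.
std::string DescribeMismatch(const std::string &loaded_version,
                             const std::string &expected_version = kExpectedDuckDBVersion) {
    const std::string shown = loaded_version.empty() ? std::string("<unknown>") : loaded_version;
    return "libduckdb version mismatch: built against " + expected_version +
           ", loaded " + shown;
}
```

In an actual extension, `loaded_version` could be obtained at load time from `duckdb_library_version()` in the DuckDB C API or `duckdb::DuckDB::LibraryVersion()` in the C++ API. A guard like this only detects a mismatch; it does not remove the need to ship a matching libduckdb.so for each pg_mooncake release.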

We appreciate the work you've done on the RPM/DEB packages and are interested in collaborating on improving the build and distribution process while ensuring compatibility and stability.

dpxcc added the enhancement (New feature or request) label Nov 13, 2024
dpxcc added this to the long term milestone Nov 13, 2024