[python] Support JDBC catalog #7720

Open
HansiChan wants to merge 3 commits into apache:master from HansiChan:codex-pypaimon-jdbc-catalog

@HansiChan
Contributor

Purpose

Support a JDBC catalog in PyPaimon. This adds a Python JDBC catalog implementation that uses the same catalog metadata tables as the Java Paimon JDBC catalog: paimon_tables, paimon_database_properties, and paimon_table_properties.
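
For readers unfamiliar with the metadata layout, here is a minimal sketch of what the three tables could look like on SQLite. The table names come from the description above; the column names, types, and primary keys are assumptions based on the Java catalog, and the helper functions (init_catalog_tables, register_table, list_tables) are hypothetical names for illustration, not the code in this PR.

```python
import sqlite3

# Assumed DDL: table names are from the PR description, but the columns
# and keys below are illustrative guesses, not copied from the PR.
DDL_STATEMENTS = [
    """CREATE TABLE IF NOT EXISTS paimon_tables (
           catalog_key   TEXT NOT NULL,
           database_name TEXT NOT NULL,
           table_name    TEXT NOT NULL,
           PRIMARY KEY (catalog_key, database_name, table_name))""",
    """CREATE TABLE IF NOT EXISTS paimon_database_properties (
           catalog_key    TEXT NOT NULL,
           database_name  TEXT NOT NULL,
           property_key   TEXT NOT NULL,
           property_value TEXT,
           PRIMARY KEY (catalog_key, database_name, property_key))""",
    """CREATE TABLE IF NOT EXISTS paimon_table_properties (
           catalog_key    TEXT NOT NULL,
           database_name  TEXT NOT NULL,
           table_name     TEXT NOT NULL,
           property_key   TEXT NOT NULL,
           property_value TEXT,
           PRIMARY KEY (catalog_key, database_name, table_name, property_key))""",
]


def init_catalog_tables(conn):
    """Create the three metadata tables if they do not exist yet."""
    for statement in DDL_STATEMENTS:
        conn.execute(statement)
    conn.commit()


def register_table(conn, catalog_key, database, table):
    """Record a table in paimon_tables (hypothetical helper)."""
    conn.execute(
        "INSERT INTO paimon_tables (catalog_key, database_name, table_name) "
        "VALUES (?, ?, ?)",
        (catalog_key, database, table),
    )
    conn.commit()


def list_tables(conn, catalog_key, database):
    """Return the table names registered under one catalog key and database."""
    rows = conn.execute(
        "SELECT table_name FROM paimon_tables "
        "WHERE catalog_key = ? AND database_name = ? ORDER BY table_name",
        (catalog_key, database),
    ).fetchall()
    return [row[0] for row in rows]
```

Scoping every row by catalog_key is what would let several logical catalogs share one metadata database, which is presumably why the catalog-key option exists.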

The implementation supports SQLite through the Python standard library and loads MySQL or PostgreSQL support at runtime when a corresponding Python DB-API driver is installed. Table data and schema files continue to use the existing PyPaimon FileIO and SchemaManager behavior.
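
To illustrate the optional-driver idea, here is a rough sketch of resolving a DB-API module at runtime. The mapping and the load_dbapi_driver name are hypothetical, and the candidate packages (pymysql, mysql.connector, psycopg2, psycopg) are only examples of common DB-API drivers; the PR may probe a different set.

```python
import importlib

# Hypothetical mapping from database dialect to candidate DB-API modules,
# tried in order. sqlite3 ships with the standard library; the others are
# optional third-party packages.
DRIVER_CANDIDATES = {
    "sqlite": ("sqlite3",),
    "mysql": ("pymysql", "mysql.connector"),
    "postgresql": ("psycopg2", "psycopg"),
}


def load_dbapi_driver(dialect):
    """Return the first importable DB-API module for the given dialect."""
    candidates = DRIVER_CANDIDATES.get(dialect)
    if candidates is None:
        raise ValueError(f"Unsupported JDBC catalog dialect: {dialect!r}")
    for module_name in candidates:
        try:
            return importlib.import_module(module_name)
        except ImportError:
            continue  # Driver not installed; try the next candidate.
    raise ImportError(
        f"No Python DB-API driver found for {dialect!r}; "
        f"install one of: {', '.join(candidates)}"
    )
```

Importing lazily like this keeps the base install free of database dependencies, so only users who point the catalog at MySQL or PostgreSQL need an extra package.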

What changed

  • Register metastore=jdbc in CatalogFactory
  • Add JdbcCatalog and JdbcCatalogLoader
  • Add catalog-key and sync-all-properties catalog options
  • Cover database and table create/list/get/alter/rename/drop behavior with SQLite-backed tests
  • Document JDBC catalog creation in the PyPaimon Python API docs

Tests

  • python3 -m py_compile pypaimon/catalog/jdbc_catalog.py pypaimon/catalog/jdbc_catalog_loader.py pypaimon/catalog/catalog_factory.py pypaimon/common/options/config.py pypaimon/tests/jdbc_catalog_test.py
  • PYTHONPATH=/tmp/paimon-python-test-deps POLARS_SKIP_CPU_CHECK=1 python3 -m unittest pypaimon.tests.jdbc_catalog_test pypaimon.tests.filesystem_catalog_test

@tub
Contributor

tub commented Apr 29, 2026

Nice! I have a similar change locally that uses SQLAlchemy - but this looks great as it adds fewer dependencies.
Is it worth calling it something other than JDBC? It may be confusing to folks who think it uses the JVM underneath for the database connections.

@HansiChan
Contributor Author

> Nice! I have a similar change locally that uses SQLAlchemy - but this looks great as it adds fewer dependencies.
> Is it worth calling it something other than JDBC? It may be confusing to folks who think it uses the JVM underneath for the database connections.

Good point. I kept the public catalog type as jdbc because it matches Paimon's existing JDBC catalog configuration and lets users reuse the same metastore=jdbc / jdbc: URI options across engines.

To avoid implying that PyPaimon uses JVM JDBC drivers, I updated the implementation and docs to clarify that PyPaimon uses native Python DB-API drivers under the hood. I also renamed the internal connection helper to _DbApiConnection and adjusted the driver error messages accordingly.
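
As a rough illustration of how a Java-style jdbc: URI can be handled without any JVM involvement, the sketch below splits the URI into the pieces a Python DB-API driver needs. The parse_jdbc_uri function and the returned dictionary keys are hypothetical and only show the idea; the parsing in this PR may differ.

```python
from urllib.parse import urlsplit


def parse_jdbc_uri(uri):
    """Split a jdbc: URI into a dialect plus connection details.

    Hypothetical helper: handles the file-style SQLite form and the
    host-style form used by MySQL and PostgreSQL URIs.
    """
    prefix = "jdbc:"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a JDBC URI: {uri!r}")
    rest = uri[len(prefix):]
    dialect, separator, remainder = rest.partition(":")
    if not separator or not dialect:
        raise ValueError(f"Missing dialect in JDBC URI: {uri!r}")
    if dialect == "sqlite":
        # jdbc:sqlite:/path/to/file.db -> everything after the dialect is the path.
        return {"dialect": "sqlite", "database": remainder}
    # jdbc:mysql://host:3306/db -> parse the rest like an ordinary URL.
    parts = urlsplit(rest)
    return {
        "dialect": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }
```

For example, parse_jdbc_uri("jdbc:sqlite:/tmp/catalog.db") yields a dialect of "sqlite" and a database path that can be passed to sqlite3.connect, so the same URI string stays usable across engines.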
