Skip to content

Add DCO to CONTRIBUTING.md#13

Merged
susodapop merged 29 commits into
mainfrom
add-cla
Aug 1, 2022
Merged

Add DCO to CONTRIBUTING.md#13
susodapop merged 29 commits into
mainfrom
add-cla

Conversation

@susodapop
Copy link
Copy Markdown
Contributor

@susodapop susodapop commented Jul 1, 2022

Introduces Databricks standard DCO to the CONTRIBUTING guide.

Also adds a GitHub Action step which checks each commit in the pull request for a DCO sign-off and leaves a comment if this check fails.

One challenging aspect of this approach is that our current DCO check step doesn't have any outputs hooks that let us access its list of which commits failed the DCO check. I can write a custom action step that forks the existing behaviour and sets those outputs as a follow-up if it becomes necessary.

For now, any member of this org can see exactly which commits failed the DCO check by examining the PR Check results. These aren't visible to outside contributors, though, which is not as convenient.

Example comment:

CleanShot 2022-07-29 at 16 03 38@2x

Copy link
Copy Markdown
Contributor

@yunbodeng-db yunbodeng-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
The check itself comes from the DCO Check GitHub app. These apps are
tied directly to GitHub which is why the `app_id` is a known constant.
It's the same id regardless of the repository it monitors.

These were forked from MLFlow's public repository:

https: //github.com/mlflow/mlflow/blob/master/.github/workflows/notify-dco-failure.js
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
This reverts commit 16b438c.

We're going to use a GitHub Action for DCO checks instead of a GH App

Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
@github-actions
Copy link
Copy Markdown

Array

susodapop added 11 commits July 29, 2022 15:45
Intentionally not signing this commit to make sure the automated DCO check behaves as expected.

Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
This reverts commit 0a00449.

Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
@github-actions
Copy link
Copy Markdown

Thanks for your contribution! To satisfy the DCO policy in our contributing guide
every commit message must include a sign-off. You can retroactively apply this sign-off with an interactive rebase.

@github-actions
Copy link
Copy Markdown

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off. One or more of your commits is missing this sign-off message. You can retroactively apply this sign-off with an interactive rebase.

@github-actions
Copy link
Copy Markdown

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
Signed-off-by: Jesse Whitehouse <jesse@whitehouse.dev>
@yunbodeng-db
Copy link
Copy Markdown
Contributor

We need to do this to all other connectors as well?

@susodapop susodapop merged commit 2961524 into main Aug 1, 2022
@susodapop susodapop deleted the add-cla branch August 1, 2022 16:14
@susodapop
Copy link
Copy Markdown
Contributor Author

Yes I've created tracking tickets internally to implement this for NodeJS and Go.

moderakh pushed a commit to moderakh/databricks-sql-python that referenced this pull request Aug 24, 2022
Includes a GitHub Action which checks for a valid sign-off on every proposed commit

Signed-off-by: Moe Derakhshani <moe.derakhshani@databricks.com>
saishreeeee pushed a commit that referenced this pull request Jun 4, 2025
Includes a GitHub Action which checks for a valid sign-off on every proposed commit
saishreeeee pushed a commit that referenced this pull request Jun 4, 2025
Includes a GitHub Action which checks for a valid sign-off on every proposed commit

Signed-off-by: Sai Shree Pradhan <saishree.pradhan@databricks.com>
vikrantpuppala added a commit that referenced this pull request May 18, 2026
Phase 2 of the PySQL × kernel integration plan
(databricks-sql-kernel/docs/designs/pysql-kernel-integration.md).
Wires `use_sea=True` to a new `backend/kernel/` module that
delegates to the Rust kernel via the `databricks_sql_kernel` PyO3
extension (kernel PR #13).

New module: `src/databricks/sql/backend/kernel/`

- `client.py` — `KernelDatabricksClient(DatabricksClient)`. Lazy-
  imports `databricks_sql_kernel` so a connector install without
  the kernel wheel doesn't `ImportError` at startup; only
  `use_sea=True` surfaces the missing-extra message. Implements
  open/close_session, sync + async execute_command (async_op=True
  goes through `Statement.submit()` and stashes the handle in a
  dict keyed on `CommandId`), cancel/close_command,
  get_query_state, get_execution_result, and the metadata calls
  (catalogs / schemas / tables / columns) via
  `Session.metadata().list_*`. Real server-issued session and
  statement IDs flow through (no synthetic UUIDs).
- `auth_bridge.py` — translate the connector's `AuthProvider`
  into kernel `Session` kwargs. PAT (including federation-wrapped
  PAT — `get_python_sql_connector_auth_provider` always wraps the
  base in `TokenFederationProvider`, so a naive isinstance check
  never matches) routes through `auth_type="pat"`. Everything
  else routes through `auth_type="external"` with a callback that
  delegates to `auth_provider.add_headers({})`. (External today
  is rejected by the kernel at `build_auth_provider`; the
  separate kernel-side enablement PR will flip it on.)
- `result_set.py` — `KernelResultSet(ResultSet)`. Duck-typed over
  `databricks_sql_kernel.ExecutedStatement` (sync execute) and
  `ResultStream` (metadata + async await_result) since both
  expose `arrow_schema()` / `fetch_next_batch()` /
  `fetch_all_arrow()` / `close()`. Same FIFO batch buffer the
  prior ADBC POC used, so `fetchmany(n)` for n smaller than the
  kernel's natural batch size doesn't re-fetch.
- `type_mapping.py` — Arrow → PEP 249 description-string mapper.
  Lifted from the prior ADBC POC; centralised here so future
  kernel-result wrappers reuse the same mapping.

Kernel errors → PEP 249 exceptions: `KernelError.code` is mapped
in a single table to `ProgrammingError` / `OperationalError` /
`DatabaseError`. The structured fields (`sql_state`,
`error_code`, `query_id`, …) are copied onto the re-raised
exception so callers can branch on them without reaching through
`__cause__`.

Routing: `Session._create_backend` flips the `use_sea=True` branch
to instantiate `KernelDatabricksClient` instead of the native
`SeaDatabricksClient`. The native `backend/sea/` module is left
in place (no users on `use_sea=True` after this PR; its long-
term fate is out of scope here).

Packaging: `[tool.poetry.extras] kernel = ["databricks-sql-kernel"]`.
`pip install 'databricks-sql-connector[kernel]'` pulls in the
kernel wheel; `use_sea=True` without the extra raises a pointed
ImportError telling the user how to install it.

Known gaps (acknowledged, will be follow-ups):
- Parameter binding (`execute_command(parameters=[...])`) raises
  NotSupportedError — PyO3 `Statement.bind_param` lands in a
  follow-up.
- Statement-level `query_tags` raises NotSupportedError.
- `get_tables(table_types=[...])` returns unfiltered rows (the
  native SEA backend's filter is keyed on `SeaResultSet`; needs
  a small port to operate on `KernelResultSet`).
- External-auth end-to-end blocked on the kernel-side
  `AuthConfig::External` enablement PR.
- Volume PUT/GET (staging operations): kernel has no Volume API.

Test plan:
- Unit: 37 new tests across
  `tests/unit/test_kernel_auth_bridge.py` (auth provider →
  kwargs mapping, including federation-wrapped PAT and the
  External trampoline call-counter check),
  `tests/unit/test_kernel_type_mapping.py` (Arrow type mapping
  + description shape), and
  `tests/unit/test_kernel_result_set.py` (buffer semantics,
  fetchmany across batch boundaries, idempotent close, close()
  swallowing handle-close failures). All pass.
- Full unit suite: 600 pre-existing tests still pass; one
  pre-existing failure (`test_useragent_header` — agent
  detection adds `agent/claude-code` in this env) was already
  failing on main, unrelated to this change.
- Live e2e against dogfood with `use_sea=True`: SELECT 1,
  `range(10000)`, `fetchmany` pacing, `fetchall_arrow`, all four
  metadata calls (returned 75 catalogs / 144 schemas in main /
  47 tables in `system.information_schema` / 15 columns),
  `session_configuration={'ANSI_MODE': 'false'}` round-trips,
  bad SQL surfaces as DatabaseError with `code='SqlError'`
  and `sql_state='42P01'` on the exception. All checks pass.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
vikrantpuppala added a commit that referenced this pull request May 18, 2026
…st kernel via PyO3 (#787)

* feat(backend/kernel): route use_sea=True through the Rust kernel

Phase 2 of the PySQL × kernel integration plan
(databricks-sql-kernel/docs/designs/pysql-kernel-integration.md).
Wires `use_sea=True` to a new `backend/kernel/` module that
delegates to the Rust kernel via the `databricks_sql_kernel` PyO3
extension (kernel PR #13).

New module: `src/databricks/sql/backend/kernel/`

- `client.py` — `KernelDatabricksClient(DatabricksClient)`. Lazy-
  imports `databricks_sql_kernel` so a connector install without
  the kernel wheel doesn't `ImportError` at startup; only
  `use_sea=True` surfaces the missing-extra message. Implements
  open/close_session, sync + async execute_command (async_op=True
  goes through `Statement.submit()` and stashes the handle in a
  dict keyed on `CommandId`), cancel/close_command,
  get_query_state, get_execution_result, and the metadata calls
  (catalogs / schemas / tables / columns) via
  `Session.metadata().list_*`. Real server-issued session and
  statement IDs flow through (no synthetic UUIDs).
- `auth_bridge.py` — translate the connector's `AuthProvider`
  into kernel `Session` kwargs. PAT (including federation-wrapped
  PAT — `get_python_sql_connector_auth_provider` always wraps the
  base in `TokenFederationProvider`, so a naive isinstance check
  never matches) routes through `auth_type="pat"`. Everything
  else routes through `auth_type="external"` with a callback that
  delegates to `auth_provider.add_headers({})`. (External today
  is rejected by the kernel at `build_auth_provider`; the
  separate kernel-side enablement PR will flip it on.)
- `result_set.py` — `KernelResultSet(ResultSet)`. Duck-typed over
  `databricks_sql_kernel.ExecutedStatement` (sync execute) and
  `ResultStream` (metadata + async await_result) since both
  expose `arrow_schema()` / `fetch_next_batch()` /
  `fetch_all_arrow()` / `close()`. Same FIFO batch buffer the
  prior ADBC POC used, so `fetchmany(n)` for n smaller than the
  kernel's natural batch size doesn't re-fetch.
- `type_mapping.py` — Arrow → PEP 249 description-string mapper.
  Lifted from the prior ADBC POC; centralised here so future
  kernel-result wrappers reuse the same mapping.

Kernel errors → PEP 249 exceptions: `KernelError.code` is mapped
in a single table to `ProgrammingError` / `OperationalError` /
`DatabaseError`. The structured fields (`sql_state`,
`error_code`, `query_id`, …) are copied onto the re-raised
exception so callers can branch on them without reaching through
`__cause__`.

Routing: `Session._create_backend` flips the `use_sea=True` branch
to instantiate `KernelDatabricksClient` instead of the native
`SeaDatabricksClient`. The native `backend/sea/` module is left
in place (no users on `use_sea=True` after this PR; its long-
term fate is out of scope here).

Packaging: `[tool.poetry.extras] kernel = ["databricks-sql-kernel"]`.
`pip install 'databricks-sql-connector[kernel]'` pulls in the
kernel wheel; `use_sea=True` without the extra raises a pointed
ImportError telling the user how to install it.

Known gaps (acknowledged, will be follow-ups):
- Parameter binding (`execute_command(parameters=[...])`) raises
  NotSupportedError — PyO3 `Statement.bind_param` lands in a
  follow-up.
- Statement-level `query_tags` raises NotSupportedError.
- `get_tables(table_types=[...])` returns unfiltered rows (the
  native SEA backend's filter is keyed on `SeaResultSet`; needs
  a small port to operate on `KernelResultSet`).
- External-auth end-to-end blocked on the kernel-side
  `AuthConfig::External` enablement PR.
- Volume PUT/GET (staging operations): kernel has no Volume API.

Test plan:
- Unit: 37 new tests across
  `tests/unit/test_kernel_auth_bridge.py` (auth provider →
  kwargs mapping, including federation-wrapped PAT and the
  External trampoline call-counter check),
  `tests/unit/test_kernel_type_mapping.py` (Arrow type mapping
  + description shape), and
  `tests/unit/test_kernel_result_set.py` (buffer semantics,
  fetchmany across batch boundaries, idempotent close, close()
  swallowing handle-close failures). All pass.
- Full unit suite: 600 pre-existing tests still pass; one
  pre-existing failure (`test_useragent_header` — agent
  detection adds `agent/claude-code` in this env) was already
  failing on main, unrelated to this change.
- Live e2e against dogfood with `use_sea=True`: SELECT 1,
  `range(10000)`, `fetchmany` pacing, `fetchall_arrow`, all four
  metadata calls (returned 75 catalogs / 144 schemas in main /
  47 tables in `system.information_schema` / 15 columns),
  `session_configuration={'ANSI_MODE': 'false'}` round-trips,
  bad SQL surfaces as DatabaseError with `code='SqlError'`
  and `sql_state='42P01'` on the exception. All checks pass.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* refactor(backend/kernel): PAT-only auth, drop External trampoline

The earlier auth_bridge routed OAuth/MSAL/federation through the
kernel's External token-provider trampoline (a Python callable
the kernel invoked per HTTP request). Removing that for now.

Why: routing OAuth into the kernel inherently requires per-request
token resolution to keep refresh working during a long-running
session. Two viable mechanisms (kernel-native OAuth, or the
External callback); both have costs (duplicate OAuth flows vs
GIL-per-request). Punting the decision until there's actual
demand on use_sea=True.

Today: the bridge accepts PAT (including TokenFederationProvider-
wrapped PAT, which is how `get_python_sql_connector_auth_provider`
always shapes it). Any non-PAT auth_provider raises a clear
NotSupportedError pointing the user at use_sea=False (Thrift).

This shrinks the auth_bridge to ~50 lines and means the kernel-
side External enablement PR is no longer on the connector's
critical path — there's no kernel-side prerequisite for shipping
use_sea=True for PAT users.

Unit tests updated:
- TokenFederationProvider-wrapped PAT still routes to PAT (kept).
- Generic OAuth provider raises NotSupportedError (new).
- ExternalAuthProvider raises NotSupportedError (new).
- Silent non-PAT provider raises NotSupportedError (new) —
  reject the type itself rather than trying to extract a token
  we already know we can't use.

Live e2e against dogfood with use_sea=True (PAT): all checks
still pass (SELECT 1, range(10000), fetchmany pacing, four
metadata calls, session_configuration round-trip, structured
DatabaseError on bad SQL).

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* test(e2e): live kernel-backend (use_sea=True) suite

Moves the previously-ad-hoc /tmp/connector_smoke.py into the repo
as a real pytest module under tests/e2e/ — same convention as the
rest of the e2e suite. Uses the existing session-scoped
`connection_details` fixture from the top-level conftest so it
shares the credential surface with every other live test.

11 tests cover:
- connect() with use_sea=True opens a session.
- SELECT 1: rows + description shape (column name + dbapi type slug).
- SELECT * FROM range(10000): multi-batch drain.
- fetchmany() pacing across the buffer boundary.
- fetchall_arrow() returns a pyarrow Table.
- All four metadata methods (catalogs / schemas / tables / columns).
- session_configuration={'ANSI_MODE': 'false'} round-trips.
- Bad SQL surfaces as DatabaseError with `code='SqlError'` and
  `sql_state='42P01'` attached as exception attributes.

Module-level skips:
- `databricks_sql_kernel` not importable → whole module skipped via
  pytest.importorskip (the wheel hasn't been installed).
- Live creds missing → fixture-level skip with a pointed message.

Run: `pytest tests/e2e/test_kernel_backend.py -v`. All 11 pass
against dogfood in ~20s.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* fix(backend/kernel): defer databricks-sql-kernel poetry dep declaration

CI is failing across all jobs at \`poetry lock\` time:

    Because databricks-sql-connector depends on databricks-sql-kernel
    (^0.1.0) which doesn't match any versions, version solving failed.

The kernel wheel isn't yet published to PyPI — we verified the name
is available via the Databricks proxy, but the package itself hasn't
been built and uploaded yet. Declaring it as a poetry dep (even an
optional one inside an extra) requires the version to be resolvable,
and \`poetry lock\` runs as the setup step for every CI job: unit
tests, linting, type checks, all of them.

Fix: drop the \`databricks-sql-kernel\` dep declaration and the
\`[kernel]\` extra from pyproject.toml until the wheel is on PyPI.
The lazy import in \`backend/kernel/client.py\` still raises a
clear ImportError pointing at \`pip install databricks-sql-kernel\`
(or local maturin) when use_sea=True is invoked without the kernel
present.

When the kernel is published, a small follow-up will add back:

    databricks-sql-kernel = {version = "^0.1.0", optional = true}
    [tool.poetry.extras]
    kernel = ["databricks-sql-kernel"]

A pointed comment in pyproject.toml documents the deferred change.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* fix(backend/kernel): unit tests skip without pyarrow, mypy + black

Three CI failures after the poetry-lock fix uncovered three real
issues:

1. pyarrow is optional in the connector. The default-deps CI test
   job installs without it; the +PyArrow job installs with. The
   kernel backend's result_set.py + type_mapping.py import pyarrow
   eagerly (the kernel always returns pyarrow), and the unit tests
   import the backend at collection time — which crashes the
   default-deps job at ModuleNotFoundError.

   Fix: gate the three kernel unit tests on `pytest.importorskip(
   "pyarrow")` so they skip on default-deps and run on +PyArrow.
   Verified locally: 39 pass with pyarrow, 3 skipped without.
   No change to the backend module itself — nothing imports it
   until use_sea=True is invoked, and pyarrow is on the kernel
   wheel's runtime dep list so use_sea=True can't hit this either.

2. mypy: KernelDatabricksClient.open_session returns
   self._session_id, which mypy types as Optional[SessionId]
   because the field starts as None. Fix: bind the new id to a
   local non-Optional variable, assign to the field, return the
   local. CI's check-types runs cleanly on backend/kernel/ now;
   pre-existing mypy noise elsewhere isn't mine.

3. black --check: black 22.12.0 (the version CI pins) wants
   reformatting on result_set.py / type_mapping.py / client.py.
   Applied. Verified locally with the same black version.

All 39 kernel unit tests + 619 pre-existing unit tests pass.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* fix(backend/kernel): make package importable without the kernel wheel

The +PyArrow CI matrix installs pyarrow but not the
databricks-sql-kernel wheel (the wheel isn't on PyPI yet, and the
[kernel] extra is deferred — see commit 31ca581). The previous
fix gated unit tests on `pytest.importorskip("pyarrow")` but
test_kernel_auth_bridge.py was still pulled into a kernel-wheel
ImportError because:

  src/databricks/sql/backend/kernel/__init__.py
    -> from databricks.sql.backend.kernel.client import KernelDatabricksClient
        -> import databricks_sql_kernel  # ImportError on +PyArrow CI

The eager re-export from `__init__.py` was a convenience that
broke every consumer that only needed a submodule (type_mapping,
result_set, auth_bridge) — they all triggered the kernel wheel
import for no reason.

Fix:
- Drop the eager re-export from `kernel/__init__.py`. Comment
  documents why and points callers (= session.py::_create_backend,
  already this shape) at the direct `from .client import ...`.
- Drop the no-longer-needed `pytest.importorskip("pyarrow")` /
  `importorskip("databricks_sql_kernel")` from
  test_kernel_auth_bridge.py — auth_bridge.py itself has neither
  dep, so the test now runs on every CI matrix variant.
- test_kernel_result_set.py and test_kernel_type_mapping.py keep
  the pyarrow importorskip because they themselves use pyarrow.

Verified locally across the three matrix shapes:
- both pyarrow + kernel installed: 39 pass.
- pyarrow only (no kernel wheel — the +PyArrow CI shape): 39 pass.
- neither: 9 pass (auth_bridge only), 2 modules skip (the others
  use pyarrow).

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* test(e2e): skip use_sea=True parametrized cases when kernel wheel missing

The connector's coverage CI job runs the full e2e suite, several of
whose test classes parametrize ``extra_params`` over ``{}`` and
``{"use_sea": True}``. With ``use_sea=True`` now routing through
the Rust kernel via PyO3, those cases die at ``connect()`` with our
pointed ImportError because the ``databricks-sql-kernel`` wheel
isn't yet on PyPI — and that CI job (sensibly) doesn't try to
build it from a sibling repo.

Fix: ``pytest_collection_modifyitems`` hook in the top-level
``conftest.py`` that adds a ``skip`` marker to any parametrize case
with ``extra_params={"use_sea": True, ...}`` when
``importlib.util.find_spec("databricks_sql_kernel")`` returns
``None``. Behavior change is CI-only — local dev with the kernel
wheel installed (via ``maturin develop`` from the kernel repo)
runs those cases as before.

Once the kernel wheel is published, the [kernel] extra in
pyproject.toml gets enabled (see comment block there) and the
default-deps CI matrix will install it; the skip then becomes a
no-op.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* refactor(backend/kernel): address review feedback — mechanical fixes

Cleanup pass on the kernel-backend PR addressing reviewer feedback
that doesn't change observable behaviour:

- result_set.py: replace O(M²) `_buffered_rows` with running counter
  `_buffered_count` maintained by pull/take/drain (perf F6).
- result_set.py: docstring corrections — drop nonexistent
  `fetch_all_arrow` from kernel-handle contract (F20); document
  `buffer_size_bytes` as no-op on the kernel backend (F21).
- client.py: tighten `_reraise_kernel_error` signature to
  `_kernel.KernelError` only; drop dead passthrough branch and the
  defensive setattr try/except (F17).
- client.py: drop unused `_use_arrow_native_complex_types` kwarg (F18).
- client.py: collapse three `KernelResultSet(...)` construction sites
  through `_make_result_set` (renamed from `_metadata_result`) (F19).
- client.py: drop `metadata-` prefix from synthetic CommandId; use a
  plain `uuid.uuid4().hex` so anything reading `cursor.query_id`
  downstream sees a UUID-shaped string (F14).
- client.py: clear the raw access token from `_auth_kwargs` after the
  kernel session is constructed — kernel owns the credential from
  then on, no need to retain a cleartext copy on the connector
  instance (F24).
- auth_bridge.py: reject bearer tokens containing ASCII control
  characters at extraction time (defense-in-depth against header
  injection if a misbehaving HTTP stack ever places the token back
  into a header without scrubbing) (F25).
- tests/unit/test_kernel_auth_bridge.py: construct a real
  `TokenFederationProvider(http_client=Mock())` instead of bypassing
  `__init__` with `__new__` + monkey-patching `add_headers`. Exercises
  the real federation passthrough path the bridge sees in production
  (F12). Drop unused `MagicMock` import (F27).
- tests/e2e/test_kernel_backend.py: drop misleading CloudFetch claim
  on `test_drain_large_range_to_arrow` — 10000 BIGINT rows is ~80 KB,
  single inline chunk on a typical warehouse (F26).

All 39 existing kernel unit tests pass.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* feat(backend/kernel): introduce dedicated use_kernel flag + substantive review fixes

Major change: route the kernel backend through a new ``use_kernel=True``
connection kwarg instead of repurposing ``use_sea=True``. ``use_sea=True``
once again routes to the native pure-Python SEA backend (no behaviour
change); ``use_kernel=True`` routes to the Rust kernel via PyO3. The
two flags are mutually exclusive.

This addresses the largest reviewer concern from the multi-agent
review: silently hijacking a documented public flag broke OAuth /
federation / parameter-binding callers on ``use_sea=True`` who had no
opt-out. With the new flag, the kernel backend is fully opt-in and
existing ``use_sea=True`` users continue to get the native SEA backend
they signed up for.

Other substantive fixes:

- session.py: restore ``SeaDatabricksClient`` import + routing. Reject
  ``use_kernel=True`` + ``use_sea=True`` together with a clear
  ``ValueError``.
- client.py (kernel ``Cursor.columns``): update docstring to flag the
  ``catalog_name=None`` divergence — kernel requires a catalog,
  Thrift / native SEA do not (F13).
- conftest.py: drop the collection-time ``pytest_collection_modifyitems``
  hook that was skipping ``extra_params={"use_sea": True}`` cases. With
  ``use_sea=True`` back on the native SEA backend, those cases run as
  they did before this PR (F8).
- kernel/client.py: ``get_tables`` now applies the ``table_types``
  filter client-side using ``ResultSetFilter._filter_arrow_table``
  (the same helper the native SEA backend uses), wrapped in a tiny
  ``_StaticArrowHandle`` that flows the filtered table back through
  the normal ``KernelResultSet`` path. Replaces the previous
  "log a warning and return unfiltered" behaviour (F4).
- kernel/client.py: guard ``_async_handles`` with ``threading.RLock``
  so concurrent cursors on the same connection don't race on
  submit / close / close-session (F15).
- kernel/result_set.py: ``KernelResultSet.close()`` now drops the
  entry from ``backend._async_handles`` so async-submitted statements
  don't leave stale references behind (F5).
- kernel/{__init__,client,auth_bridge}.py, tests/e2e/test_kernel_backend.py:
  update docstrings, error messages, and the e2e fixture to refer to
  ``use_kernel=True`` instead of ``use_sea=True``.
- client.py (``Connection`` docstring): document the new
  ``use_kernel`` kwarg + its Phase-1 limitations.

New tests:

- tests/unit/test_kernel_client.py (38 cases): cover the 14-entry
  ``_CODE_TO_EXCEPTION`` table, ``_reraise_kernel_error`` attribute
  forwarding, the 6-entry ``_STATE_TO_COMMAND_STATE`` table, the
  no-open-session guards on every method, ``open_session`` double-open,
  ``parameters`` / ``query_tags`` rejection, ``get_columns``'
  catalog-required check, ``cancel_command`` / ``close_command``
  no-handle tolerance, ``get_query_state`` sync-path SUCCEEDED, the
  Failed-state re-raise, the synthetic-command-id UUID shape, and
  ``close_session`` cleanup even when per-handle close errors fire.
  Uses a fake ``databricks_sql_kernel`` module installed into
  ``sys.modules`` so the test runs with no Rust extension dependency
  (F9).

77/77 kernel unit tests pass.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* fix(backend/kernel): CI-greening — mypy + e2e module skip

- src/databricks/sql/backend/kernel/result_set.py: fix the 3 mypy
  errors at L237/239/241 by casting ``self.backend`` to
  ``KernelDatabricksClient`` (the base ``DatabricksClient`` doesn't
  declare ``_async_handles`` / ``_async_handles_lock``). Folds in
  gopalldb's nit (3249904284) — replace the explicit
  ``acquire()/try/finally/release()`` with a ``with`` block to match
  the rest of the file.
- tests/e2e/test_kernel_backend.py: harden the module-level skip so
  the suite doesn't run when the kernel wheel is absent in CI. The
  unit suite installs a fake ``databricks_sql_kernel`` ``ModuleType``
  into ``sys.modules`` so the connector's import-time ``import
  databricks_sql_kernel`` succeeds without the Rust extension; that
  fake leaks across into the same pytest session and
  ``pytest.importorskip`` happily returns it. A real wheel exposes
  ``__file__`` (compiled extension on disk); the fake does not.
  Skip the module when ``__file__`` is missing.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* fix(backend/kernel): address gopalldb minor review comments (m1, m4)

m1 — install hint (comment 3249904266):
  The ``databricks-sql-kernel`` wheel is not yet published on PyPI;
  ``pip install databricks-sql-kernel`` either finds nothing or
  pulls a squatted package. Drop the misleading hint from
  ``ImportError`` and from the ``use_kernel`` docstring on
  ``databricks.sql.connect``; point users at the
  ``maturin develop --release`` dev path until the wheel ships.

m4a — auth_bridge ValueError → ProgrammingError (comment 3249904276):
  Two sites in ``_extract_bearer_token`` / ``kernel_auth_kwargs``
  were raising bare ``ValueError`` for caller-misuse cases (control
  chars in the token, PAT provider that produced no Authorization
  header). The rest of the kernel-backend error surface uses PEP
  249 exception types — code paths that catch ``DatabaseError`` /
  ``ProgrammingError`` would miss these. Convert to
  ``ProgrammingError`` and update the unit test.

m4b — description null_ok (comment 3249904282):
  ``description_from_arrow_schema`` was hardcoding the 7th tuple
  element to ``None`` even though ``pyarrow.Field.nullable`` is
  available. PEP 249 §Cursor.description defines ``null_ok`` as
  "True if NULL values are allowed"; callers branching on it
  would have lost useful information the kernel already provides.
  Now emits ``field.nullable``; added a unit test covering both
  nullable and non-nullable fields; updated the two existing tests
  that asserted the old all-``None`` shape.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* fix(backend/kernel): substantive review fixes — M1, M2, M3, m2, m3

Addresses gopalldb's four major / two minor remaining review
comments. The shared error-mapping primitives move to a new
``_errors.py`` module so both ``client.py`` and ``result_set.py``
can use them without ``result_set.py`` importing from ``client.py``.

M1 — async handle leak in get_execution_result (3249904251):
  ``ResultStream`` from ``await_result()`` is wrapped in
  ``KernelResultSet``; the underlying ``ExecutedAsyncStatement``
  has no further role once the stream is in hand. Close it
  immediately, drop the entry from ``_async_handles``, and add the
  guid to ``_closed_commands``. A failed ``async_exec.close()`` is
  logged but doesn't break the result-set return — the kernel's
  Drop impl reaps server-side state.

M2 — PyO3 native exceptions wrapped as OperationalError (3249904255):
  Added ``kernel_call(what)`` context manager (in the new
  ``_errors.py``). ``KernelError`` flows through
  ``reraise_kernel_error`` as before; anything else
  (``TypeError`` / ``OverflowError`` / ``ValueError`` from PyO3
  argument conversion, extension-internal errors) is wrapped in
  ``OperationalError`` so DB-API callers only ever see PEP 249
  exception types. Applied at every PyO3 call site:
  ``open_session``, ``execute_command``, ``cancel_command``,
  ``close_command``, ``get_query_state``, ``get_execution_result``,
  ``get_catalogs`` / ``get_schemas`` / ``get_tables`` /
  ``get_columns``, plus ``fetch_next_batch`` /
  ``arrow_schema`` in ``KernelResultSet``.

M3 — KernelError during result-set construction (3249904259):
  ``KernelResultSet.__init__`` calls ``kernel_handle.arrow_schema()``
  which can itself raise. Every call to ``_make_result_set`` is now
  inside a ``kernel_call`` scope so the schema-fetch error becomes a
  mapped PEP 249 exception instead of leaking raw ``KernelError``.

m3 — get_query_state of a closed async command (3249904273):
  Added ``_closed_commands: Set[str]`` (guarded by the existing
  ``_async_handles_lock``). ``close_command`` records the guid;
  ``close_session`` records every swept guid;
  ``get_execution_result`` records its own command after closing
  the async_exec. ``get_query_state`` now returns
  ``CommandState.CLOSED`` instead of falling through to
  ``SUCCEEDED`` for these.

m2 — unit test for get_tables table_types client-side filter
(3249904269):
  Added ``test_get_tables_with_table_types_filters_rows`` and
  ``test_get_tables_without_table_types_returns_full_stream`` in
  ``tests/unit/test_kernel_client.py``. The first feeds a fake
  stream with mixed ``TABLE`` / ``VIEW`` rows and asserts only
  ``TABLE`` survives; the second confirms the no-filter path
  bypasses the drain-and-rewrap and returns all rows unchanged.

Plus new tests for every change above:
  - test_pyo3_native_exception_wrapped_as_operational_error (M2)
  - test_pyo3_native_exception_wrapped_for_metadata_calls (M2)
  - test_kernel_error_during_result_set_construction_is_mapped (M3)
  - test_get_execution_result_closes_async_exec_and_drops_tracking (M1)
  - test_get_execution_result_does_not_raise_on_async_exec_close_failure (M1)
  - test_get_query_state_returns_closed_after_close_command (m3)
  - test_close_session_marks_swept_handles_as_closed (m3)

87/87 kernel unit tests pass (added 9).

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* refactor(backend/kernel): replace kernel_call context manager with explicit try/except

Using a ``with`` block for error translation is bad form: ``with``
conventionally signals resource lifecycle (locks, files), so
``with kernel_call("X"):`` hides the fact that the block raises a
mapped exception. Replace with explicit ``try/except Exception as
exc: raise _wrap_kernel_exception("X", exc) from exc`` at every
PyO3 call site.

What changed:

- ``_errors.py``: drop the ``kernel_call`` context manager; export
  ``wrap_kernel_exception(what, exc)`` — a pure function that maps
  a raw exception to a PEP 249 one (KernelError → mapped class via
  ``reraise_kernel_error``; existing Error → passthrough; anything
  else → OperationalError).
- ``client.py``: replace 12 ``with _kernel_call(...):`` blocks with
  inline try/except calling the helper.
- ``result_set.py``: same for the 3 sites (arrow_schema on
  construct, fetch_next_batch in _pull_one_batch, fetch_next_batch
  in _drain).

Behaviour is unchanged — same KernelError → PEP 249 mapping, same
non-KernelError → OperationalError wrapping. Just spelled in a way
that makes control flow visible at the call site and keeps
tracebacks one frame shorter (no ``__exit__`` frame).

87/87 kernel unit tests still pass.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* style(backend/kernel): black format client.py

Fixes ``check-linting`` CI: one long line in ``execute_command``'s
async-submit branch needed to wrap (the ``CommandId.from_sea_statement_id``
call). Pure formatting; no behaviour change.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* fix(backend/kernel): address gopalldb's P1 review comments

Local kernel-package fixes from the follow-up review pass on PR
#787 (#787 (comment)).
The two cross-cutting / pre-existing issues (P0 #1 async leak in
shared ``Cursor.close()``, P1 #8/#9 fetch-after-close not raising
``InterfaceError``) are tracked separately as #791 and #792 — they
affect Thrift and SEA equally and are out of scope for this PR.

P1 #2 — call ``backend.close_command`` from ``KernelResultSet.close()``:
  The override previously bypassed the base ``ResultSet.close()``
  entirely. Honour the contract by invoking
  ``backend.close_command(self.command_id)`` after the per-handle
  close + ``_async_handles`` pop. Our own ``close_command`` is
  tolerant of already-popped guids (no-op), so this is safe even
  though the per-handle close above already released server-side
  state. Doesn't go through ``super().close()`` directly because
  the base path warns when ``self.results`` is ``None`` (which it
  is for kernel result sets) — replicate the meaningful part of
  the base contract without the noisy warning.

P1 #3 — case-insensitive ``Bearer`` prefix in auth_bridge:
  RFC 6750 §2.1 says the Authorization scheme is case-insensitive.
  Match leniently in case a federation proxy or future provider
  normalises the casing differently — failing closed would surface
  as a confusing ``ProgrammingError`` from the bridge.

P1 #4 — drop redundant ``__cause__`` set in ``reraise_kernel_error``:
  ``raise wrap_kernel_exception(...) from exc`` already sets
  ``__cause__`` at the call site; the manual assignment in
  ``reraise_kernel_error`` was redundant. Updated the test that
  asserted on it; added ``test_kernel_error_chains_through_wrap``
  to cover the end-to-end chain.

P1 #5 — ``get_tables`` filter looks up TABLE_TYPE by name:
  Replaced ``schema.field(5).name`` (positional) with the literal
  ``"TABLE_TYPE"`` plus a missing-column guard. A future kernel
  reshape of ``SHOW TABLES`` now surfaces an explicit
  ``OperationalError`` instead of silently filtering the wrong
  column. The case-sensitive contract is now documented in the
  surrounding comment (matches SEA + warehouse).

P1 #6 — ``KernelResultSet.close()`` guards on ``connection.open``:
  ``__del__``-driven close arriving after the parent connection is
  already closed previously issued a kernel call into a disposed
  session. Skip the kernel call entirely in that case; still mark
  the result set ``CLOSED`` locally so ``__del__`` is idempotent.

P1 #7 — defer ``kernel_auth_kwargs`` to ``open_session``:
  ``KernelDatabricksClient.__init__`` previously called
  ``kernel_auth_kwargs(auth_provider)`` and stored the bearer
  token on ``self._auth_kwargs`` indefinitely. If ``open_session``
  never ran (test paths, error paths, lazy retries) the token
  stayed resident on the connector object. Build the kwargs
  locally inside ``open_session`` now — local variable, GC-eligible
  the moment ``open_session`` returns.

Also tightened the install-hint comment in ``pyproject.toml`` to
match the rest of the codebase (the wheel isn't on PyPI; only the
``maturin develop`` path is supported today).

88/88 kernel unit tests pass (added 1).

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

* fix(backend/kernel): address gopalldb's follow-up P1/P2 review comments

Review comment: #787 (comment)

P1.1 — get_query_state handles non-BaseException failure:
  Previously ``raise _reraise_kernel_error(failure) from failure``
  would explode with ``TypeError: exception causes must derive
  from BaseException`` if the kernel's ``status()`` ever returned a
  ``failure`` that wasn't a real ``KernelError`` (struct, dict —
  kernel API drift). Now route through ``_wrap_kernel_exception``,
  which isinstance-checks and falls through to ``OperationalError``
  for non-PEP-249-shaped values. New unit test
  ``test_get_query_state_handles_non_baseexception_failure``.

P1.2 — regression tests for prior fixes:
  - ``test_bearer_prefix_is_case_insensitive`` (P1 #3 from the
    earlier review): parametrised over "Bearer "/"bearer "/
    "BEARER "/"BeArEr ". RFC 6750 §2.1 compliance was added but
    not covered by a test.
  - ``test_close_skips_kernel_call_when_connection_already_closed``
    (P1 #6 from the earlier review): exercises the
    ``connection.open is False`` branch in
    ``KernelResultSet.close()`` — asserts neither the kernel
    handle's close nor backend.close_command fire, but the result
    set still ends in ``CLOSED`` so ``__del__`` is idempotent.
  - ``test_token_with_control_chars_or_whitespace_rejected``
    (pre-existing security guard): parametrised over NUL / CR / LF
    / DEL / space / tab — the regex previously missed space (0x20).
    Covered + extended.

P2.1 — wrap session_id extraction in open_session:
  ``SessionId.from_sea_session_id(self._kernel_session.session_id)``
  was outside the ``try/except _wrap_kernel_exception`` scope. A
  raw PyO3 attribute-conversion error on the
  ``self._kernel_session.session_id`` access could escape unwrapped.
  Now wrapped.

P2.2 — drop redundant _async_handles.pop in result-set close:
  After the M1 fix (``get_execution_result`` pops the guid before
  constructing the result set), the pop in ``KernelResultSet.close``
  is dead code — every call misses. Sync-execute and metadata
  paths never registered in ``_async_handles`` to begin with.
  Drop the per-close pop; rewrote the surrounding comment so
  ``backend.close_command`` is now the single bookkeeping seam.

P2.3 — control-char regex includes whitespace:
  Extended ``[\x00-\x1f\x7f]`` → ``[\x00-\x20\x7f]`` and renamed
  ``_CONTROL_CHAR_RE`` → ``_TOKEN_REJECT_RE``. RFC 6750 forbids
  whitespace within the credential token itself; a token like
  ``"Bearer  doubled-space-token"`` previously slipped past the
  injection guard. Test parametrised above.

type_mapping reuse — use SqlType constants:
  Replaced literal type strings ("bigint", "string", …) in
  ``_arrow_type_to_dbapi_string`` with the ``SqlType`` constants
  from ``databricks.sql.backend.sea.utils.conversion`` — same
  single source of truth the SEA backend already uses, so the
  kernel and SEA backends emit byte-identical type-code strings.
  The Arrow → SqlType lookup itself stays local to the kernel
  (SEA receives type-text from the server and normalises it; the
  kernel receives Arrow schemas directly), but the names are now
  shared.

100/100 kernel unit tests pass (added 12).

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

---------

Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants