WIP: pathfinder_compatibility_guard_rails by rwgk · Pull Request #1977 · NVIDIA/cuda-python

rwgk · 2026-04-25T05:51:08Z

Resolves #1038

Continuation of #1936

WIP — CI testing

Introduce CompatibilityGuardRails plus related errors and tests so callers can opt into CTK and driver compatibility checks while reusing the existing pathfinder lookup APIs. Made-with: Cursor

Expose process_wide_compatibility_guard_rails at import time so follow-up changes can route the default cuda.pathfinder APIs through a stable public instance. Document the singleton and pin its public availability with a small regression test. Made-with: Cursor

Make the process-wide CompatibilityGuardRails instance the default path for the public load/find/locate APIs so top-level calls share compatibility state. Factor the routing/fallback/cache-reset glue into a dedicated internal module to keep `cuda.pathfinder.__init__` focused on the public surface, and fall back to the existing raw resolvers when v1 guard rails only have insufficient metadata. Made-with: Cursor

Allow CUDA_PATHFINDER_COMPATIBILITY_GUARD_RAILS to select strict, best_effort, or off behavior so we can experiment with stricter compatibility checks without changing the public API shape. Made-with: Cursor
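
A minimal sketch of how such a mode switch could be parsed. Only the environment variable name and the three values come from the commit message above. The helper name `guard_rails_mode`, the `default` argument, its illustrative value, and the choice of `RuntimeError` for invalid values are assumptions, not the PR's actual code.

```python
import os

ENV_VAR = "CUDA_PATHFINDER_COMPATIBILITY_GUARD_RAILS"
VALID_MODES = ("strict", "best_effort", "off")


def guard_rails_mode(environ=None, default="best_effort"):
    """Return the selected guard-rails mode (hypothetical helper).

    The default used here is illustrative only; the real default is not
    stated in this commit message.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(ENV_VAR)
    if value is None:
        return default
    if value not in VALID_MODES:
        # Reject unknown values early so typos do not silently disable checks.
        raise RuntimeError(
            f"{ENV_VAR}={value!r} is invalid; expected one of {VALID_MODES}"
        )
    return value
```

Passing the environment as a mapping keeps the helper easy to test without touching `os.environ`.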

Treat driver-packaged libraries as compatibility-neutral so strict mode can load NVML and other driver libs without a raw fallback, while CTK-backed artifacts remain the only items that establish and enforce the process-wide CTK anchor. Made-with: Cursor

Infer the CUDA Toolkit line from both wildcard-pinned and range-based cuda-toolkit requirements so strict process-wide guard rails keep working for editable wheel installs used by nvrtc and nvJitLink. Made-with: Cursor
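
The inference described above can be sketched with the standard library alone. The function name `infer_ctk_line` and the exact specifier shapes it accepts (`==13.2.*` and `>=13.2,<13.3`) are assumptions chosen for illustration; the PR reads real wheel metadata and may handle more forms.

```python
import re

_WILDCARD = re.compile(r"==\s*(\d+)\.(\d+)\.\*")
_LOWER = re.compile(r">=\s*(\d+)\.(\d+)(?:\.\d+)*")
_UPPER = re.compile(r"<\s*(\d+)\.(\d+)(?:\.0)*")


def infer_ctk_line(specifier):
    """Return (major, minor) if the specifier pins one CTK minor line, else None."""
    clauses = [c.strip() for c in specifier.split(",") if c.strip()]
    if len(clauses) == 1:
        # Wildcard pin such as "==13.2.*".
        m = _WILDCARD.fullmatch(clauses[0])
        return (int(m[1]), int(m[2])) if m else None
    if len(clauses) == 2:
        lower = upper = None
        for clause in clauses:
            if m := _LOWER.fullmatch(clause):
                lower = (int(m[1]), int(m[2]))
            elif m := _UPPER.fullmatch(clause):
                upper = (int(m[1]), int(m[2]))
        # A range such as ">=13.2,<13.3" spans exactly one minor line.
        if lower and upper and upper == (lower[0], lower[1] + 1):
            return lower
    return None
```

A major-only pin such as `==13.*` deliberately yields `None` in this sketch, because it does not identify a single minor line.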

copy-pr-bot · 2026-04-25T05:51:11Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk · 2026-04-25T05:51:44Z

/ok to test

github-actions · 2026-04-25T06:11:49Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-1977/
https://nvidia.github.io/cuda-python/pr-preview/pr-1977/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-1977/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-1977/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

rwgk · 2026-04-25T18:21:59Z

Analysis of CI failures for workflow run

https://github.com/NVIDIA/cuda-python/actions/runs/24924024509?pr=1977

Cursor GPT-5.4 Extra High Fast

Findings

The failures are not universal. All 34 failing Test_*.txt files I found are local matrix jobs; I did not find failing wheels jobs in this archive.
There is one real failure family: strict process-wide guard rails reject CTK dynamic libraries loaded from the checked-out local toolkit tree because pathfinder cannot infer a CTK line for those paths.
- Representative Linux log: /wrk/logs_24924024509/Test_linux-64___py3.11__13.0.2__local__l4.txt, lines 7237, 7307, 7369, and 7442.
- Representative Windows log: /wrk/logs_24924024509/Test_win-64___py3.10__13.0.2__local__rtxpro6000__TCC_.txt, lines 10323, 10387, 10443, and 10510.
The 12.9 local jobs hit the same root cause later and therefore fail in cuda_core rather than cuda_bindings, because nvJitLink is the first affected library touched there. Examples: /wrk/logs_24924024509/Test_linux-64___py3.10__12.9.1__local__v100.txt:7389 and /wrk/logs_24924024509/Test_win-64___py3.13__12.9.1__local__l4__TCC_.txt:10289.
This matches current intended behavior exactly. In the same failing jobs, tests/test_compatibility_guard_rails.py::test_missing_version_json_raises_insufficient_metadata passes, for example /wrk/logs_24924024509/Test_linux-64___py3.11__13.0.2__local__l4.txt:6414 and /wrk/logs_24924024509/Test_win-64___py3.10__13.0.2__local__rtxpro6000__TCC_.txt:8971.
The ##[error]WARNING: Running pip as the 'root' user ... lines are red herrings from GitHub Actions log formatting, not the job-failure cause. The real failures are the exit code 2 jobs listed above.

Why

The current logic in cuda_pathfinder/cuda/pathfinder/_compatibility_guard_rails.py only accepts CTK metadata from an enclosing version.json or from wheel ownership / cuda-toolkit metadata.
The local CI toolkit layout under cuda_toolkit/ appears to provide neither. I also checked the current tree and there is no version.json under cuda_toolkit/.
A filename-based guess would not be a proper fix. The local 13.0.2 and 13.2.1 jobs use the same artifact names (libnvrtc.so.13 on Linux, nvrtc64_130_0.dll / nvJitLink_130_0.dll on Windows), so the minor CTK line is not recoverable from the library filename alone.

Proper fix

The clean fix is to add a third authoritative CTK metadata source for local-toolkit jobs.
Best option: make the local CI cuda_toolkit/ tree include version.json at its root, so the existing pathfinder logic works unchanged.
Second-best option: teach pathfinder to read another authoritative manifest already present in that local toolkit tree, if CI already stages one.
Least attractive option: add an explicit env-var metadata source for local-toolkit test jobs. That would work, but it is more CI-specific and less clean than shipping real toolkit metadata.

Introduce a small toolkit-info utility that reads the CUDA_VERSION macro from cuda.h so follow-up guard-rails changes can infer CTK major.minor from toolkit headers without depending on version.json. Made-with: Cursor

Centralize encoded CUDA version parsing and validation so toolkit and driver version helpers stay aligned and cuda.h parsing gets consistent string conversion and error reporting. Made-with: Cursor

Replace version.json-based CTK root metadata with cuda.h parsing so compatibility checks use a simpler, more universal toolkit source while preserving wheel-based metadata inference. Made-with: Cursor
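
A minimal sketch of the cuda.h approach, assuming the usual encoding `CUDA_VERSION = 1000 * major + 10 * minor` (for example 12090 for CTK 12.9). The function names are hypothetical and are not the helpers added by these commits.

```python
import re

# Matches a line such as "#define CUDA_VERSION 13020".
_CUDA_VERSION_DEFINE = re.compile(
    r"^[ \t]*#[ \t]*define[ \t]+CUDA_VERSION[ \t]+(\d+)\b", re.MULTILINE
)


def decode_cuda_version(encoded):
    """Split an encoded CUDA version (1000 * major + 10 * minor) into (major, minor)."""
    if encoded < 0:
        raise ValueError(f"invalid encoded CUDA version: {encoded}")
    return encoded // 1000, (encoded % 1000) // 10


def ctk_version_from_cuda_h(text):
    """Return (major, minor) parsed from the contents of cuda.h."""
    match = _CUDA_VERSION_DEFINE.search(text)
    if match is None:
        raise ValueError("CUDA_VERSION macro not found")
    return decode_cuda_version(int(match.group(1)))
```

The same decoder can serve the driver side, since the driver version is reported in the same encoded form; a caller would typically pass `pathlib.Path(header).read_text()` as `text`.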

rwgk · 2026-04-25T20:26:39Z

/ok to test

rwgk · 2026-04-26T17:33:21Z

At commit c6c38e3, the CI has a single failure in Test_linux-aarch64___py3.14t__13.2.1__local__l4.txt:

That failure does not look like a cuda.h / guard-rails regression. The failing test is tests/test_compatibility_guard_rails.py::test_real_wheel_ctk_items_are_compatible, and the relevant lines are:

INFO test_find_binary_utilities[nvcc]: bin_path=None
INFO test_real_wheel_ctk_items_are_compatible: nvcc=None
FAILED tests/test_compatibility_guard_rails.py::test_real_wheel_ctk_items_are_compatible - assert None is not None

Spot-checking sibling logs shows that the underlying nvcc lookup behavior is inconsistent across closely related aarch64 local jobs:

Test_linux-aarch64___py3.11__13.2.1__local__l4.txt resolves nvcc successfully in both the local-toolkit phase ('/__w/cuda-python/cuda-python/cuda_toolkit/bin/nvcc') and the wheel phase ('/opt/hostedtoolcache/.../site-packages/nvidia/cu13/bin/nvcc').
Test_linux-aarch64___py3.14__13.2.1__local__a100.txt also resolves nvcc successfully in both phases and passes the same guard-rails test.
Test_linux-aarch64___py3.13__13.0.2__local__a100.txt already shows the same local-phase symptom as the failing job (INFO test_find_binary_utilities[nvcc]: bin_path=None), but the guard-rails test does not fail there because it exits earlier with CompatibilityCheckError: ... resolves to CTK 13.0, which does not satisfy ctk_minor==2.
Test_linux-aarch64___py3.14t__12.9.1__local__l4.txt resolves nvcc successfully in the local phase, but the guard-rails test also exits earlier because the local toolkit resolves to CTK 12.9, which does not satisfy the test's hard-wired ctk_major==13.

So the most important takeaway from the logs is: the single red test is a combination of two conditions happening in the same job:

the local mini-CTK nvcc lookup returns None, and
the job happens to use the one CTK line (13.2) that drives test_real_wheel_ctk_items_are_compatible all the way to assert nvcc is not None.

That explains why this shows up as only one visible failure even though the broader nvcc lookup behavior is already inconsistent elsewhere in the same aarch64 local family.

Issues to look into next:

Why does nvcc lookup flip between None and '/__w/.../cuda_toolkit/bin/nvcc' across closely related local jobs?
Replace shutil.which() with a two-stage check: 1. find the file, 2. assert the executable bit is set.
CUDA_PATHFINDER_TEST_FIND_NVIDIA_BINARY_UTILITY_STRICTNESS is missing (all_must_work).
Hard-wired CTK version numbers in guard rail unit tests.
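
The two-stage check proposed in the list above could look roughly like this. It is a sketch only: `find_executable` and its `(path, problem)` return shape are hypothetical, and `os.access(..., os.X_OK)` is only meaningful on POSIX hosts.

```python
import os


def find_executable(name, search_dirs):
    """Return (path, None) on success, or (None, reason) when lookup fails."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        # Stage 1: does the file exist at all?
        if not os.path.isfile(candidate):
            continue
        # Stage 2: is the executable bit set for the current user?
        if not os.access(candidate, os.X_OK):
            return None, f"{candidate} exists but is not executable"
        return candidate, None
    return None, f"{name} not found in {list(search_dirs)}"
```

Unlike `shutil.which()`, which returns `None` in both situations, this separates "file missing" from "file present but not executable", which is the distinction needed to diagnose the flipping nvcc lookup.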

This keeps the host-backed compatibility checks aligned with the resolved toolkit layout and skips cleanly when cudart headers or cuda.h are unavailable. Made-with: Cursor

This keeps the host-backed compatibility checks aligned with the actual driver state instead of relying on a fixed encoded version in the real-environment tests. Made-with: Cursor

rwgk · 2026-04-27T00:11:53Z

/ok to test

This keeps the real host-backed checks strict when an installed nvcc wheel actually ships nvcc, while avoiding a false failure in cu12 wheel environments that only provide lower-level compiler pieces such as ptxas. Made-with: Cursor

rwgk · 2026-04-27T04:24:52Z

Prompt

I'm coming back to this question as posted under #1977 (comment)

Why does nvcc lookup flip between None and '/__w/.../cuda_toolkit/bin/nvcc' across closely related local jobs?

Do we understand this now?

Response

Mostly yes.

The None vs '/__w/.../cuda_toolkit/bin/nvcc' split is now explained for the current matrix:

find_nvidia_binary_utility("nvcc") searches site-packages first, then CONDA_PREFIX, then CUDA_PATH/CUDA_HOME.
In 12.9.1 wheel jobs, site-packages contains nvidia-cuda-nvcc-cu12, but that wheel does not ship nvcc; locally and in CI we found it ships things like ptxas, headers, libnvvm, and libdevice instead. With no local toolkit fallback, lookup ends at None.
In 12.9.1 local jobs, the same cu12 wheel may be installed, but lookup then falls through to CUDA_PATH and finds nvcc in the staged mini-CTK at '/__w/.../cuda_toolkit/bin/nvcc'.
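
The search order described above can be sketched as follows. This is not the implementation of `find_nvidia_binary_utility`: the helper names, the `bin` subdirectory layout (Linux-style), and the relative order of CUDA_PATH versus CUDA_HOME are assumptions.

```python
import os


def candidate_bin_dirs(site_packages_bin_dirs, environ):
    """Yield (found_via, directory) pairs in priority order."""
    for directory in site_packages_bin_dirs:
        yield "site-packages", directory
    conda_prefix = environ.get("CONDA_PREFIX")
    if conda_prefix:
        yield "conda", os.path.join(conda_prefix, "bin")
    # Assumed order; the source only says "CUDA_PATH/CUDA_HOME".
    for var in ("CUDA_PATH", "CUDA_HOME"):
        root = environ.get(var)
        if root:
            yield var, os.path.join(root, "bin")


def find_binary(name, site_packages_bin_dirs, environ):
    """Return (found_via, path) for the first hit, or None."""
    for found_via, directory in candidate_bin_dirs(site_packages_bin_dirs, environ):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return found_via, path
    return None
```

With a wheel directory that ships only ptxas, the lookup returns `None` unless CUDA_PATH points at a toolkit that contains `bin/nvcc`, which mirrors the wheel-job versus local-job split.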

That is exactly what the latest logs show:

wheel jobs like Test_linux-64___py3.14__12.9.1__wheels__t4.txt end with nvcc=None
local jobs like Test_linux-64___py3.12__12.9.1__local__l4.txt and Test_linux-aarch64___py3.12__12.9.1__local__a100.txt find nvcc under cuda_toolkit/bin

So for the current failures on PR 1977, yes, we understand the flip.

One nuance remains: the older single anomalous local job from the earlier archive, where a local job still got nvcc=None, is not fully root-caused. My best remaining read is that it was a transient CI staging or executability problem in that job’s mini-CTK, not a persistent cuda_pathfinder search bug. The newer local logs do not reproduce that anomaly.

rwgk · 2026-04-27T04:25:01Z

/ok to test

Keep the ad hoc conda environment helpers aligned with the current cuda_pathfinder test dependency groups so local verification matches the package's test matrix. Split the PowerShell install list from Linux-only dependencies to avoid pulling unsupported packages on Windows. Made-with: Cursor

Keep the test environment helpers working across CUDA 12 and 13 by matching conda-forge solver constraints, so CUDA 12 uses Python 3.12 while CUDA 13 stays on Python 3.14. Made-with: Cursor

Keep the helper scripts focused on the CUDA packages they need from conda while `pip install --group test` owns the pytest dependencies, so the ad hoc environment setup stays easier to keep in sync with pyproject markers. Made-with: Cursor

Look for cudadevrt under both Library/lib/x64 and Library/lib so CUDA 12 conda environments resolve the real static library instead of falling through to a misleading CUDA_PATH error. Made-with: Cursor

rwgk · 2026-04-28T06:03:05Z

Cursor GPT-5.4 Extra High Fast

PR 1977 conda testing results analysis

Analyzed logs:

Overall

Both logs look healthy.
Each finishes with 1002 passed, 6 skipped.
I do not see any failures, errors, tracebacks, or emitted warnings in either log.

Compatibility guard rails

The tests/test_compatibility_guard_rails.py results are clean in both logs: every compatibility-guard-rails case passed, with no guard-rails skips.
The real-host guard-rails checks all passed in both environments, including:
- test_real_ctk
- test_real_driver
- test_real_wheel_ctk_items_are_compatible
- test_real_wheel_component_version_does_not_override_ctk_line
The reported real versions are coherent and healthy:
- In the cu12.9.1 environment, the real CTK from cuda.h is 12.9, and the real driver CUDA version is 13.3.
- In the cu13.2.1 environment, the real CTK from cuda.h is 13.2, and the real driver CUDA version is 13.3.
This is exactly the compatibility shape we want: in both cases the driver is new enough for the CTK line, and strict-mode guard rails accepted the environment.

Static-lib fix

The new Windows conda static-lib behavior is validated in both logs.
test_locate_static_lib[cudadevrt] now resolves the real env-specific path in each case:
- cu12.9.1: ...\Library\lib\cudadevrt.lib
- cu13.2.1: ...\Library\lib\x64\cudadevrt.lib
The new regression test test_locate_static_lib_conda_rel_path_fallback also passed in both logs, which is a good sign that this will not regress when the first probed layout is absent.
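
A minimal sketch of the ordered-probing idea behind this fix. The helper name `probe_conda_static_lib` is hypothetical; the two relative directories are the ones named in the commit message (Library/lib/x64, then Library/lib).

```python
import os

# Relative layouts to probe, in priority order.
_CONDA_STATIC_LIB_REL_DIRS = (
    os.path.join("Library", "lib", "x64"),
    os.path.join("Library", "lib"),
)


def probe_conda_static_lib(conda_prefix, filename):
    """Return the first existing candidate path, or None if no layout matches."""
    for rel_dir in _CONDA_STATIC_LIB_REL_DIRS:
        candidate = os.path.join(conda_prefix, rel_dir, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Returning `None` only after every layout has been probed lets the caller fall through to the next search location with an accurate error message.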

Comparing the two logs

The main interesting difference is that the Windows conda layout is clearly not identical across CTK lines:
- cudadevrt.lib lives in Library\lib for 12.9.1 but Library\lib\x64 for 13.2.1.
- nvvm is under Library\nvvm\bin in 12.9.1 but Library\nvvm\bin\x64 in 13.2.1.
- cccl headers resolve to Library\include\targets\x64 in 12.9.1 but Library\include\targets\x64\cccl in 13.2.1.
- nvprof.exe is present in 12.9.1 but absent in 13.2.1.
None of those differences produced test failures. They mostly reinforce the same lesson as the cudadevrt fix: Windows conda packaging shifts between CTK 12 and CTK 13, so pathfinder should prefer ordered probing over hard-wired single-path assumptions.
The cu12.9.1 run is slower (9.41s vs 6.42s), but nothing in the log suggests a problem; it looks like ordinary run-to-run timing noise.

Bottom line

Both logs look healthy by themselves.
The compatibility guard rail feature looks good in both environments.
The main cross-log takeaway is that Windows conda packaging layout differs between CTK 12 and CTK 13, and pathfinder is now handling that correctly.

rwgk · 2026-04-28T16:43:30Z

I wanted to confirm that this PR solves a real-world problem that previously triggered a long Slack thread. After a few days of false starts, we were able to distill the issue to the following minimal reproducer:

from cuda import pathfinder as pf
l_nvrtc = pf.load_nvidia_dynamic_lib("nvrtc")
print(l_nvrtc)
d_cudart = pf.locate_nvidia_header_directory("cudart")
print(d_cudart)

I set up my linux-64 workstation so that /usr/local/cuda-13.0 is the system CTK (via /etc/ld.so.conf.d/cuda-13-0.conf).

With current released pathfinder

pip install cuda-pathfinder
pip install "cuda-toolkit[nvrtc]==13.*"

(ReleasedPathfinder) smc120-0009.ipp2a2.colossus.nvidia.com:/wrk/forked/cuda-python/cuda_pathfinder $ pip list
Package           Version
----------------- -------
cuda-pathfinder   1.5.4
cuda-toolkit      13.2.1
nvidia-cuda-nvrtc 13.2.78
pip               26.1

(ReleasedPathfinder) smc120-0009.ipp2a2.colossus.nvidia.com:/wrk/forked/cuda-python/cuda_pathfinder $ python /wrk/pathfinder_libnvrtc_with_header_fallback.py
LoadedDL(abs_path='/wrk/ReleasedPathfinder/lib/python3.12/site-packages/nvidia/cu13/lib/libnvrtc.so.13', was_already_loaded_from_elsewhere=False, _handle_uint=934505312, found_via='site-packages')
LocatedHeaderDir(abs_path='/usr/local/cuda-13.0/include', found_via='system-ctk-root')

Here,

libnvrtc comes from CTK 13.2,
while the header directory comes from CTK 13.0.

That mismatch can lead to extremely confusing NVRTC runtime failures.

This PR

After switching to the pathfinder code in this PR at commit 5a35125, with everything else unchanged, the mismatch is diagnosed and reported immediately:

(Pathfinder1977) smc120-0009.ipp2a2.colossus.nvidia.com:/wrk/forked/cuda-python/cuda_pathfinder $ pip list
Package           Version                Editable project location
----------------- ---------------------- ---------------------------------------
cuda-pathfinder   1.5.5.dev20+g5a3512504 /wrk/forked/cuda-python/cuda_pathfinder
cuda-toolkit      13.2.1
nvidia-cuda-nvrtc 13.2.78
pip               26.1

(Pathfinder1977) smc120-0009.ipp2a2.colossus.nvidia.com:/wrk/forked/cuda-python/cuda_pathfinder $ python /wrk/pathfinder_libnvrtc_with_header_fallback.py
LoadedDL(abs_path='/wrk/Pathfinder1977/lib/python3.12/site-packages/nvidia/cu13/lib/libnvrtc.so.13', was_already_loaded_from_elsewhere=False, _handle_uint=687887856, found_via='site-packages')
Traceback (most recent call last):
  File "/wrk/pathfinder_libnvrtc_with_header_fallback.py", line 4, in <module>
    d_cudart = pf.locate_nvidia_header_directory("cudart")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wrk/forked/cuda-python/cuda_pathfinder/cuda/pathfinder/_process_wide_compatibility_guard_rails.py", line 150, in locate_nvidia_header_directory
    return _try_process_wide_guard_rails_then_fallback(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wrk/forked/cuda-python/cuda_pathfinder/cuda/pathfinder/_process_wide_compatibility_guard_rails.py", line 125, in _try_process_wide_guard_rails_then_fallback
    return guard_rails_call()
           ^^^^^^^^^^^^^^^^^^
  File "/wrk/forked/cuda-python/cuda_pathfinder/cuda/pathfinder/_process_wide_compatibility_guard_rails.py", line 151, in <lambda>
    lambda: _current_process_wide_compatibility_guard_rails().locate_nvidia_header_directory(libname),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wrk/forked/cuda-python/cuda_pathfinder/cuda/pathfinder/_compatibility_guard_rails.py", line 622, in locate_nvidia_header_directory
    self._register_and_check(_resolve_header_item(libname, located))
  File "/wrk/forked/cuda-python/cuda_pathfinder/cuda/pathfinder/_compatibility_guard_rails.py", line 608, in _register_and_check
    compatibility_check(self._get_driver_cuda_version(), anchor, item).require_compatible()
  File "/wrk/forked/cuda-python/cuda_pathfinder/cuda/pathfinder/_compatibility_guard_rails.py", line 147, in require_compatible
    raise CompatibilityCheckError(self.message)
cuda.pathfinder._compatibility_guard_rails.CompatibilityCheckError: dynamic-lib 'nvrtc' via site-packages at '/wrk/Pathfinder1977/lib/python3.12/site-packages/nvidia/cu13/lib/libnvrtc.so.13' resolves to CTK 13.2, while header-dir 'cudart' via system-ctk-root at '/usr/local/cuda-13.0/include/cuda_runtime.h' resolves to CTK 13.0. v1 requires an exact CTK major.minor match.

This is exactly the kind of confusing mixed-environment situation the compatibility guard rails are meant to catch. With released pathfinder, the mismatch is silently accepted. With this PR, it is surfaced clearly and immediately, with a precise explanation of what resolved to CTK 13.2 and what resolved to CTK 13.0.
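
The rule that fired in the traceback ("v1 requires an exact CTK major.minor match") can be illustrated with a small stand-alone model. Everything here is hypothetical: `SketchItem`, `ExactCtkAnchor`, and `CtkMismatchError` are stand-ins, not the PR's classes, and the model covers only the anchor rule plus the driver-packaged exemption mentioned in the commit messages.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class CtkMismatchError(RuntimeError):
    """Hypothetical stand-in for the PR's CompatibilityCheckError."""


@dataclass(frozen=True)
class SketchItem:
    kind: str                                # e.g. "dynamic-lib" or "header-dir"
    name: str
    packaged_with: str                       # "ctk" or "driver"
    ctk: Optional[Tuple[int, int]] = None    # (major, minor) for CTK items


class ExactCtkAnchor:
    """First CTK-backed item sets the anchor; later CTK items must match it."""

    def __init__(self):
        self.anchor = None

    def register(self, item):
        if item.packaged_with != "ctk":
            return  # driver-packaged items neither set nor violate the anchor
        if item.ctk is None:
            raise CtkMismatchError(f"{item.kind} {item.name!r} has no CTK metadata")
        if self.anchor is None:
            self.anchor = item
            return
        if item.ctk != self.anchor.ctk:
            a = self.anchor
            raise CtkMismatchError(
                f"{a.kind} {a.name!r} resolves to CTK {a.ctk[0]}.{a.ctk[1]}, while "
                f"{item.kind} {item.name!r} resolves to CTK {item.ctk[0]}.{item.ctk[1]}."
            )
```

Registering nvrtc at (13, 2) and then a cudart header directory at (13, 0) raises in this model, matching the shape of the reproducer above.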

Keep diagnostics and tests aligned with the configured default so future rollout changes only need one constant update. Made-with: Cursor

Align CompatibilityGuardRails with the PEP 440 version syntax users already know, and reuse packaging's parser instead of maintaining custom constraint logic. Made-with: Cursor

Separate item validation, pairwise CTK coherence, and driver checks so later component- and pipeline-aware rules can land without changing current guard-rails behavior. Co-authored-by: Cursor <cursoragent@cursor.com>

Record graph-derived dynamic-link groupings and cross-surface companion tags so later guard-rails milestones can add component- and pipeline-aware policy without reworking the catalogs or resolution plumbing. Co-authored-by: Cursor <cursoragent@cursor.com>

Require exact CTK matching only for authored same-component or companion relationships, so independent artifacts can coexist across minors. Add a Linux-only driver-compatibility override for forward-compatibility deployments without relaxing CTK-coherence checks. Co-authored-by: Cursor <cursoragent@cursor.com>

Query NVML for display-driver release metadata and use it to distinguish backward compatibility from NVIDIA's same-major minor-version compatibility. This lets guard rails follow published driver-branch thresholds instead of treating cuDriverGetVersion() as the whole driver story. Co-authored-by: Cursor <cursoragent@cursor.com>

Track declared nvrtc/nvJitLink producer-consumer flows so guard rails can apply NVIDIA's stricter LTOIR rules without over-constraining PTX, ELF, and CUBIN cases. Keep explicit nvvm pipelines conservative until the model can represent NVVM IR version and dialect details. Co-authored-by: Cursor <cursoragent@cursor.com>

rwgk · 2026-05-02T22:49:12Z

Tracking progress:

With commit 6b4d910 we have reached the final milestone as laid out here:

#1038 (comment)

These commits were entirely generated with Cursor GPT-5.4 Extra High Fast (I only glanced through):

aa4472c pathfinder: split compatibility policy surface
7fe89a2 pathfinder: encode authored relation metadata
b6f9e31 pathfinder: localize CTK coherence and driver policy
8fe0ccd pathfinder: add display-driver release compatibility checks
6b4d910 pathfinder: add explicit pipeline guard rails

rwgk · 2026-05-02T22:50:20Z

/ok to test

Skip Linux-only driver-forward-compatibility tests on non-Linux hosts and stop treating nvcc discovery as mandatory in see_what_works real-host checks. This keeps platform-specific expectations from obscuring real guard-rails regressions when CI infrastructure and host layouts vary. Co-authored-by: Cursor <cursoragent@cursor.com>

rwgk · 2026-05-02T23:46:16Z

/ok to test

Drop redundant mocked happy-path checks that now overlap with the real-host CI matrix, and add explicit ELF/CUBIN pipeline cases so the remaining mocks stay focused on platform, ordering, and version-corner behavior. This keeps the guard-rails suite easier to maintain without giving up the synthetic coverage that real installs still cannot exercise reliably. Co-authored-by: Cursor <cursoragent@cursor.com>

Move public/process-wide and real-host coverage into dedicated modules while centralizing shared fixtures. This keeps the core policy suite focused without changing guard-rails coverage. Co-authored-by: Cursor <cursoragent@cursor.com>

Share the guard-rails-off fixture and small CTK sandbox builders so the touched pathfinder tests stay easier to extend and less error-prone. Co-authored-by: Cursor <cursoragent@cursor.com>

Move static and bitcode caching to the shared locate layer so strict-mode public APIs reuse the same discovery boundary after process-wide guard-rails indirection. Add symmetric wrapper cache clears and a regression test that exercises the strict-mode path. Co-authored-by: Cursor <cursoragent@cursor.com>

rwgk · 2026-05-03T18:45:05Z

/ok to test

* Add nccl_device to _BITCODE_LIBS_PACKAGED_WITH so the guard-rails resolver layer no longer raises KeyError for a name that is already in SUPPORTED_BITCODE_LIBS; lock the dispatch tables in place with parametrized tests that walk every supported bitcode/static/binary name through _resolve_*_item.
* Remove unreachable helpers _pipeline_compatibility_result, _dynamic_lib_pipeline_items, and CompatibilityGuardRails._enforce_declared_dynamic_lib_pipelines_for_pair. The pipeline check still fires from _enforce_declared_dynamic_lib_pipelines_for_item after _remember, which is the only code path that ever produced a result.
* Re-export DriverCtkCompatibilityError from cuda.pathfinder so the driver-vs-CTK case (already advertised by the env-var hint) can be caught by type instead of message text, and list it in api.rst.

Co-authored-by: Cursor <cursoragent@cursor.com>

…ion, and reset naming

* Defer the platform check in CUDA_PATHFINDER_DRIVER_COMPATIBILITY to after the CUDA_PATHFINDER_COMPATIBILITY_GUARD_RAILS=off short-circuit so users who turn guard rails off entirely are no longer forced to also unset the override on non-Linux platforms. The value-validation RuntimeError still fires unconditionally so typos are caught early.
* Move the binary packaged_with mapping next to the binary registry as SUPPORTED_BINARIES_PACKAGED_WITH and reclassify nsys / nsight-sys / ncu / nsight-compute as packaged_with="other" so strict-mode lookups for separately packaged Nsight tools no longer raise misleading "missing CTK metadata" errors.
* Rename CompatibilityGuardRails._reset_for_testing to _reset_state and document that production cache_clear callers also drive it; configured driver overrides are intentionally re-applied while lazily-queried values are dropped.

Co-authored-by: Cursor <cursoragent@cursor.com>

Low-severity polish on the v1 compatibility guard rails surface plus two new tests so the existing invariants are asserted instead of only code-read.

- _owned_distribution_candidates: note that symlinks are intentionally not chased on either side of the path comparison.
- _missing_ctk_metadata_message now appends the conflicting CTK set when wheel metadata for the same on-disk file matches more than one cuda-toolkit distribution, instead of silently collapsing to "could not determine the CTK version".
- _compatible_pair_message picks distinct wording for the same-CTK vs cross-CTK independent-pair cases so the message is no longer misleading when both items share a CTK.
- _declare_dynamic_lib_pipeline gains a docstring explaining why it stays single-underscored in v1 (taxonomy/policy still evolving).
- Block comment near _STATIC_LIBS_PACKAGED_WITH / _BITCODE_LIBS_PACKAGED_WITH calls out the lockstep requirement with SUPPORTED_*_LIBS and points at the parametrized resolver tests that enforce coverage.
- load_nvidia_dynamic_lib augments any CompatibilityCheckError raised during _register_and_check with a sentence explaining the underlying dlopen / LoadLibraryW already happened and the OS handle remains live. Mutates exc.args in place so subclass typing (DriverCtkCompatibilityError) and __cause__ are preserved.
- _try_process_wide_guard_rails_then_fallback documents why the forward-compat hint is appended only on Linux (cuda-compat-* is NVIDIA's Linux-only contract).
- New test_register_and_check_is_idempotent_for_repeated_items asserts duplicate ResolvedItem registrations collapse to one entry.
- New test_driver_ctk_compatibility_error_is_typed_catchable asserts a driver-too-old failure raises DriverCtkCompatibilityError as itself (not just by message), is still a CompatibilityCheckError, and carries the new "OS handle remains live" augmentation.

Co-authored-by: Cursor <cursoragent@cursor.com>
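
The exc.args technique mentioned for load_nvidia_dynamic_lib can be sketched in isolation. The two exception classes below are local stand-ins that reuse the PR's names for readability; they are not imported from cuda.pathfinder, and `register_after_load` and the note text are hypothetical.

```python
class CompatibilityCheckError(RuntimeError):
    """Local stand-in for the PR's base error type."""


class DriverCtkCompatibilityError(CompatibilityCheckError):
    """Local stand-in for the driver-vs-CTK subclass."""


_NOTE = " Note: the library was already loaded and its OS handle remains live."


def register_after_load(check):
    """Run a compatibility check and augment any failure message in place."""
    try:
        check()
    except CompatibilityCheckError as exc:
        if exc.args and isinstance(exc.args[0], str):
            # Reassign args on the same exception object instead of raising a
            # new one, so the subclass type and __cause__ stay intact.
            exc.args = (exc.args[0] + _NOTE, *exc.args[1:])
        raise
```

Because the original exception object is re-raised with a bare `raise`, callers can still catch `DriverCtkCompatibilityError` by type and inspect `__cause__`.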

rwgk · 2026-05-03T22:08:48Z

/ok to test

rwgk added 6 commits April 24, 2026 15:15

Add explicit compatibility guard rails to cuda.pathfinder.

895d503

Introduce CompatibilityGuardRails plus related errors and tests so callers can opt into CTK and driver compatibility checks while reusing the existing pathfinder lookup APIs. Made-with: Cursor

Add guard-rails mode switch for public pathfinder APIs.

0b15665

Allow CUDA_PATHFINDER_COMPATIBILITY_GUARD_RAILS to select strict, best_effort, or off behavior so we can experiment with stricter compatibility checks without changing the public API shape. Made-with: Cursor

Accept wheel metadata version ranges in strict guard rails.

b622613

Infer the CUDA Toolkit line from both wildcard-pinned and range-based cuda-toolkit requirements so strict process-wide guard rails keep working for editable wheel installs used by nvrtc and nvJitLink. Made-with: Cursor

rwgk self-assigned this Apr 25, 2026

github-actions Bot added the cuda.pathfinder Everything related to the cuda.pathfinder module label Apr 25, 2026

rwgk added the P0 High priority - Must do! label Apr 25, 2026

rwgk added this to the cuda.pathfinder next milestone Apr 25, 2026

Merge branch 'main' into pathfinder_compatibility_guard_rails

3bf0e98

rwgk mentioned this pull request Apr 25, 2026

WIP: pathfinder_with_compatibility_checks_v0 #1936

Closed

rwgk added the feature New feature or request label Apr 25, 2026

rwgk added 3 commits April 25, 2026 12:12

Add cuda.h toolkit version parser.

f7e81ed

Introduce a small toolkit-info utility that reads the CUDA_VERSION macro from cuda.h so follow-up guard-rails changes can infer CTK major.minor from toolkit headers without depending on version.json. Made-with: Cursor

Share encoded CUDA version decoding logic.

e3b402a

Centralize encoded CUDA version parsing and validation so toolkit and driver version helpers stay aligned and cuda.h parsing gets consistent string conversion and error reporting. Made-with: Cursor

Use cuda.h for CTK guard-rails metadata.

c6c38e3

Replace version.json-based CTK root metadata with cuda.h parsing so compatibility checks use a simpler, more universal toolkit source while preserving wheel-based metadata inference. Made-with: Cursor

rwgk added 3 commits April 26, 2026 16:56

Make real guard-rails tests derive their CTK line from cuda.h.

298888e

This keeps the host-backed compatibility checks aligned with the resolved toolkit layout and skips cleanly when cudart headers or cuda.h are unavailable. Made-with: Cursor

Make real guard-rails tests query the driver version.

8900cd2

This keeps the host-backed compatibility checks aligned with the actual driver state instead of relying on a fixed encoded version in the real-environment tests. Made-with: Cursor

pre-commit fixes

583af91

Adjust real guard-rails tests for cu12 nvcc wheels.

f13c754

This keeps the real host-backed checks strict when an installed nvcc wheel actually ships nvcc, while avoiding a false failure in cu12 wheel environments that only provide lower-level compiler pieces such as ptxas. Made-with: Cursor

rwgk added 3 commits April 27, 2026 15:58

Adjust pathfinder conda setup scripts for CTK-specific Python pins

11be7c9

Keep the test environment helpers working across CUDA 12 and 13 by matching conda-forge solver constraints, so CUDA 12 uses Python 3.12 while CUDA 13 stays on Python 3.14. Made-with: Cursor

Simplify pathfinder conda setup scripts.

4c95352

Keep the helper scripts focused on the CUDA packages they need from conda while `pip install --group test` owns the pytest dependencies, so the ad hoc environment setup stays easier to keep in sync with pyproject markers. Made-with: Cursor

Handle both Windows conda static-lib layouts.

5a35125

Look for cudadevrt under both Library/lib/x64 and Library/lib so CUDA 12 conda environments resolve the real static library instead of falling through to a misleading CUDA_PATH error. Made-with: Cursor

rwgk mentioned this pull request Apr 28, 2026

[FEA]: Explore pathfinder "scoped search" feature #1038

Open

rwgk and others added 10 commits April 28, 2026 15:58

Merge branch 'main' into pathfinder_compatibility_guard_rails

95a2576

Centralize guard rails default mode.

c7d7397

Keep diagnostics and tests aligned with the configured default so future rollout changes only need one constant update. Made-with: Cursor

Merge branch 'main' into pathfinder_compatibility_guard_rails

01ddbe7

Use packaging specifiers for CTK constraints.

2ea4b26

Align CompatibilityGuardRails with the PEP 440 version syntax users already know, and reuse packaging's parser instead of maintaining custom constraint logic. Made-with: Cursor

Merge branch 'main' into pathfinder_compatibility_guard_rails

a441f10

pathfinder: split compatibility policy surface

aa4472c

Separate item validation, pairwise CTK coherence, and driver checks so later component- and pipeline-aware rules can land without changing current guard-rails behavior. Co-authored-by: Cursor <cursoragent@cursor.com>

rwgk and others added 4 commits May 3, 2026 10:55

pathfinder: split compatibility guard rails tests

bbddbec

Move public/process-wide and real-host coverage into dedicated modules while centralizing shared fixtures. This keeps the core policy suite focused without changing guard-rails coverage. Co-authored-by: Cursor <cursoragent@cursor.com>

pathfinder: deduplicate test setup helpers

ad4a531

Share the guard-rails-off fixture and small CTK sandbox builders so the touched pathfinder tests stay easier to extend and less error-prone. Co-authored-by: Cursor <cursoragent@cursor.com>

rwgk and others added 3 commits May 3, 2026 14:49

rwgk mentioned this pull request May 4, 2026

pathfinder: add toolkit-info parser and display-driver release helpers #2012

Open
