BUG: Handling of PCH failures when compiling with nvrtc may have a bug #1994

@mdboom

This was found during the cuda.core API audit in #1951. Here is the agent's assessment first:


_program.pyx: Returns "created", "not_attempted", "failed", or None. The "failed" case is notable — PCH creation failure is reported as a string value rather than raised as an exception. The caller must know to check for "failed" and handle it. Internally, the helper _read_pch_status() also uses None as a sentinel for "heap exhausted, retry needed" (a classic C-style error pattern, though internal-only).


@mdboom's commentary:

In _program.pyx we have the following function that converts err status codes to strings:

cdef str _read_pch_status(cynvrtc.nvrtcProgram prog):
    """Query nvrtcGetPCHCreateStatus and translate to a high-level string."""
    cdef cynvrtc.nvrtcResult err
    with nogil:
        err = cynvrtc.nvrtcGetPCHCreateStatus(prog)
    if err == cynvrtc.nvrtcResult.NVRTC_SUCCESS:
        return _PCH_STATUS_CREATED
    if err == cynvrtc.nvrtcResult.NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED:
        return None  # sentinel: caller should auto-retry
    if err == cynvrtc.nvrtcResult.NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED:
        return _PCH_STATUS_NOT_ATTEMPTED
    return _PCH_STATUS_FAILED

And it's used like this:

    try:
        status = _read_pch_status(prog)
    except RuntimeError as e:
        raise RuntimeError(
            "PCH was requested but the runtime libnvrtc does not support "
            "PCH APIs. Update to CUDA toolkit 12.8 or newer."
        ) from e

It appears that _read_pch_status can never raise an exception, so the except RuntimeError here isn't doing any work. (And this is not tested AFAICT.)
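To make the observation concrete, here is a minimal pure-Python model of the branches visible in the snippet. `FakeNvrtcResult`, `read_pch_status_model`, and `status_or_wrapped_error` are hypothetical stand-ins, not the real `cynvrtc` binding or `cuda.core` code. The model covers only the `if` chain shown above; whether the underlying `cynvrtc.nvrtcGetPCHCreateStatus` call can itself raise (for example, if the symbol is missing from the loaded library) is not visible in the snippet and is deliberately left out.

```python
from enum import Enum, auto


class FakeNvrtcResult(Enum):
    """Hypothetical stand-in for cynvrtc.nvrtcResult (assumption, not the real enum)."""
    SUCCESS = auto()
    PCH_CREATE_HEAP_EXHAUSTED = auto()
    NO_PCH_CREATE_ATTEMPTED = auto()
    SOME_OTHER_ERROR = auto()


def read_pch_status_model(err):
    """Mirror the if-chain of _read_pch_status: every branch returns, none raises."""
    if err is FakeNvrtcResult.SUCCESS:
        return "created"
    if err is FakeNvrtcResult.PCH_CREATE_HEAP_EXHAUSTED:
        return None  # sentinel: caller should auto-retry
    if err is FakeNvrtcResult.NO_PCH_CREATE_ATTEMPTED:
        return "not_attempted"
    return "failed"


def status_or_wrapped_error(err):
    """Mirror the call site: the except clause is unreachable in this model."""
    try:
        return read_pch_status_model(err)
    except RuntimeError as e:
        raise RuntimeError("PCH APIs are not supported") from e


# Exercise every enum member; each one yields a value rather than an exception.
results = {err.name: status_or_wrapped_error(err) for err in FakeNvrtcResult}
```

In this model, any error code outside the three named cases falls through to "failed", so the `except RuntimeError` branch has nothing to catch.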

My concern here is that the intention isn't clear. Should it be raising an exception when PCH fails, or is it fine (as documented elsewhere) that execution proceeds and the fact that PCH failed is visible by checking the pch_status property after the fact?
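The two possible intentions can be sketched side by side. This is only an illustration of the design question, not a proposed patch: `PCHCreationError` and `resolve_pch_status` are hypothetical names that do not exist in cuda.core, and the `raise_on_failure` flag is an assumption used to show both behaviors in one place.

```python
class PCHCreationError(RuntimeError):
    """Hypothetical exception type; not part of cuda.core."""


def resolve_pch_status(status, raise_on_failure):
    """Apply one of the two candidate policies to a PCH status string.

    raise_on_failure=True  -> a "failed" status becomes an exception.
    raise_on_failure=False -> the status is returned unchanged, and the
                              caller inspects it afterwards (as with the
                              pch_status property).
    """
    if status == "failed" and raise_on_failure:
        raise PCHCreationError("PCH creation failed")
    return status


# Policy A: proceed and let the caller check the status after the fact.
status_after_the_fact = resolve_pch_status("failed", raise_on_failure=False)

# Policy B: surface the failure immediately as an exception.
try:
    resolve_pch_status("failed", raise_on_failure=True)
    raised = False
except PCHCreationError:
    raised = True
```

Under policy A compilation continues and the caller must remember to look at the status. Under policy B the failure cannot be missed, but a program that would otherwise compile fine without PCH is aborted.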

Labels

P1 (Medium priority - Should do), bug (Something isn't working), cuda.core (Everything related to the cuda.core module)
