Motivation
cuda.core is intended to be a high-level Pythonic wrapper around lower-level bindings in cuda.bindings. In idiomatic Python, errors should be communicated via exceptions rather than requiring callers to inspect return values. This audit looked at all public functions and methods in cuda.core for places where the C convention of returning error/status codes leaks through — or more broadly, anywhere the caller must inspect the returned object for correctness rather than relying on exception flow.
Summary
The codebase is largely well-designed. The HANDLE_RETURN() macro and handle_return() function consistently convert CUDA error codes into Python exceptions across the vast majority of the API. However, there are several notable deviations.
Findings
1. Event.is_done — boolean derived from CUDA error code
❌ _event.pyx: Converts CUDA_SUCCESS → True and CUDA_ERROR_NOT_READY → False. The caller must inspect the return value rather than relying on exception flow. This is a common idiom in async GPU APIs and is arguably reasonable for polling, but it is worth noting as a deliberate deviation from pure exception-based error handling.
@mdboom comment: cuEventQuery docs say:
Returns ::CUDA_SUCCESS if all captured work has been completed, or
::CUDA_ERROR_NOT_READY if any captured work is incomplete.
So I think this code is correct.
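The polling idiom described above can be modeled in plain Python. This is an illustrative sketch, not the actual _event.pyx code: the status constants match the documented driver-API values, but the function name and error type are hypothetical.

```python
# Driver-API status codes relevant to cuEventQuery (documented values).
CUDA_SUCCESS = 0
CUDA_ERROR_NOT_READY = 600

def is_done(status: int) -> bool:
    """Map a cuEventQuery status code onto a boolean, raising on real errors."""
    if status == CUDA_SUCCESS:
        return True          # all captured work has completed
    if status == CUDA_ERROR_NOT_READY:
        return False         # work is still in flight; not an error
    raise RuntimeError(f"CUDA error {status}")
```

Only the two documented polling outcomes are mapped to booleans; any other status still follows exception flow.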
2. Program.pch_status — string status code the caller must interpret
❓ _program.pyx: Returns "created", "not_attempted", "failed", or None. The "failed" case is notable — PCH creation failure is reported as a string value rather than raised as an exception. The caller must know to check for "failed" and handle it. Internally, the helper _read_pch_status() also uses None as a sentinel for "heap exhausted, retry needed" (a classic C-style error pattern, though internal-only).
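To make the burden on the caller concrete, here is a hedged sketch of the check a user must remember to write today, expressed as the exception the audit suggests. `PCHCreationError` and `check_pch_status` are hypothetical names, not part of cuda.core.

```python
class PCHCreationError(RuntimeError):
    """Hypothetical error for failed precompiled-header creation."""

def check_pch_status(pch_status):
    # The caller-side check the current API requires: compare magic strings.
    if pch_status == "failed":
        raise PCHCreationError("PCH creation failed during compile()")
    return pch_status  # "created", "not_attempted", or None
```

If compile() raised this itself, the string comparison would never leak into user code.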
3. Linker.get_error_log() / get_info_log() — unchecked CUDA calls
✔️ _linker.pyx: These return diagnostic strings, but the underlying CUDA calls to nvJitLinkGetErrorLogSize / nvJitLinkGetErrorLog are not checked via HANDLE_RETURN — the results are used directly without error checking. If these calls fail, the failure is silently ignored.
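A minimal sketch of the fix suggested later: route each nvJitLink status through a checker instead of dropping it. The constant and function names here are illustrative stand-ins, not the actual cuda.bindings API.

```python
NVJITLINK_SUCCESS = 0  # assumed success code for illustration

def handle_nvjitlink_return(status):
    """Raise instead of silently ignoring a nonzero nvJitLink status."""
    if status != NVJITLINK_SUCCESS:
        raise RuntimeError(f"nvJitLink call failed with status {status}")
```

The same check would wrap both the log-size query and the log-retrieval call, so a failed call surfaces immediately rather than producing a silently empty or garbage log.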
4. _MP_deallocate silently swallows CUDA_ERROR_INVALID_CONTEXT
❌ _memory_pool.pyx: The deallocation path explicitly suppresses CUDA_ERROR_INVALID_CONTEXT. The function is marked noexcept so it cannot raise, but this means a real error (e.g., deallocating after context destruction) is silently ignored. Callers have no way to know deallocation failed.
@mdboom commentary: This seems correct as-is. According to the docs, ::CUDA_ERROR_INVALID_CONTEXT here is just an indication that the default stream was specified with no current context.
5. DeviceProperties._get_attribute() returns a default on CUDA_ERROR_INVALID_VALUE
❌ _device.pyx: When querying device attributes, CUDA_ERROR_INVALID_VALUE (which often means "this attribute isn't supported on this GPU") is silently converted to a default value (typically 0) rather than raising. A caller reading device.properties.some_attribute could get 0 and not know whether the attribute is genuinely 0 or unsupported on their hardware.
@mdboom commentary: This seems correct -- this is basically creating dict.get()-like functionality (with a default value when the key doesn't exist) on top of cuDeviceGetAttribute, which seems totally fine.
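The dict.get() analogy can be made concrete with a small model. This is an illustrative sketch, not the real _device.pyx code; the status constants use the documented driver-API values, and `query` stands in for the raw attribute call.

```python
CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_VALUE = 1  # documented driver-API value

def get_attribute(query, attr, default=0):
    """query(attr) -> (status, value); unsupported attributes get `default`."""
    status, value = query(attr)
    if status == CUDA_ERROR_INVALID_VALUE:
        return default          # like dict.get(key, default)
    if status != CUDA_SUCCESS:
        raise RuntimeError(f"CUDA error {status}")
    return value
```

The trade-off flagged above is visible here: a returned 0 is indistinguishable from `default`, which is exactly why a distinct sentinel is suggested below.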
6. Kernel._get_arguments_info() uses CUDA_ERROR_INVALID_VALUE as end-of-list sentinel
❌ _module.pyx: Loops calling cuKernelGetParamInfo until it gets CUDA_ERROR_INVALID_VALUE, which it interprets as "no more parameters" rather than an error. This mirrors the C API convention. Any genuinely invalid-value error would also be silently consumed.
@mdboom commentary: Given that there is no API to retrieve the number of parameters for a kernel, this seems like the correct way to iterate over all of them. This code is so core to everything, if this were an issue I'm pretty confident we would know about it.
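The end-of-list idiom can be modeled as follows. This is an illustrative sketch of the loop shape only; `query` stands in for the per-index cuKernelGetParamInfo call, and the real code lives in _module.pyx.

```python
CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_VALUE = 1  # documented driver-API value

def collect_params(query):
    """Query parameter info by index until the API reports INVALID_VALUE."""
    params = []
    i = 0
    while True:
        status, info = query(i)
        if status == CUDA_ERROR_INVALID_VALUE:
            break  # interpreted as "no more parameters", not an error
        if status != CUDA_SUCCESS:
            raise RuntimeError(f"CUDA error {status}")
        params.append(info)
        i += 1
    return params
```

As the audit notes, a genuine INVALID_VALUE error at the boundary would be absorbed by the `break`; any other error code still raises.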
7. Device_resolve_device_id() returns 0 on CUDA_ERROR_INVALID_CONTEXT
❌ _device.pyx: When no context exists, instead of raising, it defaults to device 0 (mimicking cudart behavior). This is an internal function but affects public API behavior — Device(None) silently falls back to device 0 rather than informing the caller there is no active context.
@mdboom commentary: This all seems solidly in "designed this way on purpose".
8. DMR_mempool_get_access() — returns magic strings instead of a typed enum
❓ _device_memory_resource.pyx: Returns "rw", "r", or "". The empty string "" (meaning "no access") is a value the caller must check — attempting to use a buffer without access would only fail later at a less helpful point. A proper enum would make this more self-documenting and less error-prone.
@mdboom commentary: This seems fine as-is, as it's just telling the user what the permissions are. But this function is not called internally from anywhere or tested, so it's unclear to me what the expected usage pattern is.
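The enum suggested below could be as small as this. `PoolAccess` is a hypothetical name; the member values mirror the strings the current API returns, so either form could be accepted at the boundary.

```python
import enum

class PoolAccess(enum.Enum):
    """Typed alternative to the "rw"/"r"/"" magic strings."""
    NONE = ""          # no access; currently the easy-to-miss empty string
    READ = "r"
    READ_WRITE = "rw"
```

A caller comparing `access is PoolAccess.NONE` is harder to get wrong than testing for an empty string, and the "no access" case becomes self-documenting.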
Suggestions
Ranked roughly from most to least impactful:
Program.pch_status returning "failed" — consider raising an exception (or at least a warning) during compile() when PCH creation fails, rather than silently storing a status string the user must remember to check.
Linker.get_error_log() / get_info_log() — check the CUDA return values from the underlying log-retrieval calls via HANDLE_RETURN.
_MP_deallocate suppressing CUDA_ERROR_INVALID_CONTEXT — at minimum log a warning so failures are observable.
DeviceProperties returning 0 for unsupported attributes — consider raising AttributeError or returning a distinct sentinel so callers can distinguish "genuinely 0" from "not supported".
DMR_mempool_get_access — return a proper enum rather than magic strings.
Kernel._get_arguments_info() end-of-list sentinel — document or assert that CUDA_ERROR_INVALID_VALUE is only expected at the boundary, to avoid masking real errors.
Device_resolve_device_id() defaulting to device 0 — consider raising when there is no active context, rather than silently choosing a device.
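One way to realize the "distinct sentinel" suggestion from the list above, sketched with hypothetical names: a unique module-level object is never equal to a real attribute value, so callers can distinguish "genuinely 0" from "not supported".

```python
CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_VALUE = 1  # documented driver-API value

UNSUPPORTED = object()  # unique sentinel; never compares equal to a real value

def attribute_or_unsupported(status, value):
    """Return the value, UNSUPPORTED for unsupported attributes, or raise."""
    if status == CUDA_ERROR_INVALID_VALUE:
        return UNSUPPORTED
    if status != CUDA_SUCCESS:
        raise RuntimeError(f"CUDA error {status}")
    return value
```

Checking `result is UNSUPPORTED` is explicit at the call site, at the cost of no longer returning a plain int in all cases.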
Not flagged (correct patterns)
For completeness, these were reviewed and found to handle errors properly:
Graph.update() — raises CUDAError with diagnostic info on GRAPH_EXEC_UPDATE_FAILURE
GraphBuilder.complete() / _instantiate_graph() — raises RuntimeError with error reason
Event.__sub__() — handles error codes inline but always raises exceptions with contextual messages
All close() methods — delegate to C++ RAII handles; idempotent no-op behavior is standard
All memory resource allocate() / deallocate() public methods — consistently use HANDLE_RETURN or raise_if_driver_error()
All Stream, Device, Context public methods — consistently raise via HANDLE_RETURN
All graph node factory methods — consistently raise via HANDLE_RETURN
system subpackage functions — consistently raise ValueError / RuntimeError on failure