
llmPredict built-in for LLM inference #2448

Open

kubraaksux wants to merge 8 commits into apache:main from kubraaksux:llm-api

Conversation


@kubraaksux kubraaksux commented Mar 11, 2026

Contains the llmPredict API implementation (Java pipeline + tests). The full benchmark framework is in #2431. The branch also carries an end-to-end native DML inference prototype under scripts/staging/llm-native/: an HF→DML weight converter, a GPT-2 pre-LN block (scripts/nn/layers/gpt2_layer.dml), a NumPy reference implementation, an inference driver, and a three-way correctness harness (tools/compare_logits.py). DML logits match HuggingFace within float64 round-off at T=5 and T=128. Previously tracked in #2430 (closed due to a branch-history issue).

Register llmPredict through the full SystemDS compilation pipeline
(Builtins, Opcodes, Types, DMLTranslator, HOP, LOP, CP instruction).
LlmPredictCPInstruction sends HTTP POST requests to OpenAI-compatible
servers with configurable concurrency. Includes 10 tests (7 mock, 3 live).
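The Java instruction itself is not shown in this description, but the request body it builds is the standard OpenAI chat-completions shape. A minimal Python sketch of that shape (the function name and default model here are illustrative, not taken from the PR):

```python
import json

# Hypothetical sketch of the request body an OpenAI-compatible server
# expects at /v1/chat/completions; the actual Java
# LlmPredictCPInstruction constructs the equivalent JSON in code.
def build_chat_request(prompt: str, model: str = "gpt-4o-mini") -> bytes:
    """Serialize a single chat-completions request body to UTF-8 JSON."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(payload).encode("utf-8")

body = build_chat_request("Summarize this row.")
print(json.loads(body)["messages"][0]["role"])  # "user"
```

Concurrency in the real instruction amounts to issuing many such POSTs in parallel, one per input cell.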
Introduce forward_internal with optional causal masking: upper.tri builds
the future-position mask, and log(1-mask) turns masked entries into -inf
before the column-wise softmax. Expose a forward_causal wrapper and keep
the forward() signature unchanged for existing callers.

Add DML + JUnit test verifying first-token invariance under causal mode
when future value tokens change.
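In NumPy terms, the log(1-mask) trick works out as in the sketch below (the softmax axis and square score matrix are assumptions for illustration; the DML version applies softmax column-wise over its own layout). The final assertion mirrors the first-token-invariance property the test checks: position 0 can only attend to itself.

```python
import numpy as np

def causal_softmax(scores: np.ndarray) -> np.ndarray:
    """Softmax over attention scores with a causal mask applied."""
    T = scores.shape[0]
    # mask[i, j] = 1 for strictly-future positions j > i
    # (upper triangle above the diagonal, as DML's upper.tri builds it).
    mask = np.triu(np.ones((T, T)), k=1)
    # log(1 - mask) is 0 where attention is allowed and -inf where it
    # is masked, so adding it zeroes out future positions after exp.
    with np.errstate(divide="ignore"):
        masked = scores + np.log(1.0 - mask)
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

p = causal_softmax(np.random.default_rng(0).normal(size=(4, 4)))
# Position 0 attends only to itself, regardless of future scores.
assert np.allclose(p[0], [1.0, 0.0, 0.0, 0.0])
```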
Introduces scripts/staging/llm-native/ for running pretrained HF
transformer models in SystemDS. tools/convert_gpt2.py splits HF's
fused c_attn projection into W_Q/W_K/W_V, upcasts to float64, and
writes one CSV+MTD pair per matrix plus a manifest.json index.
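The c_attn split is a single slice along the output dimension. A sketch with toy shapes (real GPT-2 124M has n_embd = 768; HF stores the fused projection as one (n_embd, 3*n_embd) Conv1D weight):

```python
import numpy as np

# Toy stand-in for HF's fused c_attn Conv1D weight, float32 on disk.
n_embd = 8
rng = np.random.default_rng(1)
c_attn_w = rng.normal(size=(n_embd, 3 * n_embd)).astype(np.float32)

# Split the fused projection into W_Q / W_K / W_V and upcast to
# float64, mirroring what tools/convert_gpt2.py does before writing
# one CSV+MTD pair per matrix.
w_q, w_k, w_v = np.split(c_attn_w.astype(np.float64), 3, axis=1)
assert w_q.shape == (n_embd, n_embd)
```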
Adapted from bert_layer.dml: pre-LN ordering, causal attention via
multi_attention::forward_causal, and no inner final LN. Inference-only;
GELU is hardcoded to gelu_new.
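For reference, gelu_new is HF's tanh approximation of GELU (the variant GPT-2 was trained with); a NumPy sketch of the same formula the DML layer hardcodes:

```python
import numpy as np

def gelu_new(x: np.ndarray) -> np.ndarray:
    # tanh-approximation GELU, as used by GPT-2 ("gelu_new" in HF):
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                    * (x + 0.044715 * x ** 3)))
```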
A pure-NumPy float64 forward pass over the converter's CSVs, dumping
every intermediate hidden state. The --compare-hf flag cross-checks
against HF; on gpt2 (124M), every per-step max-abs-diff is below 1e-11.
The DML driver reads stacked CSV weights produced by
tools/convert_gpt2.py + tools/pack_weights.py and runs a single forward
pass, writing logits.csv plus per-block dumps. On gpt2 (124M), the worst
max-abs-diff vs the NumPy oracle is 4.5e-13, so DML logits match HF to
~1e-12.

pack_weights.py exists because DML's read() requires constant-string
filenames; stacking the per-layer matrices into one file lets the driver
row-slice inside the loop instead.
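The pack/row-slice idea can be sketched in NumPy (shapes here are toy values, not the real GPT-2 dimensions):

```python
import numpy as np

# Stack L per-layer (d, d) matrices into one (L*d, d) array so a loop
# can slice out layer l by row range, instead of needing a distinct
# filename per layer (DML's read() only accepts constant-string paths).
L, d = 4, 6
per_layer = [np.full((d, d), float(l)) for l in range(L)]
stacked = np.vstack(per_layer)           # written once as a single CSV

for l in range(L):
    w_l = stacked[l * d:(l + 1) * d, :]  # row-slice inside the loop
    assert np.array_equal(w_l, per_layer[l])
```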

Note: SystemDS/Hadoop's FileInputFormat silently skips files whose
names start with '_' or '.', so input files must not use those
prefixes.
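This matches Hadoop FileInputFormat's hidden-file filter, which treats leading '_' and '.' as hidden (e.g. _SUCCESS markers). A trivial pre-write sanity check (the helper name is illustrative, not from the PR):

```python
# Hadoop's FileInputFormat silently skips files whose base name starts
# with '_' or '.', so weight CSVs must avoid those prefixes.
def is_visible_to_hadoop(name: str) -> bool:
    return not name.startswith(("_", "."))

assert is_visible_to_hadoop("gpt2_wte.csv")
assert not is_visible_to_hadoop("_stacked.csv")
```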
compare_logits.py tokenizes a prompt, runs HF in float64, runs the
NumPy oracle, optionally invokes the DML driver via subprocess, and
prints a per-step max-abs-diff table across the three.
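The core of that table is a one-liner over (T, vocab) logit matrices; a sketch under assumed shapes (function name is illustrative):

```python
import numpy as np

def per_step_max_abs_diff(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Max absolute logit difference at each token position t."""
    return np.abs(a - b).max(axis=1)

# Toy stand-ins for two of the three logit sources being compared.
rng = np.random.default_rng(0)
hf_logits = rng.normal(size=(5, 16))
oracle_logits = hf_logits + 1e-12 * rng.normal(size=hf_logits.shape)

for t, d in enumerate(per_step_max_abs_diff(hf_logits, oracle_logits)):
    print(f"step {t}: max|d| = {d:.3e}")
```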

Validation on gpt2 (124M): worst max|d| = 4.09e-12 at T=5 and
7.73e-12 at T=128; the diff stays in the float64 round-off regime
and does not grow with sequence length.