Merged
67 commits
9026600
Phase 1: compile upstream server-context/queue/task/models into jllama
claude Apr 24, 2026
2cfb563
Phase 2: rewrite jllama.cpp to upstream reader-based server API
claude Apr 25, 2026
b866897
Phase 1: replace hand-ported server.hpp with upstream shim; update js…
claude Apr 25, 2026
9913404
Fix jllama.cpp API calls for upstream server_context
claude Apr 25, 2026
0c3fcde
Fix test_server.cpp compile errors for upstream API
claude Apr 25, 2026
333a96a
Remove stop_type_to_str and oaicompat_finish_reason tests
claude Apr 25, 2026
685b126
Fix test_json_helpers.cpp errors and nodiscard warnings
claude Apr 25, 2026
916feba
Link upstream server TUs into jllama_test; fix nodiscard warnings
claude Apr 25, 2026
396ad3e
Fix GetJllamaContext_ReturnsWrapperNotInnerServer test
claude Apr 25, 2026
d4a1167
Add jllama_context default-value tests
claude Apr 25, 2026
c842754
Add targeted json_helpers tests for Phase 2 changes
claude Apr 25, 2026
680cfc2
Add readers map lifecycle tests for streaming architecture
claude Apr 25, 2026
1c9d51e
Add four targeted json_helpers tests covering real invariants
claude Apr 25, 2026
4746cda
Document require_json_field_impl null-value behaviour with a test
claude Apr 25, 2026
9b2ea0f
fix(rerank): use post_tasks() instead of looping post_task()
claude Apr 25, 2026
322388f
fix(completion): tokenize prompt and set task.tokens before dispatch
claude Apr 25, 2026
c95b5df
fix(embed): use post_tasks() for multi-prompt embedding batches
claude Apr 25, 2026
c87faa2
fix: move-assign server_tokens (copy-assign is deleted)
claude Apr 25, 2026
aa7df43
fix(android): guard server-models.cpp with OS_NAME fallback
claude Apr 25, 2026
f1a9bff
fix(meta): add architecture field to getModelMetaJson
claude Apr 25, 2026
5533a58
fix(config): validate configureParallelInference inputs even as no-op
claude Apr 25, 2026
43f71a6
docs: update REFACTORING.md with Phase 2 completion and Phase 3 plan
claude Apr 25, 2026
0a5a396
refactor: delete server.hpp shim and inline upstream includes directly
claude Apr 25, 2026
c19ccfe
refactor: remove dead code from utils.hpp (base64 copy, slot macros)
claude Apr 25, 2026
c7e79bc
docs: update REFACTORING.md — Phase 3 complete, all phases done
claude Apr 25, 2026
ef34329
test: add tests for token_piece_value, result_error, prompt_progress,…
claude Apr 25, 2026
b0ff7da
test: cover cache_n, t_start, n_tokens_max, null json-to-jstring
claude Apr 25, 2026
83bc69d
test: cover cmpl_final shape and stop_type values
claude Apr 25, 2026
5a3de2c
test: cover completion_probabilities conditional in cmpl_final
claude Apr 25, 2026
67f1344
test: add edge-case tests for format_response_rerank and utf8 helpers
claude Apr 25, 2026
3e50321
test: cover server_tokens pos_next/size_up_to_pos in text-only path
claude Apr 25, 2026
cf931cc
test: cover metrics timing fields and cmpl_final usage_json_oaicompat
claude Apr 25, 2026
db3dca7
test: add task_params lora field tests
claude Apr 25, 2026
1199264
test: cover cmpl_partial id_slot and per-token completion_probabilities
claude Apr 25, 2026
e47dc41
docs: update CLAUDE.md with C++ test infrastructure and current jni_h…
claude Apr 25, 2026
4fc77f6
test: cover json_is_array_and_contains_numbers (4 tests, 332 total)
claude Apr 25, 2026
eda5699
test: cover SSE formatter functions (10 tests, 342 total)
claude Apr 25, 2026
53ec86e
test: cover server_task::need_sampling() and n_tokens() (5 tests, 347…
claude Apr 25, 2026
30276da
test: cover task_params dry_sequence_breakers and preserved_tokens (4…
claude Apr 25, 2026
c152b8b
test: cover cmpl_final::to_json_oaicompat() OAI completion shape (8 t…
claude Apr 25, 2026
65b7050
test: cover cmpl_final::to_json_oaicompat_chat() chat completion shap…
claude Apr 25, 2026
5d86527
test: cover cmpl_final::to_json_anthropic() Anthropic response shape …
claude Apr 25, 2026
0d37b44
test: cover cmpl_partial OAI streaming chunk and to_json() dispatcher…
claude Apr 25, 2026
b0dca20
docs: update CLAUDE.md test counts to 385 (session 2 additions)
claude Apr 25, 2026
8fac3d0
docs: document complex C++ test methodology in CLAUDE.md
claude Apr 25, 2026
7560247
test: cover cmpl_final::to_json() dispatcher switch arms (5 tests, 39…
claude Apr 25, 2026
ececced
test: cover verbose flag and timings conditional in OAI formatters (5…
claude Apr 25, 2026
a5f2bbb
test: cover to_json_oaicompat_chat_stream() array format (5 tests, 40…
claude Apr 25, 2026
ac40dbd
test: cover params_from_json_cmpl() parsing pipeline (10 tests, 410 t…
claude Apr 25, 2026
c47aaba
test: cover response_fields projection in cmpl_final::to_json_non_oai…
claude Apr 25, 2026
3cd211f
test: cover cmpl_partial::to_json_oaicompat_chat() streaming delta fo…
claude Apr 25, 2026
25f0169
test: cover to_json_anthropic_stream() full event lifecycle (6 tests,…
claude Apr 25, 2026
060421c
docs: update CLAUDE.md test counts to 424 (session 3 complex tests)
claude Apr 25, 2026
c476dee
test: add 8 CmplPartialAnthropicStream tests (432 total)
claude Apr 25, 2026
51a77a9
test: add dispatcher ANTHROPIC arm and tool_calls chat tests (435 total)
claude Apr 25, 2026
b7c61bc
test: add 3 grammar type routing tests in params_from_json_cmpl (438 …
claude Apr 25, 2026
2856c6f
test: add logprobs and timings_per_token tests (442 total)
claude Apr 25, 2026
9559b33
refactor(phase4): replace extract_first_embedding_row with direct dow…
claude Apr 25, 2026
32d717f
refactor(phase4): inline build_embeddings_response_json into handleEm…
claude Apr 25, 2026
d9b0dbf
test: add 3 ServerTaskResultEmbd tests for non-OAI shape coverage (43…
claude Apr 25, 2026
71485d5
Phase 7: delete dead code — format_logit_bias, parse_lora_request wra…
claude Apr 25, 2026
39fb0b7
Remove unused includes and dead get_server_context_impl; update docs
claude Apr 25, 2026
95cbe55
Eliminate five duplication patterns in jllama.cpp (-35 lines)
claude Apr 25, 2026
5f5d8ee
Document Phase 6 duplication elimination in REFACTORING.md and CLAUDE.md
claude Apr 25, 2026
0e9e494
Fix over-broad Android guard: only exclude server-models.cpp
claude Apr 25, 2026
3052a80
Clarify CompletionResponseParser Javadoc on optional probabilities field
claude Apr 25, 2026
990be74
Gate CUDA cross-compile behind enable_cuda_build flag for faster PR f…
claude Apr 25, 2026
24 changes: 22 additions & 2 deletions .github/workflows/release.yaml
@@ -7,6 +7,10 @@ on:
         description: 'Release to Maven Central (true/false)'
         required: false
         default: 'false'
+      enable_cuda_build:
+        description: 'Build CUDA artifacts — slow, auto-enabled on release events. See CLAUDE.md "Optional CUDA build flag".'
+        required: false
+        default: 'false'
   release:
     types: [ created ]
 env:
@@ -24,6 +28,10 @@ jobs:
 
   crosscompile-linux-x86_64-cuda:
     name: Cross-Compile manylinux_2_28 x86_64 (CUDA)
+    # Slow job (CUDA toolkit install + nvcc). Skipped on PRs to keep the feedback
+    # loop fast. See CLAUDE.md "Optional CUDA build flag" for the rationale and
+    # the revert path once the feedback loop is no longer the bottleneck.
+    if: github.event_name == 'release' || github.event.inputs.enable_cuda_build == 'true'
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v6
@@ -544,6 +552,13 @@ jobs:
       - test-java-macos-arm64-metal
       - test-java-macos-arm64-no-metal
       - test-java-windows-x86_64
+    # Run even when the CUDA job was skipped (PR / non-release dispatch without
+    # enable_cuda_build), but still fail the package step if any required job
+    # actually failed or was cancelled.
+    if: |
+      always() &&
+      !contains(needs.*.result, 'failure') &&
+      !contains(needs.*.result, 'cancelled')
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v6
@@ -552,7 +567,8 @@ jobs:
           pattern: "*-libraries"
           merge-multiple: true
           path: ${{ github.workspace }}/src/main/resources/de/kherud/llama/
-      - uses: actions/download-artifact@v6
+      - if: needs.crosscompile-linux-x86_64-cuda.result == 'success'
+        uses: actions/download-artifact@v6
         with:
           name: linux-libraries-cuda
           path: ${{ github.workspace }}/src/main/resources_linux_cuda/de/kherud/llama/
@@ -569,7 +585,11 @@ jobs:
           path: target/*.jar
 
   publish:
-    if: ${{ github.event_name == 'release' || github.event.inputs.release_to_maven_central == 'true' }}
+    # Manual dispatch must set BOTH release_to_maven_central=true AND
+    # enable_cuda_build=true, otherwise the linux-libraries-cuda artifact
+    # download below would fail. Release events always satisfy this since
+    # the CUDA job runs unconditionally on `release`.
+    if: ${{ github.event_name == 'release' || (github.event.inputs.release_to_maven_central == 'true' && github.event.inputs.enable_cuda_build == 'true') }}
     needs: [ package ]
     runs-on: ubuntu-latest
     steps:
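
The three `if:` conditions touched in release.yaml can be sanity-checked with a small truth-table model. The following is a minimal Python sketch, not part of the PR: the function names are hypothetical, `always()` is simply treated as true, and only the expressions visible in the diff are modelled (not the runner's handling of `needs` chains). Inputs are compared as strings because the workflow compares them against `'true'`.

```python
from __future__ import annotations


def cuda_job_runs(event_name: str, enable_cuda_build: str = "") -> bool:
    """Model of the CUDA cross-compile job's `if:` expression."""
    return event_name == "release" or enable_cuda_build == "true"


def package_job_runs(needs_results: list[str]) -> bool:
    """Model of the package job's `if:`; a 'skipped' result does not block it."""
    return "failure" not in needs_results and "cancelled" not in needs_results


def publish_condition(event_name: str,
                      release_to_maven_central: str = "",
                      enable_cuda_build: str = "") -> bool:
    """Model of the publish job's `if:` expression only."""
    return event_name == "release" or (
        release_to_maven_central == "true" and enable_cuda_build == "true"
    )


if __name__ == "__main__":
    # Print which event/input combinations would start the CUDA job.
    for event, cuda in [("pull_request", ""), ("workflow_dispatch", "false"),
                        ("workflow_dispatch", "true"), ("release", "")]:
        print(event, repr(cuda), "-> CUDA job runs:", cuda_job_runs(event, cuda))
```

Under this model, a manual dispatch with only `release_to_maven_central=true` does not satisfy the publish condition, which matches the comment added above the `publish` job.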
279 changes: 244 additions & 35 deletions CLAUDE.md

Large diffs are not rendered by default.

29 changes: 26 additions & 3 deletions CMakeLists.txt
@@ -210,11 +210,29 @@ endif()
 
 add_library(jllama SHARED
     src/main/cpp/jllama.cpp
-    src/main/cpp/server.hpp
     src/main/cpp/utils.hpp
     ${llama.cpp_SOURCE_DIR}/tools/server/server-common.cpp
     ${llama.cpp_SOURCE_DIR}/tools/server/server-chat.cpp)
+
+# Phase 1 refactoring: compile upstream server library units directly into jllama
+# server.hpp has been replaced by direct upstream includes in jllama.cpp.
+# server-http.cpp and server.cpp (main) are intentionally excluded.
+# server-context.cpp, server-queue.cpp, server-task.cpp compile on all platforms
+# including Android. server-models.cpp is excluded on Android because it pulls
+# in subprocess.h which calls posix_spawn_*, declared but not implemented by the
+# Android NDK. Guard with both ANDROID_ABI (NDK toolchain convention) and
+# OS_NAME (always set to "Linux-Android" by the CI cmake invocation).
+target_sources(jllama PRIVATE
+    ${llama.cpp_SOURCE_DIR}/tools/server/server-context.cpp
+    ${llama.cpp_SOURCE_DIR}/tools/server/server-queue.cpp
+    ${llama.cpp_SOURCE_DIR}/tools/server/server-task.cpp
+)
+if(NOT ANDROID_ABI AND NOT OS_NAME MATCHES "Android")
+    target_sources(jllama PRIVATE
+        ${llama.cpp_SOURCE_DIR}/tools/server/server-models.cpp
+    )
+endif()
 
 set_target_properties(jllama PROPERTIES POSITION_INDEPENDENT_CODE ON)
 target_include_directories(jllama PRIVATE
     src/main/cpp
@@ -247,7 +265,7 @@ endif()
 
 #################### C++ unit tests ####################
 
-option(BUILD_TESTING "Build C++ unit tests for server.hpp / utils.hpp" OFF)
+option(BUILD_TESTING "Build C++ unit tests for jni_helpers / json_helpers / utils" OFF)
 
 if(BUILD_TESTING)
     FetchContent_Declare(
@@ -268,7 +286,12 @@ if(BUILD_TESTING)
         src/test/cpp/test_jni_helpers.cpp
         src/test/cpp/test_json_helpers.cpp
         ${llama.cpp_SOURCE_DIR}/tools/server/server-common.cpp
-        ${llama.cpp_SOURCE_DIR}/tools/server/server-chat.cpp)
+        ${llama.cpp_SOURCE_DIR}/tools/server/server-chat.cpp
+        ${llama.cpp_SOURCE_DIR}/tools/server/server-context.cpp
+        ${llama.cpp_SOURCE_DIR}/tools/server/server-queue.cpp
+        ${llama.cpp_SOURCE_DIR}/tools/server/server-task.cpp
+        ${llama.cpp_SOURCE_DIR}/tools/server/server-models.cpp
+    )
 
     target_include_directories(jllama_test PRIVATE
         src/main/cpp
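
The Android guard on the `jllama` target in this CMakeLists.txt diff (`if(NOT ANDROID_ABI AND NOT OS_NAME MATCHES "Android")`) can be illustrated with a rough Python model of which upstream server sources get added. This is a sketch under simplifying assumptions, not CMake itself: the function name is hypothetical, an empty string stands in for an unset variable, and CMake's other false constants (`OFF`, `0`, `NOTFOUND`, and so on) are not modelled. `arm64-v8a` is used only as an example ABI value.

```python
from __future__ import annotations

import re

# Sources the diff adds to jllama unconditionally via target_sources().
UNCONDITIONAL = ["server-context.cpp", "server-queue.cpp", "server-task.cpp"]


def upstream_server_sources(android_abi: str = "", os_name: str = "") -> list[str]:
    """Return the upstream server sources selected for the jllama target."""
    sources = list(UNCONDITIONAL)
    # Mirrors: if(NOT ANDROID_ABI AND NOT OS_NAME MATCHES "Android")
    if not android_abi and not re.search("Android", os_name):
        sources.append("server-models.cpp")
    return sources


if __name__ == "__main__":
    print(upstream_server_sources(os_name="Linux"))
    print(upstream_server_sources(os_name="Linux-Android"))
    print(upstream_server_sources(android_abi="arm64-v8a"))
```

Either signal alone is enough to drop `server-models.cpp`, which is the point of guarding on both variables.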