Summary
See nanvix/nanvix#2082 for more context on why this is relevant for Nanvix on Hyperlight.
When a guest configures a large scratch region via SandboxConfiguration::set_scratch_size(), HyperlightVm::new() fails on KVM with EEXIST (Error(17)):
UpdateRegion(MapMemory(Hypervisor(KvmError(Error(17)))))
The root cause is that the scratch memory slot (KVM slot 1) overlaps with an internal KVM memory slot created by create_irq_chip() for the LAPIC/APIC access page at GPA 0xFEE00000.
Details
Scratch region placement
Hyperlight places the scratch region at the top of the 32-bit GPA space via scratch_base_gpa():
// hyperlight_common::layout
pub fn scratch_base_gpa(size: usize) -> u64 {
    (MAX_GPA - size + 1) as u64
}
With MAX_GPA = 0xFFFF_FFFF, a scratch size of e.g. 0x6882000 (~104 MB) yields:
- scratch_base = 0xF977E000
- Scratch KVM slot covers GPA range [0xF977E000, 0xFFFFFFFF]
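As a sanity check on the numbers above, here is a minimal standalone sketch of the placement arithmetic. It re-implements the formula rather than calling into hyperlight_common; the `scratch_range` helper, the `u64` parameter, and the `Option` return type are illustrative assumptions, not part of Hyperlight's API.

```rust
/// Highest guest physical address of the 32-bit GPA space, as stated above.
pub const MAX_GPA: u64 = 0xFFFF_FFFF;

/// Same formula as `scratch_base_gpa()` in the issue, written with a checked
/// subtraction so that an oversized request yields `None` instead of wrapping.
/// (Hypothetical signature: the real function takes `usize` and returns `u64`.)
pub fn scratch_base_gpa(size: u64) -> Option<u64> {
    if size == 0 {
        return None;
    }
    // MAX_GPA - size + 1, reordered so the intermediate value cannot underflow.
    (MAX_GPA + 1).checked_sub(size)
}

/// Inclusive GPA range `[base, MAX_GPA]` covered by the scratch slot.
/// (Hypothetical helper, for illustration only.)
pub fn scratch_range(size: u64) -> Option<(u64, u64)> {
    scratch_base_gpa(size).map(|base| (base, MAX_GPA))
}
```

With this sketch, `scratch_range(0x6882000)` returns `Some((0xF977E000, 0xFFFFFFFF))`, matching the range quoted above.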
KVM irqchip APIC access page
When the hw-interrupts feature is enabled, KvmVm::new() calls create_irq_chip(). On Intel hardware with APICv (or on AMD with AVIC), KVM automatically creates an internal APIC access page at GPA 0xFEE00000. This is a non-removable, non-movable memory slot managed internally by KVM.
Since 0xFEE00000 falls inside [0xF977E000, 0xFFFFFFFF], the two regions overlap and KVM rejects the set_user_memory_region call for the scratch slot with EEXIST.
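The overlap condition can be expressed as a simple interval test. The sketch below is illustrative only: the function names are hypothetical, and it assumes the APIC access page is a single 4 KiB page starting at 0xFEE00000 (the issue only states the base address).

```rust
/// Top of the 32-bit GPA space, as used by Hyperlight's scratch placement.
pub const MAX_GPA: u64 = 0xFFFF_FFFF;
/// Base GPA of KVM's internal APIC access page, as stated in the issue.
pub const APIC_BASE_GPA: u64 = 0xFEE0_0000;
/// Assumption: the access page is one 4 KiB page.
pub const APIC_PAGE_SIZE: u64 = 0x1000;

/// Returns true when two inclusive ranges share at least one address.
pub fn ranges_overlap(a: (u64, u64), b: (u64, u64)) -> bool {
    a.0 <= b.1 && b.0 <= a.1
}

/// Returns true when a scratch region of `scratch_size` bytes, placed at the
/// top of the GPA space, would cover the APIC access page.
pub fn scratch_overlaps_apic(scratch_size: u64) -> bool {
    // Saturating arithmetic keeps oversized requests from wrapping around.
    let base = (MAX_GPA + 1).saturating_sub(scratch_size);
    let apic = (APIC_BASE_GPA, APIC_BASE_GPA + APIC_PAGE_SIZE - 1);
    ranges_overlap((base, MAX_GPA), apic)
}
```

Under these assumptions, `scratch_overlaps_apic(0x6882000)` is true and `scratch_overlaps_apic(0x48000)` (the default size) is false.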
Maximum safe scratch size
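The scratch base must stay above the APIC page, so the scratch size is bounded by roughly:
max_scratch = MAX_GPA - 0xFEE00000 = 0x11FFFFF ≈ 18 MB
Any set_scratch_size() value above ~18 MB will fail on KVM with hw-interrupts enabled on Intel (APICv) or AMD (AVIC) hosts. The exact bound is slightly lower once the access page itself and page alignment of the scratch base are taken into account.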
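For reference, a hedged sketch of how the bound could be computed and checked up front. The names `max_scratch_size` and `validate_scratch_size` are hypothetical, and the 4 KiB size of the APIC access page is an assumption; with it, the page-aligned limit comes out as 0x11FF000, slightly below the 0x11FFFFF approximation above.

```rust
/// Top of the 32-bit GPA space.
pub const MAX_GPA: u64 = 0xFFFF_FFFF;
/// Base GPA of KVM's internal APIC access page, as stated in the issue.
pub const APIC_BASE_GPA: u64 = 0xFEE0_0000;
/// Assumption: the access page occupies one 4 KiB page.
pub const PAGE_SIZE: u64 = 0x1000;

/// Largest scratch size whose base lands on the first page after the
/// APIC access page (hypothetical helper).
pub const fn max_scratch_size() -> u64 {
    (MAX_GPA + 1) - (APIC_BASE_GPA + PAGE_SIZE)
}

/// Rejects sizes that would place the scratch base at or below the APIC
/// access page, with a descriptive message instead of a raw errno.
pub fn validate_scratch_size(size: u64) -> Result<(), String> {
    let limit = max_scratch_size();
    if size > limit {
        Err(format!(
            "scratch size {:#x} exceeds {:#x}: region would overlap the APIC access page at {:#x}",
            size, limit, APIC_BASE_GPA
        ))
    } else {
        Ok(())
    }
}
```

A check like this would only apply on KVM with hw-interrupts enabled; other configurations would skip it.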
Why this does not affect Windows WHP
On Windows, WhpVm::new() does not create an explicit interrupt controller memory slot at a fixed GPA. The WHP API (WHvMapGpaRange2) maps guest physical address ranges independently, and the platform's LAPIC emulation does not reserve a GPA slot that conflicts with user-mapped regions. This is why the same scratch configuration works on Windows.
Why this does not affect small scratch sizes
The default scratch size (DEFAULT_SCRATCH_SIZE = 0x48000 = 288 KB) places scratch at 0xFFFB8000, which is above 0xFEE00000, so there is no overlap.
Reproduction
This can be reproduced with the Nanvix project, which uses Hyperlight as a VMM backend:
1. Clone and check out branch enhancement-uservm-hyperlight at commit b9c50ed28 (uses Hyperlight rev 4b57b84):
   git clone https://github.com/nanvix/nanvix.git
   cd nanvix
   git checkout enhancement-uservm-hyperlight
2. Build with the Hyperlight machine target:
   ./z build -- all MACHINE=hyperlight DEPLOYMENT_MODE=standalone
3. Run the integration test on a machine with KVM and APICv enabled (Intel bare-metal):
   ./bin/mkimage.elf -o nanvix.img \
       "bin/procd.elf;procd" \
       "bin/memd.elf;memd" \
       "bin/testd.elf;testd"
   bash scripts/run-nanvixd.sh hyperlight nanvix.img 120 \
       --wait-for-string "hello, world!"
On bare-metal Intel with APICv, nanvixd fails immediately with the EEXIST error. On machines with APICv disabled (e.g., WSL2) or on Windows WHP, it succeeds.
The failing CI run: https://github.com/nanvix/nanvix/actions/runs/24616269717
Proposed solutions
1. Split the scratch KVM slot around the APIC page: When creating the scratch memory mapping on KVM with hw-interrupts, detect whether [scratch_base, scratch_end] contains 0xFEE00000 and, if so, split it into two KVM memory slots: [scratch_base, 0xFEDFFFFF] and [0xFEF00000, scratch_end]. The range 0xFEE00000 through 0xFEEFFFFF, which contains the APIC page, would be left for KVM's internal slot. The host-side mmap backing would remain contiguous; only the KVM slot registration would be split.
2. Validate scratch_size against known reserved GPAs: In SandboxMemoryLayout::new(), reject scratch sizes that would cause scratch_base_gpa() to fall below 0xFEE00000 when hw-interrupts is enabled on KVM. This would at least provide a clear error message instead of an opaque KvmError(Error(17)).
3. Document the maximum scratch size constraint: Add a note to SandboxConfiguration::set_scratch_size() and DEFAULT_SCRATCH_SIZE explaining the ~18 MB upper bound on KVM with hw-interrupts.
4. Disable APICv on the host: Consumers running KVM on Intel can work around this by loading kvm_intel with APICv disabled (sudo modprobe kvm_intel enable_apicv=0), which prevents KVM from allocating the APIC access page. This eliminates the overlap but comes at a performance cost, because APIC accesses fall back to VM-exit based emulation instead of hardware-accelerated handling. This is a viable short-term workaround.
Option 1 would be the most flexible, allowing scratch regions of arbitrary size on all platforms. Options 2 and 3 are simpler but limit the usable scratch space. Option 4 is a host-side workaround that does not require any Hyperlight changes.
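To make Option 1 concrete, here is a minimal sketch of the range-splitting step alone. It is not Hyperlight code: `split_around_reserved` is a hypothetical helper, the reserved window uses the bounds proposed in Option 1, and the actual slot registration with KVM is left out.

```rust
/// Reserved window from Option 1 (inclusive bounds), left to KVM's internal slot.
pub const RESERVED_START: u64 = 0xFEE0_0000;
pub const RESERVED_END: u64 = 0xFEEF_FFFF;

/// Splits the inclusive range `[start, end]` into the sub-ranges that lie
/// outside the reserved window. Returns zero, one, or two ranges, each of
/// which would be registered as its own KVM memory slot.
pub fn split_around_reserved(start: u64, end: u64) -> Vec<(u64, u64)> {
    let mut parts = Vec::new();
    if start > end {
        return parts; // empty input range
    }
    // No overlap with the reserved window: keep a single slot.
    if end < RESERVED_START || start > RESERVED_END {
        parts.push((start, end));
        return parts;
    }
    // Portion below the reserved window, if any.
    if start < RESERVED_START {
        parts.push((start, RESERVED_START - 1));
    }
    // Portion above the reserved window, if any.
    if end > RESERVED_END {
        parts.push((RESERVED_END + 1, end));
    }
    parts
}
```

For the failing configuration, `split_around_reserved(0xF977E000, 0xFFFFFFFF)` yields `[(0xF977E000, 0xFEDFFFFF), (0xFEF00000, 0xFFFFFFFF)]`, while the default scratch range comes back unchanged. One consequence of this approach is that the guest-visible scratch region has a hole at the reserved window, which guest code using the scratch area would need to tolerate.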
Environment
- Hyperlight revision: 4b57b8416114c489083922afa3dd9716127278fb
- Features: kvm, hw-interrupts, nanvix-unstable, executable_heap
- Host: bare-metal Intel x86_64, Linux, KVM with APICv enabled
- Fails: Intel bare-metal runners (prometheus28, prometheus30, prometheus43)
- Works: WSL2 (APICv disabled), Windows 11 WHP