Add xsimd::get<>() for optimized compile-time element extraction #1294
Status: Open. DiamonDinoia wants to merge 2 commits into xtensor-stack:master.
Conversation
Force-pushed: 0b6d85f to c6dd311.
DiamonDinoia (Contributor, Author): Nice, thanks for fixing CI! This is ready for review. Once approved I will rewrite the history; I don't want to trigger a useless CI run.
serge-sans-paille requested changes on Apr 16, 2026.
On the pack-expansion test helper:

    template <std::size_t... Is>
    void check_get_all(batch_type const& res, std::index_sequence<Is...>) const
    {
        // Comma-operator pack expansion: run check_get_element<I> for each lane.
        int dummy[] = { (check_get_element<Is>(res), 0)... };
        (void)dummy;
    }
serge-sans-paille (Contributor): You could check that loading the generated array ends up being equal to res, right?
serge-sans-paille (Contributor) requested changes on Apr 17, 2026:

Please fix the testing so that we have decent confidence in the getter when index != 0.
DiamonDinoia (Contributor, Author): Yes, I will! I also noticed some small changes I should make; I just have not had time to get to them yet.
Force-pushed: 049e9ee to 22f9a1e.
Adds a new public API `xsimd::get<I>(batch)` that extracts a compile-time
indexed lane from a batch. Unlike the runtime `batch::get(i)`, the index is
a template parameter so each arch can dispatch to the best single-op path.
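A minimal usage sketch of the new entry point (it assumes only that the default arch has at least two float lanes):

```cpp
#include <xsimd/xsimd.hpp>
#include <array>
#include <cstdio>

int main()
{
    using batch = xsimd::batch<float>;
    std::array<float, batch::size> in;
    for (std::size_t i = 0; i < batch::size; ++i)
        in[i] = float(i) + 1.0f; // 1, 2, 3, ...
    batch b = batch::load_unaligned(in.data());

    float compile_time = xsimd::get<1>(b); // index is a template parameter:
                                           // dispatches to a single-op kernel
    float run_time = b.get(1);             // runtime index: generic extraction
    std::printf("%g %g\n", compile_time, run_time); // prints "2 2"
}
```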
Design per architecture (objdump-verified, pure -march flags, no reliance on compiler optimization); simplified sketches of the SSE4.1 and AVX-512 paths follow the list:
- SSE2: `first` for I==0; 32/64-bit (int, float, double) go through
`swizzle + first` so the xsimd permute API emits the shuffle; 8/16-bit
stay on `psrldq + movd` because sse2 swizzle expands to 2 ops for
broadcast-to-lane-0 (pshuflw/pshufhw + unpck) while srli keeps it at 1.
- SSE4.1: native `pextrb/w/d/q` for integer (1 op); float override removed
so it falls through to sse2's swizzle path (equivalent 1-op codegen).
- AVX/AVX2: half-extract + delegate to sse4_1 (1 op low half, 2 ops upper
half — hardware lower bound).
- AVX-512F: `valignd`/`valignq` rotate + extract for float/double — 1 op
for every I, including upper half (was 2). Integer keeps the extract +
pextr* split (2 ops, optimal).
- NEON/NEON64: native per-lane `mov`/`umov v.X[I]` (1 op).
- RVV: skip `vslidedown` when I==0.
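For illustration, hedged sketches of two of these paths as standalone helpers (the names are hypothetical; the PR's actual overloads live behind xsimd's `kernel::get` architecture dispatch):

```cpp
#include <smmintrin.h> // SSE4.1
#include <immintrin.h> // AVX-512F
#include <cstdint>

// SSE4.1 integer path: the compile-time index becomes pextrd's immediate.
template <int I>
int32_t get_lane_i32_sse41(__m128i v)
{
    static_assert(I >= 0 && I < 4, "lane index out of range");
    return _mm_extract_epi32(v, I); // single pextrd
}

// AVX-512F float path: valignd rotates lane I down to lane 0, which is
// then read out directly, so every index costs one instruction.
template <int I>
float get_lane_f32_avx512(__m512 v)
{
    static_assert(I >= 0 && I < 16, "lane index out of range");
    __m512i r = _mm512_alignr_epi32(_mm512_castps_si512(v),
                                    _mm512_castps_si512(v), I);
    return _mm512_cvtss_f32(_mm512_castsi512_ps(r));
}
```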
Tests build `array_type { xsimd::get<Is>(res)... }` via pack-initialization,
compare against the reference array, and verify that reloading the extracted
values reproduces the original batch.
Verified on sse2, sse4.1, avx2, avx-512 (sde), aarch64 (qemu), rvv (qemu).
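A condensed standalone form of that test pattern (`check_get_roundtrip` is a hypothetical name, with `assert` standing in for the suite's check macros):

```cpp
#include <xsimd/xsimd.hpp>
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Extract every lane at compile time via pack expansion, then reload
// the array and compare against the original batch.
template <class B, std::size_t... Is>
void check_get_roundtrip(B const& res, std::index_sequence<Is...>)
{
    std::array<typename B::value_type, B::size> extracted { xsimd::get<Is>(res)... };
    B reloaded = B::load_unaligned(extracted.data());
    assert(xsimd::all(reloaded == res)); // reloading must reproduce the batch
}
```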
Force-pushed: 22f9a1e to 2e030f2.
Summary of the change:

- Add a free function `xsimd::get(batch)` API mirroring `std::get(tuple)` for fast compile-time element extraction from SIMD batches.
- Per-architecture optimized `kernel::get` overloads using the fastest available intrinsics (detailed in the design notes above).
- A fix for a latent bug in the common fallback for complex batch compile-time get (wrong buffer type); see the sketch below.
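The kind of fix described, illustrated with a hypothetical helper (not the PR's actual diff): the scratch buffer in the store-and-index fallback must hold the batch's `std::complex<T>` value type, not the underlying real `T`:

```cpp
#include <xsimd/xsimd.hpp>
#include <complex>
#include <cstddef>

// Hypothetical fallback: store the complex batch to a scratch buffer and
// index it. The buffer must be std::complex<T> (the batch's value_type);
// a plain T buffer is the wrong-buffer-type bug described above.
template <std::size_t I, class T, class A>
std::complex<T> get_complex_lane(xsimd::batch<std::complex<T>, A> const& b)
{
    constexpr std::size_t n = xsimd::batch<std::complex<T>, A>::size;
    alignas(A::alignment()) std::complex<T> buffer[n];
    b.store_aligned(buffer);
    return buffer[I];
}
```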