Skip to content

fix: array_concat widens container variant for mixed List/LargeList inputs#21704

Merged
Jefffrey merged 1 commit intoapache:mainfrom
hcrosse:fix/array-concat-mixed-large-list
Apr 21, 2026
Merged

fix: array_concat widens container variant for mixed List/LargeList inputs#21704
Jefffrey merged 1 commit intoapache:mainfrom
hcrosse:fix/array-concat-mixed-large-list

Conversation

@hcrosse
Copy link
Copy Markdown
Contributor

@hcrosse hcrosse commented Apr 17, 2026

Which issue does this PR close?

Rationale for this change

array_concat hit an internal cast error when given a mix of List and LargeList (or FixedSizeList and LargeList) arguments:

> select array_concat(make_array(1, 2), arrow_cast([3, 4], 'LargeList(Int64)'));
DataFusion error: Internal error: could not cast array of type List(Int64) to arrow_array::array::list_array::GenericListArray<i64>.

ArrayConcat::coerce_types was coercing only the base element type, leaving the outer container alone. When the resolved return type is LargeList, array_concat_inner later tries to downcast each arg to GenericListArray<i64>, which fails for any List argument that slipped through.

What changes are included in this PR?

In ArrayConcat::coerce_types, after coercing the base type, also promote each input's outermost List to LargeList when the return type is a LargeList. FixedSizeList inputs already go through FixedSizedListToList first and then get promoted too. Per-arg dimensionality is preserved, so nested cases keep working with align_array_dimensions.

Are these changes tested?

Yes, added sqllogictests in array_concat.slt covering:

  • List + LargeList
  • LargeList + List
  • FixedSizeList + LargeList
  • Three-way mix List, LargeList, List

Each one also asserts arrow_typeof(...) = LargeList(Int64).

Are there any user-facing changes?

Queries that previously returned an internal cast error now return the concatenated LargeList as expected. No API changes.

…ist/LargeList inputs

`ArrayConcat::coerce_types` only coerced the base element type, leaving the
outer container variant unchanged. With a mix of `List` and `LargeList` (or
`FixedSizeList` and `LargeList`) inputs, `array_concat_inner` would later try
to downcast the `List` array to `GenericListArray<i64>` and fail with an
internal cast error.

When the resolved return type is `LargeList`, also promote each input's
outermost `List` to `LargeList` so the cast targets match what
`array_concat_inner` expects.
@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Apr 17, 2026
Copy link
Copy Markdown
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

}

fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
let base_type = base_type(&self.return_type(arg_types)?);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized something funny the existing implementation does; it calls return_type within coerce_types even though the docstring for return_type states that the input types are already coerced (i.e. coerce_types would have already been called) 😅

crm26 added a commit to crm26/datafusion that referenced this pull request Apr 20, 2026
Addresses round-2 review comments on apache#21542:
- Widen container variant in coerce_types when inputs mix List and
  LargeList (or FixedSizeList), so mixed-type calls like
  `cosine_distance([1.0, 0.0], arrow_cast([0.0, 1.0], 'LargeList(Float64)'))`
  succeed. Follows the pattern from PR apache#21704 (ArrayConcat).
- Coerce bare NULL inputs to a matching list variant so
  `cosine_distance(NULL, [1.0, 2.0])` returns NULL instead of erroring.
- Drop the `list_cosine_distance` alias — the base name is not
  `array_cosine_distance`, so the `array_X` -> `list_X` convention does
  not apply.
- Expand SLT coverage: mixed-type variants, FixedSizeList inputs,
  Float32 and Int64 inner types, bare NULL in each position, NULL row
  in a multi-row VALUES, and an unsupported-type plan error case.

Dispatch fallthrough in cosine_distance_inner is now unreachable after
the coerce_types widening, changed from exec_err! to internal_err!.

Part of apache#21536.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Jefffrey Jefffrey added this pull request to the merge queue Apr 21, 2026
@Jefffrey
Copy link
Copy Markdown
Contributor

Thanks @hcrosse

Merged via the queue into apache:main with commit e524f49 Apr 21, 2026
31 checks passed
pull Bot pushed a commit to buraksenn/datafusion that referenced this pull request May 3, 2026
## Which issue does this PR close?

Part of apache#21536 — split of apache#21371 into one-function-per-PR.

## Rationale for this change

Adds `inner_product(array1, array2)` — the dot product of two
equal-length numeric arrays, returning `Float64`. Computed as
`sum(array1[i] * array2[i])`.

## What changes are included in this PR?

Mirrors the structural pattern of merged apache#21542 (`cosine_distance`):

- Same `coerce_types` for `List`/`LargeList`/`FixedSizeList` of any
numeric inner type, with widening to `LargeList` when any input is
`LargeList` (per the apache#21704 pattern)
- Same NULL semantics: bare `NULL` → `NULL`, NULL row → NULL, NULL
element in list → NULL
- Same Arrow-idiomatic implementation: single
`as_float64_array(list_array.values())` downcast, slice by
`value_offsets()`, iterate via `ScalarBuffer<f64>`
- No alias, no shared module — standalone, inline math

The arithmetic is the only semantic divergence from `cosine_distance`:
- `dot += a*b` (no magnitude or normalization)
- Empty arrays return `0.0` (sum of empty set), not `NULL`
- No zero-magnitude special case (`inner_product([0,0], [1,2])` returns
`0`, which is well-defined for inner product)

## Are these changes tested?

Yes. SLT covers:
- Orthogonal, identical, opposite, general non-trivial vectors
- Single zero vector, both zero vectors
- Bare `NULL` in either or both positions
- NULL element inside a list (returns NULL for that row)
- Mismatched lengths (error)
- `LargeList` inputs
- Mixed `(List, LargeList)` in both orders
- `(FixedSizeList, FixedSizeList)` and `(FixedSizeList, LargeList)`
- `Float32` and `Int64` inner type coercion
- Multi-row query with NULL row propagation
- Empty arrays (returns `0`)
- No-args error
- Return-type assertion (`Float64`)

## Are there any user-facing changes?

New scalar function `inner_product`, documented in
`docs/source/user-guide/sql/scalar_functions.md`.

---------

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

array_concat fails with internal error on mixed List + LargeList inputs

2 participants