Move public skills to a directory to avoid downloading the whole repo #1519
Merged
timsaucer merged 2 commits into apache:main on Apr 29, 2026
Conversation
timsaucer added a commit to timsaucer/datafusion-python that referenced this pull request on Apr 29, 2026
Upstream apache#1519 moved the root `SKILL.md` to `skills/datafusion_python/SKILL.md`
so that consumers can install the skill without cloning the whole repo. Update all
repo-internal links and external GitHub URLs in the docs site, README, AGENTS.md,
and the package docstring to point at the new location.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
timsaucer added a commit that referenced this pull request on May 3, 2026
* docs: publish SKILL.md on the docs site via myst include
Adds a new `skill` page that embeds the repo-root `SKILL.md` through the
myst `{include}` directive, so the agent-facing guide lives on the
published docs site without duplication. The page is wired into the
User Guide toctree. Implements PR 4a of the plan in #1394.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: publish llms.txt at docs site root
Adds `docs/source/llms.txt` in llmstxt.org schema: a short description
plus categorized links to the agent skill, user guide pages, DataFrame
API reference, and example queries. `html_extra_path` in `conf.py`
copies it verbatim to the published site root so it resolves at
`https://datafusion.apache.org/python/llms.txt`. Implements PR 4b of
the plan in #1394.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
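The `html_extra_path` mechanism mentioned above can be sketched as a one-line `conf.py` addition; this is an illustrative fragment (the actual `conf.py` in the repo contains much more), assuming `llms.txt` sits next to `conf.py` in `docs/source/`:

```python
# docs/source/conf.py (excerpt, illustrative)
# Files listed in html_extra_path are copied verbatim into the HTML build
# root, so docs/source/llms.txt is published at <site-root>/llms.txt
# rather than under a page-relative path.
html_extra_path = ["llms.txt"]
```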
* docs: add write-dataframe-code contributor skill
Adds `.ai/skills/write-dataframe-code/SKILL.md`, a contributor-facing
skill for agents working on this repo. It layers on top of the
user-facing repo-root SKILL.md with:
- a TPC-H pattern index mapping idiomatic API usages to the query file
that demonstrates them,
- an ad-hoc plan-comparison workflow for checking DataFrame translations
against a reference SQL query via `optimized_logical_plan()`, and
- the project-specific docstring and aggregate/window documentation
conventions that CLAUDE.md already enforces for contributors.
Implements PR 4c of the plan in #1394.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: add audit-skill-md skill
Adds `.ai/skills/audit-skill-md/SKILL.md`, a contributor skill that
cross-references the repo-root `SKILL.md` against the current public
Python API (functions module, DataFrame, Expr, SessionContext, and
package-root re-exports). Reports two classes of drift:
- new APIs exposed by the Python surface that are not yet covered in
the user-facing guide, and
- stale mentions in the guide that no longer exist in the public API.
The skill is diff-only — it produces a report the user reviews before
any edit to `SKILL.md`. Complements `check-upstream/`, which audits in
the opposite direction (upstream Rust features not yet exposed).
Implements PR 4d of the plan in #1394.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: enrich RST pages with demos relocated from TPC-H rewrite
Moves the illustrative patterns that #1504 removed from the TPC-H
examples into the common-operations docs, where they serve as
pattern-focused teaching material without cluttering the TPC-H
translations:
- expressions.rst gains a "Testing membership in a list" section
comparing `|`-compound filters, `in_list`, and `array_position` +
`make_array`, plus a "Conditional expressions" section contrasting
switched and searched `case`.
- udf-and-udfa.rst gains a "When not to use a UDF" subsection
showing the compound-OR predicate that replaces a Python-side UDF
for disjunctive bucket filters (the Q19 case).
- aggregations.rst gains a "Building per-group arrays" subsection
covering `array_agg(filter=..., distinct=True)` with
`array_length`/`array_element` for the single-value-per-group
pattern (the Q21 case).
- Adds `examples/array-operations.py`, a runnable end-to-end
walkthrough of the membership and array_agg patterns.
Implements PR 4e of the plan in #1394.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: wire new contributor skills and plan-comparison diagnostic into AGENTS.md
- List the three contributor skills (`check-upstream`,
`write-dataframe-code`, `audit-skill-md`) under the Skills section so
agents know what tools they have before starting work.
- Document the plan-comparison diagnostic workflow (comparing
`ctx.sql(...).optimized_logical_plan()` against a DataFrame's
`optimized_logical_plan()` via `LogicalPlan.__eq__`) for translating
SQL queries to DataFrame form. Points at the full write-up in the
`write-dataframe-code` skill rather than duplicating it.
`CLAUDE.md` is a symlink to `AGENTS.md`, so the change lands in both.
Implements PR 4f of the plan in #1394.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: rename aggregations.rst demo df to orders_df to avoid clobbering state
The "Building per-group arrays" block added in the previous commit
reassigned `df` and `ctx` mid-page, which then broke the
Grouping Sets examples further down that share the Pokemon `df`
binding (`col_type_1` etc. no longer resolved). Rename the demo
DataFrame to `orders_df` and drop the redundant `ctx = SessionContext()`
so the shared state from the top of the page stays intact.
Verified with `sphinx-build -W --keep-going` against the full docs
tree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: replace raw SKILL.md include with a human-written AI-assistants page
The previous approach embedded the repo-root `SKILL.md` on the docs
site via a myst `{include}`. That file is written for agents -- dense,
skill-formatted, and not suited to a human browsing the User Guide. It
also relied on a fragile `:start-line:` offset to strip YAML
frontmatter.
Replace it with `docs/source/ai-coding-assistants.md`, a short
human-readable page that mirrors the README section added in #1503:
what the skill is, how to install it via `npx skills` or a manual
pointer, and what kinds of things it covers. `SKILL.md` stays at the
repo root as the single source of truth; agents fetch the raw GitHub
URL directly.
`llms.txt` is updated to point its Agent Guide entry at
`raw.githubusercontent.com/.../SKILL.md` and to include the new
human-readable page as a secondary link. The User Guide toctree now
references `ai-coding-assistants` in place of the removed `skill`
stub.
Verified with `sphinx-build -W --keep-going` against the full docs
tree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: drop redundant assistants list in ai-coding-assistants intro
The introduction and the "Installing the skill" section both enumerated
the same set of supported assistants. Drop the intro copy; the list
that matters is next to `npx skills add`, where it answers "what does
this command actually configure?"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: convert ai-coding-assistants page from markdown to rst, shorten title
Every other page in `docs/source/user-guide` and the top-level
`docs/source` is written in reStructuredText; the lone `.md` page was
an inconsistency. Rewrite in rst so the ASF header matches the rest of
the tree, cross-references can use `:py:func:` roles if we ever add
any, and myst is no longer required just to render this one page.
Also shorten the page title from "Using DataFusion with AI Coding
Assistants" to "Using AI Coding Assistants" -- it already sits under
the DataFusion user guide so the product name is redundant.
Verified with `sphinx-build -W --keep-going`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: drop audit-skill-md skill
The skill as written pushed for every public method to be mentioned
in `SKILL.md`, which is the wrong goal. `SKILL.md` is a distilled
agent guide of idiomatic patterns and pitfalls, not an API reference
-- autoapi-generated docs and module docstrings already provide full
per-method coverage. An audit pressing for 100% method coverage would
bloat the skill file into a stale copy of that reference.
The two checks with actual value (stale mentions in `SKILL.md`, and
drift between `functions.__all__` and the categorized function list)
are small enough to be ad-hoc greps at release time and do not
warrant a dedicated skill.
Also remove references to the skill from `AGENTS.md` and the
`write-dataframe-code` skill's "Related" section.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: drop write-dataframe-code skill
A separate PR covers the same contributor-facing material (TPC-H
pattern index, plan-comparison workflow, docstring conventions),
so this skill is redundant. Remove the skill directory and the
corresponding references in `AGENTS.md`, including the
plan-comparison section that pointed at it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: show Parquet pushdown plan diff in "When not to use a UDF"
The previous version of the section asserted that a UDF predicate
blocks optimizer rewrites but did not show evidence. Replace the two
`code-block` examples with an executable walkthrough that writes a
small Parquet file, runs the same filter two ways, and prints the
physical plan for each.
The native-expression plan renders with three annotations on the
`DataSourceExec` node that the UDF plan does not have:
- `predicate=brand@1 = A AND qty@2 >= 150` pushed into the scan
- `pruning_predicate=... brand_min@0 <= A AND ... qty_max@4 >= 150`
for row-group pruning via Parquet footer min/max stats
- `required_guarantees=[brand in (A)]` for bloom-filter / dictionary
skipping
The UDF form keeps only `predicate=brand_qty_filter(...)`: the scan
has to materialize every row group and call the Python callback.
The disjunctive-OR rewrite (previously the main example) stays at the
end as the idiomatic alternative for multi-bucket filters.
Verified with `sphinx-build -W --keep-going`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: rework "subsets within a group" aggregation example
Rename the section from "Building per-group arrays" to "Comparing subsets
within a group" so the heading matches the content. Rewrite the intro to
lead with the problem (compare full group vs filtered subset), reframe
the worked example around partially failed orders, and replace the
trailing bullet list with a one-line walkthrough of the result.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: clarify "When not to use a UDF" intro
Rewrite the opening of the section to make three things clearer: the
contrast is with native DataFusion expressions (not Python in general),
some predicates genuinely feel easier to write as a Python loop and that
tension is worth acknowledging, and predicate pushdown is a table-provider
mechanism rather than a Parquet-only feature. Parquet stays as the
concrete demo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: move ai-coding-assistants under user-guide/
The page was sitting at the top level of docs/source/ while every other
page in the USER GUIDE toctree lives under docs/source/user-guide/.
Move the file, update the toctree entry, and update the absolute URL
in llms.txt to match the new path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: replace AGENTS.md skill list with discovery instructions
A static skill list in AGENTS.md goes stale as new skills are added
(it already missed the make-pythonic skill that was merged separately).
Replace the enumerated list with a pointer telling agents to list
.ai/skills/ and read each SKILL.md frontmatter, so the catalog never
has to be hand-maintained.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: fix broken llms.txt link and stale otherwise xref
- ai-coding-assistants.rst: use absolute https://datafusion.apache.org/python/llms.txt URL; the relative `llms.txt` resolved to /python/user-guide/llms.txt and 404'd because html_extra_path publishes the file at the site root.
- expressions.rst: drop the broken `:py:meth:~datafusion.expr.Expr.otherwise` xref (otherwise lives on CaseBuilder, not Expr) and spell the recommended replacement as `f.when(f.in_list(...), value).otherwise(default)`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: update SKILL.md path after move to skills/datafusion_python/
Upstream #1519 moved the root `SKILL.md` to `skills/datafusion_python/SKILL.md`
so that consumers can install the skill without cloning the whole repo. Update
all repo-internal links and external GitHub URLs in the docs site, README,
AGENTS.md, and the package docstring to point at the new location.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Which issue does this PR close?
Closes #1518.
Rationale for this change
Users should not have to download the whole repository just to install the skill.
What changes are included in this PR?
Moves the shareable public skills to /skills.
Verify with:
npx skills add https://github.com/rerun-io/datafusion-python/tree/move_skill
You can also confirm that no extra skills are detected by appending --list.
Are there any user-facing changes?
No