Skip to content

Refactor Dataset Tag Function#322

Merged
PGijsbers merged 15 commits intomainfrom
tags
May 2, 2026
Merged

Refactor Dataset Tag Function#322
PGijsbers merged 15 commits intomainfrom
tags

Conversation

@PGijsbers
Copy link
Copy Markdown
Contributor

Rely on the database to identify if the dataset is absent or the tag already exists.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 2, 2026

Warning

Rate limit exceeded

@pre-commit-ci[bot] has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 50 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0ea2ff1f-60d6-49be-89a3-be7a7515abf7

📥 Commits

Reviewing files that changed from the base of the PR and between 196d486 and d9814e1.

📒 Files selected for processing (2)
  • .gitignore
  • src/routers/openml/datasets.py

Walkthrough

This PR enhances error handling in the dataset tagging workflow across multiple layers. It introduces a new database exceptions module defining constraint violation exceptions, modifies the database layer to catch and re-raise IntegrityError as domain-specific exceptions, updates the REST API endpoint to use an Identifier type for validation and handle database-layer exceptions, and adds corresponding test coverage for dataset-not-found scenarios. The workflow configuration is simplified by removing Docker container exit code error handling.

Possibly related PRs

Suggested labels

tests

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Refactor Dataset Tag Function' clearly describes the main change—refactoring the dataset tagging logic to rely on database-level validation.
Description check ✅ Passed The description directly relates to the changeset by explaining the core motivation: moving dataset validation and duplicate detection to the database layer rather than handling it elsewhere.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch tags

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 53 minutes and 50 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In tag_dataset, the TypeError message for a non-AsyncConnection expdb_db uses {type(expdb_db)} inside a normal string; convert this to an f-string (or use %s) so the actual type is interpolated instead of the literal placeholder.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `tag_dataset`, the TypeError message for a non-`AsyncConnection` `expdb_db` uses `{type(expdb_db)}` inside a normal string; convert this to an f-string (or use `%s`) so the actual type is interpolated instead of the literal placeholder.

## Individual Comments

### Comment 1
<location path="src/routers/openml/datasets.py" line_range="65-66" />
<code_context>
-    assert expdb_db is not None  # noqa: S101
-    tags = await database.datasets.get_tags_for(data_id, expdb_db)
-    if tag.casefold() in [t.casefold() for t in tags]:
+    if not isinstance(expdb_db, AsyncConnection):
+        msg = "`expdb_db` should be an AsyncConnection, is {type(expdb_db)}"
+        raise TypeError(msg)
+
</code_context>
<issue_to_address>
**issue:** The TypeError message isn’t interpolating `type(expdb_db)` as intended.

`msg` is defined as a normal string, so `{type(expdb_db)}` will be rendered literally. Make this an f-string (or use `%s`/`.format`) so the actual type is shown, e.g. `f"`expdb_db` should be an AsyncConnection, is {type(expdb_db)!r}"`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread src/routers/openml/datasets.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
docs/contributing/tests.md (1)

31-37: ⚡ Quick win

Polish wording: use “Elasticsearch” (not “elastic search”).

The docs say: “Some tests require elastic search indices to be constructed.” Standard term is Elasticsearch; this also avoids style/grammar flags (e.g., hyphenation/compound-word suggestions).

Suggested change
-Some tests require elastic search indices to be constructed.
+Some tests require Elasticsearch indices to be constructed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/contributing/tests.md` around lines 31 - 37, Update wording in the docs
to use the proper product name "Elasticsearch" instead of "elastic search"
wherever it appears (e.g., the sentence that currently reads "Some tests require
elastic search indices to be constructed.") and ensure the variable name
INDEX_ES_DURING_STARTUP and reference to `docker/php/.env` remain unchanged;
simply replace "elastic search" with "Elasticsearch" throughout the paragraph to
correct casing and compound-word usage.
.github/workflows/tests.yml (1)

33-39: ⚡ Quick win

Make the .env flip more robust (avoid brittle exact-string sed).

Right now the workflow only replaces the exact substring INDEX_ES_DURING_STARTUP=false. If the .env file contains whitespace/quotes (e.g., INDEX_ES_DURING_STARTUP = false or 'false') or the value is already true, the replacement may silently not occur—leading to Elasticsearch index-dependent tests behaving inconsistently.

Suggested change
       if [ "${{ matrix.php_api }}" = "true" ]; then
-            sed -i 's/INDEX_ES_DURING_STARTUP=false/INDEX_ES_DURING_STARTUP=true/' docker/php/.env
+            # Flip INDEX_ES_DURING_STARTUP in a whitespace/quote-tolerant way; fail if the key isn't found.
+            if ! sed -i -E 's/^(INDEX_ES_DURING_STARTUP[[:space:]]*=[[:space:]]*)("|'\"''\"'\"'|)?false("|'\"''\"'\"'|)?[[:space:]]*$/\1true/Ig' docker/php/.env; then
+              echo "Failed to update INDEX_ES_DURING_STARTUP in docker/php/.env" >&2
+              exit 1
+            fi
             services="$services php-api"
       fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/tests.yml around lines 33 - 39, The current sed invocation
that targets the exact string INDEX_ES_DURING_STARTUP=false is brittle; update
the workflow step that builds services (the block setting services="python-api"
and the sed invocation) to use a more robust regex-based edit against
docker/php/.env so it matches the key regardless of surrounding whitespace or
quotes and sets its value to true (and if the key is missing, append
INDEX_ES_DURING_STARTUP=true to the file). Ensure the change updates lines like
INDEX_ES_DURING_STARTUP = 'false' or "false" and is idempotent if the value is
already true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.github/workflows/tests.yml:
- Around line 33-39: The current sed invocation that targets the exact string
INDEX_ES_DURING_STARTUP=false is brittle; update the workflow step that builds
services (the block setting services="python-api" and the sed invocation) to use
a more robust regex-based edit against docker/php/.env so it matches the key
regardless of surrounding whitespace or quotes and sets its value to true (and
if the key is missing, append INDEX_ES_DURING_STARTUP=true to the file). Ensure
the change updates lines like INDEX_ES_DURING_STARTUP = 'false' or "false" and
is idempotent if the value is already true.

In `@docs/contributing/tests.md`:
- Around line 31-37: Update wording in the docs to use the proper product name
"Elasticsearch" instead of "elastic search" wherever it appears (e.g., the
sentence that currently reads "Some tests require elastic search indices to be
constructed.") and ensure the variable name INDEX_ES_DURING_STARTUP and
reference to `docker/php/.env` remain unchanged; simply replace "elastic search"
with "Elasticsearch" throughout the paragraph to correct casing and
compound-word usage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 15336861-a4d5-4cac-bb7b-c44320c7156f

📥 Commits

Reviewing files that changed from the base of the PR and between 6350bbf and 2bd8068.

📒 Files selected for processing (7)
  • .github/workflows/tests.yml
  • docs/contributing/tests.md
  • src/database/datasets.py
  • src/routers/openml/datasets.py
  • src/routers/types.py
  • tests/constants.py
  • tests/routers/openml/dataset_tag_test.py

@codecov
Copy link
Copy Markdown

codecov Bot commented May 2, 2026

Codecov Report

❌ Patch coverage is 96.29630% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.69%. Comparing base (6350bbf) to head (d9814e1).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/database/datasets.py 81.81% 1 Missing and 1 partial ⚠️

❌ Your patch status has failed because the patch coverage (96.29%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #322      +/-   ##
==========================================
+ Coverage   93.66%   93.69%   +0.02%     
==========================================
  Files          67       68       +1     
  Lines        3109     3154      +45     
  Branches      221      223       +2     
==========================================
+ Hits         2912     2955      +43     
- Misses        138      139       +1     
- Partials       59       60       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

PGijsbers added 2 commits May 2, 2026 20:07
Previously there was a database startup service, which
would cause docker compose up to exit with error code 1
(see also the referenced issue). But since this is no
longer the case, we can just rely on normal tool exit code.
There is no status that depends on it, and it's not required to
make sure that the `docker compose` command exits with code 1
if startup fails.
@PGijsbers PGijsbers merged commit 3e7c4fc into main May 2, 2026
7 of 8 checks passed
@PGijsbers PGijsbers deleted the tags branch May 2, 2026 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant