Conversation
Mainly for compatibility with the PHP REST API
Walkthrough

This change adds dataset tag removal support plus related type and test updates. Two async DB helpers were added.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Hey - I've found 1 issue and left some high-level feedback:
- In `untag_dataset`, `tag` is annotated only with `SystemString64` but not `Body()`, so FastAPI will treat it as a query parameter; this conflicts with the tests sending it in the JSON body, so consider wrapping `tag` with `Body()` or using a request model to make both `data_id` and `tag` body fields.
- The new `get_tag` helper does a `SELECT *` even though only `uploader` is used; consider selecting just the needed columns (e.g., `uploader`) to keep the query minimal and the intent of the helper clearer.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `untag_dataset`, `tag` is annotated only with `SystemString64` but not `Body()`, so FastAPI will treat it as a query parameter; this conflicts with the tests sending it in the JSON body, so consider wrapping `tag` with `Body()` or using a request model to make both `data_id` and `tag` body fields.
- The new `get_tag` helper does a `SELECT *` even though only `uploader` is used; consider selecting just the needed columns (e.g., `uploader`) to keep the query minimal and the intent of the helper clearer.
## Individual Comments
### Comment 1
<location path="tests/routers/openml/dataset_untag_test.py" line_range="48-61" />
<code_context>
+ assert str(dataset_id) in e.value.detail
+
+
+async def test_dataset_untag_tag_not_owned(expdb_test: AsyncConnection) -> None:
+ dataset_id = 1
+ tag = "foo"
+ await expdb_test.execute(
+ text("INSERT INTO dataset_tag(id, tag, uploader) VALUES (:dataset_id, :tag, 1)"),
+ parameters={"dataset_id": dataset_id, "tag": tag},
+ )
+
+ with pytest.raises(TagNotOwnedError) as e:
+ await untag_dataset(dataset_id, tag, SOME_USER, expdb_test)
+
+ assert e.value.status_code == HTTPStatus.FORBIDDEN
+ assert tag in e.value.detail
+ assert str(dataset_id) in e.value.detail
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add an assertion that the tag remains in the database when the user is not allowed to remove it
This test only checks that `TagNotOwnedError` and its message are correct. It should also query `dataset_tag` after the failure and assert the tag still exists, mirroring the success-case tests, to ensure the tag isn’t deleted before the permission check.
```suggestion
async def test_dataset_untag_tag_not_owned(expdb_test: AsyncConnection) -> None:
dataset_id = 1
tag = "foo"
await expdb_test.execute(
text("INSERT INTO dataset_tag(id, tag, uploader) VALUES (:dataset_id, :tag, 1)"),
parameters={"dataset_id": dataset_id, "tag": tag},
)
with pytest.raises(TagNotOwnedError) as e:
await untag_dataset(dataset_id, tag, SOME_USER, expdb_test)
assert e.value.status_code == HTTPStatus.FORBIDDEN
assert tag in e.value.detail
assert str(dataset_id) in e.value.detail
# Ensure the tag was not deleted when the user is not allowed to remove it
result = await expdb_test.execute(
text(
"SELECT tag FROM dataset_tag "
"WHERE id = :dataset_id AND tag = :tag"
),
parameters={"dataset_id": dataset_id, "tag": tag},
)
row = result.fetchone()
assert row is not None
```
</issue_to_address>
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## main #323 +/- ##
==========================================
+ Coverage 93.69% 93.90% +0.21%
==========================================
Files 68 69 +1
Lines 3154 3249 +95
Branches 223 227 +4
==========================================
+ Hits 2955 3051 +96
Misses 139 139
+ Partials 60 59 -1

☔ View full report in Codecov by Sentry.
🧹 Nitpick comments (2)
src/routers/openml/datasets.py (1)
86-104: ⚡ Quick win: Missing `logger.info` after successful untag.

`tag_dataset` logs each tag addition (line 77); the corresponding untag operation has no structured log, creating an observability gap for auditing who removed which tag.

🪵 Proposed addition

```diff
  await database.datasets.delete_tag(data_id, tag, expdb_db)
+ logger.info("Dataset {data_id} untagged '{tag}'.", data_id=data_id, tag=tag)
```
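As a hedged illustration of the proposed log line, here is a stdlib-logging version (the project may use a different logging library, and `delete_tag` below is a synchronous stand-in for the real async helper):

```python
import logging

logger = logging.getLogger("routers.openml.datasets")


def untag_dataset_and_log(data_id: int, tag: str, delete_tag) -> None:
    # Stand-in for: await database.datasets.delete_tag(data_id, tag, expdb_db)
    delete_tag(data_id, tag)
    # Mirror tag_dataset's log line so tag removals are auditable too.
    logger.info("Dataset %s untagged '%s'.", data_id, tag)
```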
tests/routers/openml/dataset_untag_test.py (1)
18-20: Use `SOME_USER.user_id` instead of hardcoded `2` in the INSERT statement.

The test hardcodes `uploader=2` while making a request as `ApiKey.SOME_USER` (whose fixture user_id is also 2). If the fixture's user_id is ever changed, the INSERT will silently cause the endpoint to return 403 instead of 204, breaking the test. Reference `SOME_USER.user_id` directly in the parameters to keep the test's intent explicit and resilient to fixture changes.
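The same idea, sketched with stdlib `sqlite3` and a hypothetical stand-in for the `SOME_USER` fixture (the schema and names are illustrative, not the project's actual test setup):

```python
import sqlite3
from types import SimpleNamespace

# Hypothetical stand-in for the test suite's ApiKey.SOME_USER fixture.
SOME_USER = SimpleNamespace(user_id=2)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset_tag (id INTEGER, tag TEXT, uploader INTEGER)")
# Reference the fixture's user_id instead of hardcoding 2, so the inserted
# row stays owned by SOME_USER even if the fixture's id ever changes.
conn.execute(
    "INSERT INTO dataset_tag(id, tag, uploader) VALUES (:dataset_id, :tag, :uploader)",
    {"dataset_id": 1, "tag": "foo", "uploader": SOME_USER.user_id},
)
row = conn.execute("SELECT uploader FROM dataset_tag WHERE id = 1").fetchone()
assert row == (SOME_USER.user_id,)
```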
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/routers/openml/datasets.py`:
- Around line 86-104: Add a structured info log after a successful untag in the
untag_dataset handler: after the await database.datasets.delete_tag(data_id,
tag, expdb_db) call, call logger.info with a clear message including the user
identifier (user.user_id), data_id and tag, and any context fields (e.g.,
{"user_id": user.user_id, "data_id": data_id, "tag": tag}) so removals are
auditable; ensure you import or use the same logger instance used by tag_dataset
so logs are consistent.
In `@tests/routers/openml/dataset_untag_test.py`:
- Around line 18-20: The INSERT in the test uses a hardcoded uploader id 2
causing fragility; update the expdb_test.execute call that runs the INSERT INTO
dataset_tag(...) to pass uploader equal to SOME_USER.user_id instead of the
literal 2 (i.e., replace the hardcoded uploader value in the SQL parameters with
the fixture SOME_USER.user_id) so the test uses the fixture's actual user_id
when creating the tag and remains correct if the fixture changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 59ef5e5a-935b-429c-b493-71f336def752
📒 Files selected for processing (3)
- src/database/datasets.py
- src/routers/openml/datasets.py
- tests/routers/openml/dataset_untag_test.py
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/routers/openml/tasks.py (1)
419-421: ⚡ Quick win: `get_task` still uses plain `int`; consider aligning with the PR's `Identifier` pattern.

Every other task/dataset/run/flow/setup `get_*` endpoint in this PR now uses `Identifier` (positive-int validated); `get_task` was missed.

♻️ Suggested one-line fix

```diff
 @router.get("/{task_id}")
 async def get_task(
-    task_id: int,
+    task_id: Identifier,
```
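Assuming `Identifier` is roughly a positive-int constrained type (its exact definition in `src/routers/types.py` is not shown here), the pattern can be sketched with pydantic:

```python
from typing import Annotated

from pydantic import Field, TypeAdapter, ValidationError

# Assumed shape of the project's Identifier type: an int that must be positive.
Identifier = Annotated[int, Field(gt=0)]

adapter = TypeAdapter(Identifier)
assert adapter.validate_python(42) == 42  # a valid task id passes through

rejected = False
try:
    adapter.validate_python(0)  # non-positive ids are rejected
except ValidationError:
    rejected = True
assert rejected
```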
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/routers/openml/datasets.py`:
- Around line 117-119: The except clause is using Python 2 comma syntax and will
raise a SyntaxError; change it to catch the two exceptions using a tuple:
replace the line "except DatasetNotFoundError, DatasetNoAccessError:" with an
"except (DatasetNotFoundError, DatasetNoAccessError):" clause in the same block
(the handler that builds msg and re-raises DatasetNotFoundError(msg, code=472)
from None), leaving the existing message and raise logic unchanged.
- Around line 100-103: After calling untag_dataset and fetching tags via
database.datasets.get_tags_for (see variables tags and data_id), guard against
tags being empty before indexing: if len(tags) == 0 set return_tags = [] (empty
list), if len(tags) == 1 set return_tags = tags[0] (unwrapped single tag),
otherwise keep return_tags = tags; then return {"data_untag": {"id":
str(data_id), "tag": return_tags}}. Ensure this logic replaces the current
unconditional tags[0] access to avoid IndexError in untag_dataset / get_tags_for
handling.
In `@tests/conftest.py`:
- Around line 186-192: Remove the unused "table" entry from the parameters dict
passed alongside the f-string SQL in tests/conftest.py: the query string already
interpolates table via f"INSERT INTO {table}(...)" and uses bind parameters
:identifier, :tag, :user_id, so delete the "table": table key from the dict
(leave identifier, tag, and user_id/OWNER_USER.user_id intact and keep the
existing # noqa: S608 comment).
---
Nitpick comments:
In `@src/routers/openml/tasks.py`:
- Around line 419-421: The get_task endpoint currently types task_id as plain
int; change its parameter type to the Identifier validated type (e.g., async def
get_task(task_id: Identifier)) and add the corresponding import for Identifier
at top of the module to align with other endpoints; ensure any usages inside
get_task (references to task_id) remain valid with Identifier.
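The empty-tags guard described in the comments above can be sketched as a small pure function (the response envelope mirrors the PHP-style shape used elsewhere in this PR; exact field names are assumptions):

```python
def format_untag_response(data_id: int, tags: list[str]) -> dict:
    # Guard against indexing an empty list: after removing the last tag,
    # an unconditional tags[0] would raise IndexError.
    if not tags:
        return_tags: list[str] | str = []
    elif len(tags) == 1:
        return_tags = tags[0]  # unwrap a single remaining tag
    else:
        return_tags = tags
    return {"data_untag": {"id": str(data_id), "tag": return_tags}}


assert format_untag_response(1, []) == {"data_untag": {"id": "1", "tag": []}}
assert format_untag_response(1, ["a"])["data_untag"]["tag"] == "a"
assert format_untag_response(1, ["a", "b"])["data_untag"]["tag"] == ["a", "b"]
```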
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ad51f9ae-3ecd-42ce-95bd-0d43d3ca0915
📒 Files selected for processing (13)
- pyproject.toml
- src/routers/openml/datasets.py
- src/routers/openml/flows.py
- src/routers/openml/qualities.py
- src/routers/openml/runs.py
- src/routers/openml/setups.py
- src/routers/openml/study.py
- src/routers/openml/tasks.py
- src/routers/types.py
- tests/conftest.py
- tests/routers/openml/dataset_untag_test.py
- tests/routers/openml/datasets_get_test.py
- tests/routers/openml/setups_untag_test.py
✅ Files skipped from review due to trivial changes (1)
- pyproject.toml
Adds the dataset untag functionality. Introduces two endpoints:

- `POST /datasets/untag`, which mimics the I/O of the PHP API (modulo error handling)
- `DELETE /datasets/{id}/tag?tag=...`, which is more semantically correct.