Skip to content

Fix performance issues caused by inefficient DB usage#617

Open
varmar05 wants to merge 1 commit intodevelopfrom
fix_db_performance_issues
Open

Fix performance issues caused by inefficient DB usage#617
varmar05 wants to merge 1 commit intodevelopfrom
fix_db_performance_issues

Conversation

@varmar05
Copy link
Copy Markdown
Collaborator

@varmar05 varmar05 commented May 1, 2026

Three endpoints were hammering the database with query counts that scaled with the number of files or projects in a request (N+1 problem).

GET /v1/resource/history reduced from 6 + 4N queries to a fixed 7.

  • FileHistory.changes() now finds the effective since boundary with a single SQL query (instead of fetching all history and slicing in Python), and eagerly loads version and version.project columns via contains_eager + load_only so no lazy loads occur per row
  • FileHistory.diff converted from property (DB query on every access) to cached_property (computed once, stored in dict)
  • 'expiration' field excluded from the response schema (not relevant anyway), eliminating the per-row version.project lookup it required

GET /v1/project/<name>?since= reduced from 3 + M + N×M queries to a fixed 5 (where M = versioned files, N = versions in range):

  • All FileHistory rows across all versioned files are fetched in a single query with between on version name, then partitioned by file in Python with stop-at-CREATE logic applied per file
  • FileDiff records batched in a single query across all files in one pass

POST /v1/project/by_names reduced from up to 100 + 2N queries to 3.

  • Workspace names are extracted from the request as a unique set and resolved in a single get_by_names() call (one WHERE name IN (...) query) instead of one get_by_name() call per project entry
  • Projects are fetched with a single WHERE (workspace_id, name) IN (...) query using tuple_().in_()
  • Project member usernames are fetched in a single query joined from ProjectUser
  • get_by_names() added to the Workspace interface

…issues

GET /v1/resource/history
GET /v1/project/<name>
POST /v1/project/by_names

were causing N+1 query issues
@varmar05 varmar05 requested a review from MarcelGeo May 1, 2026 12:07
@coveralls
Copy link
Copy Markdown

Coverage Report for CI Build 25213714944

Coverage decreased (-0.009%) to 93.132%

Details

  • Coverage decreased (-0.009%) from the base build.
  • Patch coverage: 6 uncovered changes across 2 files (65 of 71 lines covered, 91.55%).
  • No coverage regressions found.

Uncovered Changes

File Changed Covered %
server/mergin/sync/public_api_controller.py 55 50 90.91%
server/mergin/sync/interfaces.py 2 1 50.0%

Coverage Regressions

No coverage regressions found.


Coverage Stats

Coverage Status
Relevant Lines: 9858
Covered Lines: 9181
Line Coverage: 93.13%
Coverage Strength: 0.93 hits per line

💛 - Coveralls

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to reduce N+1 database access in the v1 sync API by batching history, project, workspace, and diff lookups instead of resolving them row-by-row.

Changes:

  • Reworks file-history loading for /v1/resource/history and /v1/project?...&since= to fetch histories and diffs in bulk.
  • Refactors /v1/project/by_names to batch-resolve workspaces, projects, and project members.
  • Updates the public v1 schema/tests to stop exposing the expiration field in history responses.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
server/mergin/tests/test_project_controller.py Updates history endpoint expectations to remove expiration.
server/mergin/sync/workspace.py Adds get_by_names() to the global workspace handler.
server/mergin/sync/public_api.yaml Removes expiration from documented v1 history payloads.
server/mergin/sync/public_api_controller.py Batches history/diff/project/workspace loading in several public API endpoints.
server/mergin/sync/models.py Changes FileHistory.diff to cached_property and adjusts FileHistory.changes() query loading.
server/mergin/sync/interfaces.py Extends the workspace handler interface with get_by_names().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 739 to +746
FileHistory.query.join(ProjectFilePath)
.join(FileHistory.version)
.join(ProjectVersion.project)
.options(
contains_eager(FileHistory.version)
.load_only(ProjectVersion.name, ProjectVersion.project_id)
.contains_eager(ProjectVersion.project)
.load_only(Project.storage_params)
Comment on lines +575 to +578
for key, workspace, name in valid_projects:
result = found_map.get((workspace.id, name))
if result:
results[key] = ProjectListSchema(context=ctx).dump(result)
Comment on lines 1124 to 1127
- added
- updated
- removed
expiration:
nullable: true
type: string
format: date-time
example: 2019-02-26T08:47:58.636074Z
UploadFileInfo:
Comment on lines +534 to 546
workspaces_by_name = {
ws.name: ws for ws in current_app.ws_handler.get_by_names(unique_ws_names)
}

results = {}
for project in list_of_projects:
projects = projects_query(ProjectPermissions.Read, as_admin=False)
splitted = project.split("/")
if len(splitted) != 2:
results[project] = {"error": 404}
valid_projects = [] # list of (key, workspace, project_name)
for key in list_of_projects:
parts = key.split("/")
if len(parts) != 2:
results[key] = {"error": 404}
continue
ws = splitted[0]
name = splitted[1]
workspace = current_app.ws_handler.get_by_name(ws)
workspace = workspaces_by_name.get(parts[0])
if not workspace:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants