Skip to content

gpbackman: env-var fallback for --history-db and fail loud when DB is missing#97

Merged
tuhaihe merged 5 commits intoapache:mainfrom
talmacschen-arch:feat/gpbackman-history-db-fallback
Apr 30, 2026
Merged

gpbackman: env-var fallback for --history-db and fail loud when DB is missing#97
tuhaihe merged 5 commits intoapache:mainfrom
talmacschen-arch:feat/gpbackman-history-db-fallback

Conversation

@talmacschen-arch
Copy link
Copy Markdown

Fixes #96

Motivation

When gpbackman backup-info (or any other gpbackman subcommand) is run
without --history-db, the current code drops down to a relative
gpbackup_history.db resolved against the current directory. Two
problems follow:

  1. The cluster's real history DB lives under
    $COORDINATOR_DATA_DIRECTORY/gpbackup_history.db (or the legacy
    $MASTER_DATA_DIRECTORY on older installs), which is rarely the
    user's shell cwd. Operators end up typing the long path on every
    invocation despite the relevant env vars already being set by the
    standard cluster environment scripts.

  2. SQLite's default open mode silently creates a new empty database
    when the file is missing. The very next query then fails with
    no such table: backups, which is confusing and leaves an empty
    gpbackup_history.db litter in cwd.

Reproducer (current behaviour on main):

$ cd /tmp/empty
$ gpbackman backup-info
[ERROR]:-Unable to read data from history db. Error: no such table: backups
$ ls
gpbackup_history.db   # ← silently created, 0 rows, polluted cwd

Changes

  • gpbackman/cmd/wrappers.gogetHistoryDBPath resolves the path
    in order: --history-db$COORDINATOR_DATA_DIRECTORY
    $MASTER_DATA_DIRECTORY → bare filename in cwd (preserved for
    backward compatibility).
  • gpbackman/gpbckpconfig/utils_db.goOpenHistoryDB now
    pre-checks the file with os.Stat and opens SQLite via the
    file:<path>?mode=rw URI. Missing files surface a clear error that
    references --history-db and the env vars; no empty file is ever
    created.
  • Help text updated for the --history-db flag and every
    command's Long description; gpbackman/COMMANDS.md and
    gpbackman/README.md aligned.
  • Tests added/extended in wrappers_test.go (env-var resolution
    order, explicit-wins-over-env) and in utils_db_test.go (friendly
    error + no-file-created regression, happy-path open).

Backward compatibility

Fully compatible. The old "search the current directory" behaviour is
preserved as the last fallback, so any existing scripts that rely on
running gpbackman from a directory containing gpbackup_history.db
continue to work unchanged. Users who currently pass --history-db
explicitly are unaffected.

The only user-visible behaviour change is that running gpbackman
against a non-existent path now errors immediately with a clear message
instead of producing a misleading "no such table: backups" error after
silently creating an empty file. This is the intended fix.

How tested

  • go test ./gpbackman/cmd/... ./gpbackman/gpbckpconfig/... ./gpbackman/textmsg/... — all pass.
  • make build — five binaries built cleanly.
  • Manual smoke test against a real cluster history DB:
    • Empty cwd, no env → friendly error, no file created. ✅
    • COORDINATOR_DATA_DIRECTORY pointing at a non-existent path → friendly error mentioning the resolved path. ✅
    • COORDINATOR_DATA_DIRECTORY pointing at the coordinator data dir → backup list rendered without --history-db. ✅

Notes

…d when missing.

Two improvements to how gpbackman locates and opens gpbackup_history.db:

1. When --history-db is empty, fall back (in priority order) to
   $COORDINATOR_DATA_DIRECTORY and $MASTER_DATA_DIRECTORY before the
   current directory. These variables are exported by the standard
   Cloudberry/Greenplum environment scripts, so users running gpbackman
   from a sourced cluster shell no longer need to repeat --history-db
   on every invocation.

2. OpenHistoryDB now pre-checks the file with os.Stat and opens the
   SQLite database with the rw URI mode. A missing file produces a
   friendly error pointing at --history-db and the env vars, instead of
   silently creating an empty file that later fails with "no such table:
   backups" when callers issue queries.

The previous cwd-only fallback is preserved as a last resort. Tests
cover the new env-var resolution order and the safe-open behaviour.

Signed-off-by: chenqiang <chenqiang@hashdata.cn>
…ADME.md.

Replace the "current directory" wording everywhere it appeared with the
new resolution order: $COORDINATOR_DATA_DIRECTORY, $MASTER_DATA_DIRECTORY,
then the current directory.

Signed-off-by: chenqiang <chenqiang@hashdata.cn>
@tuhaihe tuhaihe requested a review from woblerr April 29, 2026 05:28
@talmacschen-arch talmacschen-arch force-pushed the feat/gpbackman-history-db-fallback branch from b566222 to ef444f4 Compare April 29, 2026 05:51
Comment thread gpbackman/cmd/wrappers.go Outdated
Comment thread gpbackman/README.md Outdated
Comment thread gpbackman/gpbckpconfig/utils_db.go
Comment thread gpbackman/cmd/wrappers.go Outdated
…y-db opt-in.

Address review feedback on the original PR:

* The env-var fallback ($COORDINATOR_DATA_DIRECTORY then
  $MASTER_DATA_DIRECTORY) is now opt-in via a new persistent flag
  --auto-load-history-db. The default behaviour (current-directory
  lookup) is unchanged from upstream main, so destructive subcommands
  (backup-delete, backup-clean, history-clean) cannot silently target
  the wrong cluster's history DB on a multi-cluster jump host.
* Drop the "Greenplum" mention in the comment for getHistoryDBPath
  to keep wording aligned with the Cloudberry environment scripts.
* Trim the verbose --history-db description in gpbackman/README.md
  and add a separate --auto-load-history-db line.
* Refine the not-found error message in OpenHistoryDB to mention the
  new flag rather than implying that setting env vars alone helps.
* Tests updated to cover the opt-in matrix: env vars are ignored when
  the flag is off, honoured when on, and an explicit --history-db
  still wins regardless.

The OpenHistoryDB safe-open change (os.Stat pre-check + mode=rw URI
to avoid silently creating an empty SQLite file) is preserved as-is.

Signed-off-by: chenqiang <chenqiang@hashdata.cn>
Comment thread gpbackman/gpbckpconfig/utils_db.go Outdated
…istoryDB.

Previously a permission-denied or I/O failure on the history DB path
returned the bare os.Stat error, which gave no hint about which
operation or path failed. Wrap it with fmt.Errorf using %w so
errors.Is/As still work while surfacing "stat history db <path>"
in the message, matching the friendly tone of the ErrNotExist branch.
Copy link
Copy Markdown

@MisterRaindrop MisterRaindrop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread gpbackman/cmd/constants.go Outdated
@talmacschen-arch
Copy link
Copy Markdown
Author

talmacschen-arch commented Apr 29, 2026

@woblerr CI s3_plugin_e2e failed in the locale-setup step (dnf 404s
on Rocky 9.7 mirrors — see #issue-link if filed). It's unrelated to
this PR's changes; could you re-run the failed job when convenient?
Thanks!

Cloudberry switched from MASTER_DATA_DIRECTORY to COORDINATOR_DATA_DIRECTORY
starting around release 1.5.4, and the standard environment scripts no
longer export the legacy variable. Maintaining a fallback for it in
gpbackman just adds surface area (constants, help text, docs, tests) for
a value that should not exist in a current Cloudberry install.

Drop MASTER_DATA_DIRECTORY entirely from the resolution chain, the flag
help, the per-command long help strings, README/COMMANDS docs, and the
corresponding test cases. The flag description is shortened in root.go
to match the concise wording already used in README. The lingering
"Cloudberry/Greenplum" comment in constants.go is updated to "Cloudberry"
in the same pass.

Resolution chain after this change:
  explicit --history-db
    -> $COORDINATOR_DATA_DIRECTORY (only when --auto-load-history-db is set)
      -> current working directory (default)

Addresses review feedback from woblerr, tuhaihe, and MisterRaindrop on
PR apache#97. The pre-existing MASTER_DATA_DIRECTORY reference in
.github/workflows/cloudberry-backup-ci.yml predates this PR and is
unrelated to history-db resolution; it is left for a separate cleanup.
@talmacschen-arch
Copy link
Copy Markdown
Author

@tuhaihe @woblerr — pushed 2d88689 with the MASTER_DATA_DIRECTORY
cleanup we agreed on (per @woblerr's request and @tuhaihe @MisterRaindrop's
+1). The new workflow runs are sitting in action_required (GitHub's
external-fork CI gate). Could one of you approve them when convenient?

cc @MisterRaindrop for visibility — feel free to re-review once CI clears.

@woblerr
Copy link
Copy Markdown
Collaborator

woblerr commented Apr 29, 2026

LGTM, thank you for your efforts.

@MisterRaindrop
Copy link
Copy Markdown

LGTM

@tuhaihe tuhaihe merged commit 7741382 into apache:main Apr 30, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] gpbackman: --history-db should fall back to $COORDINATOR_DATA_DIRECTORY, and OpenHistoryDB should not silently create empty files

5 participants