
Add gpbackup_exporter — a Prometheus exporter for gpbackup history database.#87

Open
woblerr wants to merge 12 commits into apache:main from woblerr:add_gpbackup_exporter

Collaborator

@woblerr woblerr commented Apr 13, 2026

Added gpbackup_exporter - a Prometheus exporter for collecting metrics from the gpbackup history database (gpbackup_history.db).

It is based on the original gpbackup_exporter project and has been adapted for integration into the cloudberry-backup repository.

Motivation

gpbackup does not expose built-in Prometheus metrics. gpbackup_exporter fills this gap. This allows integrating backup health monitoring into existing Prometheus/Grafana stacks.

Features

gpbackup_exporter exposes the following Prometheus metrics:

Backup metrics:

  • gpbackup_backup_status — backup status (success / failure);
  • gpbackup_backup_deletion_status — backup deletion status;
  • gpbackup_backup_info — backup info (version, compression, plugin, etc.);
  • gpbackup_backup_duration_seconds — backup duration in seconds.

Last backup metrics:

  • gpbackup_backup_since_last_completion_seconds — seconds elapsed since the last completed backup per database and backup type.

Exporter self-metrics:

  • gpbackup_exporter_status — gpbackup exporter get data status;
  • gpbackup_exporter_build_info — information about gpbackup exporter.

Filtering flags:

  • --gpbackup.db-include / --gpbackup.db-exclude — limit collection to specific databases;
  • --gpbackup.backup-type — limit collection to a specific backup type;
  • --gpbackup.collect-deleted / --gpbackup.collect-failed — include deleted / failed backups;
  • --collect.depth — collect metrics only for backups not older than N days;
  • --web.config.file — TLS and basic authentication support via the Prometheus exporter toolkit.
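
As an illustration of the --collect.depth filter described above, the sketch below keeps only backups that are no older than N days. It is not the exporter's actual code: the helper name withinDepth, the fixed demoNow clock, and the choice to treat a non-positive depth as "no limit" are assumptions made for the example, and the YYYYMMDDHHMMSS layout is assumed for gpbackup timestamps.

```go
package main

import (
	"fmt"
	"time"
)

// gpbackupTimestampLayout is the YYYYMMDDHHMMSS layout assumed for backup timestamps.
const gpbackupTimestampLayout = "20060102150405"

// demoNow is a fixed clock so the example is deterministic (assumption for the demo).
var demoNow = time.Date(2026, 4, 29, 12, 0, 0, 0, time.UTC)

// withinDepth is a hypothetical helper. It reports whether the backup timestamp ts
// is no older than depthDays days before now. A non-positive depthDays is treated
// as "no limit", which is an assumption of this sketch.
func withinDepth(ts string, now time.Time, depthDays int) (bool, error) {
	t, err := time.ParseInLocation(gpbackupTimestampLayout, ts, now.Location())
	if err != nil {
		return false, fmt.Errorf("parse timestamp %q: %w", ts, err)
	}
	if depthDays <= 0 {
		return true, nil
	}
	cutoff := now.AddDate(0, 0, -depthDays)
	// Backups exactly at the cutoff are kept.
	return !t.Before(cutoff), nil
}

func main() {
	for _, ts := range []string{"20260428120000", "20260301000000"} {
		ok, err := withinDepth(ts, demoNow, 7)
		fmt.Println(ts, ok, err)
	}
}
```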

Verification

Unit tests pass without errors:

$ make unit

[1776080594] TOC Suite - 45/45 specs ••••••••••••••••••••••••••••••••••••••••••••• SUCCESS! 2.217417ms PASS
[1776080594] testutils tests - 8/8 specs •••••••• SUCCESS! 552.083µs PASS
[1776080594] Textmsg Suite - 14/14 specs •••••••••••••• SUCCESS! 604.292µs PASS
[1776080594] Gpbckpconfig Suite - 47/47 specs ••••••••••••••••••••••••••••••••••••••••••••••• SUCCESS! 1.909583ms PASS
[1776080594] utils tests - 118/118 specs •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• SUCCESS! 103.433167ms PASS
[1776080594] restore tests - 118/118 specs •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• SUCCESS! 85.166959ms PASS
[1776080594] Cmd Suite - 14/14 specs •••••••••••••• SUCCESS! 3.561667ms PASS
[1776080594] Filepath Suite - 31/31 specs ••••••••••••••••••••••••••••••• SUCCESS! 2.903958ms PASS
[1776080594] Options Suite - 27/27 specs ••••••••••••••••••••••••••• SUCCESS! 6.135125ms PASS
[1776080594] Exporter Suite - 37/37 specs ••••••••••••••••••••••••••••••••••••• SUCCESS! 18.97375ms PASS
[1776080594] History Suite - 8/8 specs •••••••• SUCCESS! 15.766416ms PASS
[1776080594] Report Suite - 34/34 specs •••••••••••••••••••••••••••••••••• SUCCESS! 9.29725ms PASS
[1776080594] s3_plugin tests - 32/32 specs •••••••••••••••••••••••••••••••• SUCCESS! 4.222791ms PASS
[1776080594] backup tests - 585/586 specs •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••S••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••S••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••S••••••••••••••••••••••••••••••••S••••••••••••••••••••••P••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••S••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••S•• SUCCESS! 172.169042ms PASS

Ginkgo ran 14 suites in 10.961838459s
Test Suite Passed

Open Questions

  1. The exporter uses promslog for logging, which outputs in logfmt or json format — the standard for Prometheus exporters. Other tools in this repository use gplog from cloudberry-go-libs. Should gpbackup_exporter be migrated to gplog for consistency, or is it acceptable to keep the Prometheus logger?

  2. The exporter's Makefile targets pass additional -ldflags (GIT_REVISION, GIT_BRANCH, BUILD_DATE) that are injected into github.com/prometheus/common/version. The Prometheus exporter convention requires this in order to populate the gpbackup_exporter_build_info metric. Other tools in the repo use a simpler pattern with a single -X version=... flag. Is this acceptable?
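
For readers unfamiliar with the mechanism behind question 2, here is a minimal sketch of how the Go linker's -X flag overwrites string variables at build time. The variables version, revision, branch, and buildDate and the helper buildInfo are hypothetical and live in package main; the PR itself targets variables in github.com/prometheus/common/version rather than these.

```go
package main

import "fmt"

// Hypothetical package-level string variables. The Go linker can overwrite
// each of them with -X importpath.name=value; the defaults below apply when
// no -ldflags are passed.
var (
	version   = "dev"
	revision  = "unknown"
	branch    = "unknown"
	buildDate = "unknown"
)

// buildInfo formats the injected values, similar in spirit to what a
// build_info metric or a --version flag would report.
func buildInfo() string {
	return fmt.Sprintf("version=%s, branch=%s, revision=%s, build_date=%s",
		version, branch, revision, buildDate)
}

func main() {
	fmt.Println(buildInfo())
}
```

With this layout, a build such as `go build -ldflags "-X main.revision=abc1234 -X main.branch=main"` would replace the corresponding defaults, while variables without an -X entry keep their compiled-in values.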


@woblerr woblerr requested review from robertmu and tuhaihe April 13, 2026 12:30
@tuhaihe tuhaihe requested review from leborchuk and ostinru April 24, 2026 08:53

@MisterRaindrop MisterRaindrop left a comment

Suggest adding more reviewers.

Comment thread exporter/gpbckp_exporter.go Outdated
// When db is specified in both include and exclude lists, a warning is displayed in the log
// and data for this db is not collected.
// It is necessary to set zero metric value for this db.
getDataSuccessStatus = false

getDataSuccessStatus lives for the whole loop. Once an include/exclude conflict flips it to false, every later healthy DB read at
L114 (dbStatus[db] = getDataSuccessStatus) is also recorded as failed, producing spurious gpbackup_exporter_status = 0 and false

Collaborator Author

Fixed in 1e0a784

Clarifying the logic of inclusion/exclusion processing:

  • DB present in both dbInclude and dbExclude - warn and emit gpbackup_exporter_status=0.
  • DB only in dbExclude - skip, emit no metrics.
  • dbInclude empty or DB in dbInclude - process and emit full backup metrics.

Also added additional unit tests.
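
The three cases above can be sketched roughly as follows. This is an illustration, not the code from the commit: the names classify, statuses, and the action constants are hypothetical, and the default branch (include list set, database not in it) is an assumption because the list above does not spell that case out. The status is derived per database inside the loop, so one conflicting database cannot affect the others.

```go
package main

import (
	"fmt"
	"slices"
)

// action describes what the collector does for one database (hypothetical type).
type action int

const (
	collect  action = iota // process and emit full backup metrics
	skip                   // emit no metrics at all
	conflict               // warn and emit gpbackup_exporter_status=0
)

// classify applies the include/exclude rules to a single database.
func classify(db string, include, exclude []string) action {
	inInclude := slices.Contains(include, db)
	inExclude := slices.Contains(exclude, db)
	switch {
	case inInclude && inExclude:
		return conflict
	case inExclude:
		return skip
	case len(include) == 0 || inInclude:
		return collect
	default:
		// Assumption: an include list is set and db is not in it, so skip it.
		return skip
	}
}

// statuses computes a fresh status for every database instead of sharing
// one flag across the whole loop. Skipped databases get no entry.
func statuses(dbs, include, exclude []string) map[string]bool {
	dbStatus := make(map[string]bool)
	for _, db := range dbs {
		switch classify(db, include, exclude) {
		case collect:
			dbStatus[db] = true
		case conflict:
			dbStatus[db] = false
		case skip:
			// No entry: nothing is emitted for this database.
		}
	}
	return dbStatus
}

func main() {
	dbs := []string{"sales", "hr", "tmp"}
	fmt.Println(statuses(dbs, []string{"sales", "hr"}, []string{"hr", "tmp"}))
}
```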

}
getExporterStatusMetrics(dbStatus, setUpMetricValue, logger)
} else {
logger.Warn("No backup data returned")

backupConfigs, err := parseBackupData(...)
if err != nil {
    logger.Error("Get data failed", "err", err)
    getDataSuccessStatus = false              // dead on this path
}
resetMetrics()
if len(backupConfigs) != 0 {
    ...
    getExporterStatusMetrics(dbStatus, ...)   // only called here
} else {
    logger.Warn("No backup data returned")    // log only, no metric
}

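When no backup data is returned, the exporter only logs a warning and emits no metric, so the getDataSuccessStatus assignment above is never used on this path.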

Collaborator Author

If fetching the data fails or the result is empty, we do not generate backup metrics.

The absence of metrics itself indicates a problem with retrieving the data or with the data itself.
In this case the only metric exposed is gpbackup_exporter_build_info.

Comment thread end_to_end/exporter_test.go Outdated
"--collect.interval", "600",
"--web.listen-address", fmt.Sprintf("127.0.0.1:%d", port),
)
cmd.Start()

Better to use Expect(cmd.Start()).To(Succeed()), so that a failed start is reported instead of being silently ignored.

Collaborator Author

Thanks, fixed in 40ce322
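
To show what the unchecked cmd.Start() call can hide, here is a small standard-library sketch without Gomega. The helper startChecked and the binary name are hypothetical; the point is only that exec.Cmd.Start returns an error (for example when the binary cannot be found) and that this error is lost if the return value is discarded.

```go
package main

import (
	"fmt"
	"os/exec"
)

// startChecked is a hypothetical helper that starts a command and
// surfaces the error that a bare cmd.Start() call would drop.
func startChecked(name string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("start %s: %w", name, err)
	}
	return cmd, nil
}

func main() {
	// The binary name below is assumed not to exist on the machine.
	if _, err := startChecked("gpbackup_exporter-binary-that-does-not-exist"); err != nil {
		fmt.Println("start failed:", err)
	}
}
```

In a Ginkgo test the same check collapses to the one-liner suggested in the review, which fails the spec immediately instead of letting later assertions time out against a process that never started.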

@liang8283

Verification

| Step | Result |
|------|--------|
| make BIN_DIR=/tmp/pr87-bin build | ✅ all 6 binaries compile, no warnings |
| gpbackup_exporter --version | ✅ exit 0, prints version=2.2.0, branch=add_gpbackup_exporter, revision=6a29090, goversion=go1.24.13, tags=gpbackup_exporter |
| gpbackup_exporter --help | ✅ exit 0, all documented flags present |
| ginkgo -r exporter/ | ✅ 37/37 specs pass in 0.09s (matches PR body claim) |
| live scrape vs real /data0/master/gpseg-1/gpbackup_history.db (8 prior metadata-only backups) | ✅ 177-line /metrics, 8× each of gpbackup_backup_{status,info,duration_seconds,deletion_status}, 1× gpbackup_backup_since_last_completion_seconds, 1× gpbackup_exporter_status{database_name="perf_acl_test"} 1, 1× gpbackup_exporter_build_info populated. Durations and labels match gpbackman backup-info output. |

woblerr added 9 commits April 29, 2026 19:14
Introduce the Prometheus metrics exporter as a new standalone binary within the repository.

- Create gpbackup_exporter.go entry point with CLI flags and collection loop.
- Port core exporter logic to the new `exporter/` package.
- Adapt type system to use `*history.BackupConfig` directly, aligning with the cloudberry-backup architecture.
- Switch receiver methods to standalone gpbckpconfig helpers.
- Add Prometheus and Kingpin dependencies to `go.mod`.
- Update Makefile to support building, testing, and packaging.
- Port unit tests to Ginkgo/Gomega.
Add e2e test that runs gpbackup commands and validates that gpbackup_exporter correctly reads the history database and exposes metrics.

Also remove the dead gpbackmanPath assignment from useOldBackupVersion in e2e tests.
Suppress noisy test logger output in exporter unit tests by writing to bytes.Buffer instead of os.Stdout.
SIGINT means "interrupt," signaling that a user wants to stop the current operation. SIGTERM means "terminate," requesting a polite program shutdown. So it is correct to treat both as a graceful shutdown.
To prevent a panic such as "panic: http: invalid pattern", exit when the endpoint is empty.
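
The last two commit messages can be illustrated with one short sketch, under stated assumptions. The helpers validateEndpoint and serve, the /metrics path, the listen address, and the 200 ms demo timeout are all hypothetical and are not taken from the PR. The sketch rejects an empty endpoint before it reaches http.ServeMux (which panics on an empty pattern) and treats both SIGINT and SIGTERM as a request for graceful shutdown.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// validateEndpoint rejects an empty metrics path up front, because
// registering an empty pattern on http.ServeMux panics.
func validateEndpoint(endpoint string) error {
	if endpoint == "" {
		return errors.New("metrics endpoint must not be empty")
	}
	return nil
}

// serve runs srv until it fails or ctx is cancelled; on cancellation it
// shuts the server down gracefully with a bounded wait.
func serve(ctx context.Context, srv *http.Server) error {
	errCh := make(chan error, 1)
	go func() { errCh <- srv.ListenAndServe() }()

	select {
	case err := <-errCh:
		return err
	case <-ctx.Done():
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		return srv.Shutdown(shutdownCtx)
	}
}

func main() {
	endpoint := "/metrics" // assumed value for the demo
	if err := validateEndpoint(endpoint); err != nil {
		fmt.Fprintln(os.Stderr, "invalid configuration:", err)
		os.Exit(1)
	}

	// Both SIGINT and SIGTERM cancel the context, i.e. request a graceful stop.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()
	// Demo only: also stop after a short timeout so the sketch ends when run unattended.
	ctx, cancel := context.WithTimeout(ctx, 200*time.Millisecond)
	defer cancel()

	mux := http.NewServeMux()
	mux.HandleFunc(endpoint, func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "# metrics placeholder")
	})

	srv := &http.Server{Addr: "127.0.0.1:0", Handler: mux}
	if err := serve(ctx, srv); err != nil {
		// A real binary would exit with a non-zero status here.
		fmt.Fprintln(os.Stderr, "exporter stopped:", err)
	}
}
```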
@woblerr woblerr force-pushed the add_gpbackup_exporter branch from 6a29090 to 292a9dd Compare April 29, 2026 16:14
woblerr added 2 commits April 29, 2026 23:49
Remove unused getDataSuccessStatus.

Clarify include/exclude handling:
* DB present in both dbInclude and dbExclude - warn and emit gpbackup_exporter_status=0.
* DB only in dbExclude - skip, emit no metrics.
* dbInclude empty or DB in dbInclude - process and emit full backup metrics.

Add unit tests.
@woblerr woblerr force-pushed the add_gpbackup_exporter branch 2 times, most recently from 79f2e43 to 3ae938b Compare April 29, 2026 21:08
@woblerr woblerr force-pushed the add_gpbackup_exporter branch from 3ae938b to de77f35 Compare April 29, 2026 21:14
Collaborator Author

woblerr commented Apr 29, 2026

The description of the gpbackup_exporter_status metric has been updated in de77f35 to match the current behaviour.


@MisterRaindrop MisterRaindrop left a comment

LGTM
