Add gpbackup_exporter — a Prometheus exporter for gpbackup history database.#87
woblerr wants to merge 12 commits into apache:main
| // When db is specified in both include and exclude lists, a warning is displayed in the log | ||
| // and data for this db is not collected. | ||
| // It is necessary to set zero metric value for this db. | ||
| getDataSuccessStatus = false |
getDataSuccessStatus lives for the whole loop. Once an include/exclude conflict flips it to false, every later healthy DB read at
L114 (dbStatus[db] = getDataSuccessStatus) is also recorded as failed, producing spurious gpbackup_exporter_status = 0 and false
Fixed in 1e0a784
Clarifying the logic of inclusion/exclusion processing:
- DB present in both dbInclude and dbExclude - warn and emit gpbackup_exporter_status=0.
- DB only in dbExclude - skip, emit no metrics.
- dbInclude empty or DB in dbInclude - process and emit full backup metrics.
Also added additional unit tests.
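The three rules above can be sketched as a small per-database classification. This is an illustration only, not the code from 1e0a784: `classifyDB`, `dbAction`, and the action names are hypothetical, and the fallback for a database that is missing from a non-empty include list (skip) is an assumption that the rules do not spell out.

```go
package main

import (
	"fmt"
	"slices"
)

// dbAction is a hypothetical type describing what to do with one database.
type dbAction int

const (
	actionCollect  dbAction = iota // process and emit full backup metrics
	actionSkip                     // emit no metrics
	actionConflict                 // warn and emit gpbackup_exporter_status=0
)

// classifyDB applies the include/exclude rules to a single database.
// The result depends only on its arguments, so a conflict for one
// database cannot affect the status recorded for another.
func classifyDB(db string, include, exclude []string) dbAction {
	inInclude := slices.Contains(include, db)
	inExclude := slices.Contains(exclude, db)

	switch {
	case inInclude && inExclude:
		return actionConflict
	case inExclude:
		return actionSkip
	case len(include) == 0 || inInclude:
		return actionCollect
	default:
		// Assumption: a DB absent from a non-empty include list is skipped.
		return actionSkip
	}
}

func main() {
	include := []string{"sales", "hr"}
	exclude := []string{"hr", "tmp"}

	for _, db := range []string{"sales", "hr", "tmp", "other"} {
		switch classifyDB(db, include, exclude) {
		case actionConflict:
			fmt.Printf("%s: in both lists, emit gpbackup_exporter_status=0\n", db)
		case actionSkip:
			fmt.Printf("%s: skipped, no metrics\n", db)
		case actionCollect:
			fmt.Printf("%s: collect full backup metrics\n", db)
		}
	}
}
```

Because the action is computed independently for each database, there is no loop-wide flag that a single conflict could flip for every later database.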
| } | ||
| getExporterStatusMetrics(dbStatus, setUpMetricValue, logger) | ||
| } else { | ||
| logger.Warn("No backup data returned") |
backupConfigs, err := parseBackupData(...)
if err != nil {
    logger.Error("Get data failed", "err", err)
    getDataSuccessStatus = false // dead on this path
}
resetMetrics()
if len(backupConfigs) != 0 {
    ...
    getExporterStatusMetrics(dbStatus, ...) // only called here
} else {
    logger.Warn("No backup data returned") // log only, no metric
}
log only, no metric
If fetching the data fails or no backup data is returned, we do not generate metrics.
The absence of metrics indicates a problem with retrieving the data or with the data itself.
In this case the only metric exposed is gpbackup_exporter_build_info.
| "--collect.interval", "600", | ||
| "--web.listen-address", fmt.Sprintf("127.0.0.1:%d", port), | ||
| ) | ||
| cmd.Start() |
Better: `Expect(cmd.Start()).To(Succeed())`, so that a failed start fails the test instead of being silently ignored.
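For illustration, the same idea without Gomega: the error returned by `cmd.Start()` is checked rather than discarded. This is a standard-library sketch, not the e2e test from this PR; `startExporter` and the binary name in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// startExporter is a hypothetical helper: it starts the binary and returns
// the error from cmd.Start instead of discarding it.
func startExporter(binary string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(binary, args...)
	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("start %q: %w", binary, err)
	}
	return cmd, nil
}

func main() {
	// A binary name that is not expected to exist, to show the failure path.
	_, err := startExporter("gpbackup_exporter-missing-binary", "--collect.interval", "600")
	if err != nil {
		fmt.Println("exporter did not start:", err)
	}
}
```

In the Ginkgo test, `Expect(cmd.Start()).To(Succeed())` plays the role of the `if err != nil` check: a missing or non-executable binary fails at the start step rather than surfacing later as a confusing connection error.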
Introduce the Prometheus metrics exporter as a new standalone binary within the repository.
- Create gpbackup_exporter.go entry point with CLI flags and collection loop.
- Port core exporter logic to the new `exporter/` package.
- Adapt type system to use `*history.BackupConfig` directly, aligning with the cloudberry-backup architecture.
- Switch receiver methods to standalone gpbckpconfig helpers.
- Add Prometheus and Kingpin dependencies to `go.mod`.
- Update Makefile to support building, testing, and packaging.
- Port unit tests to Ginkgo/Gomega.
Add an e2e test that runs gpbackup commands and validates that gpbackup_exporter correctly reads the history database and exposes metrics. Also remove the dead gpbackmanPath assignment from useOldBackupVersion in the e2e tests, and suppress noisy test logger output in the exporter unit tests by writing to a bytes.Buffer instead of os.Stdout.
SIGINT means "interrupt," signaling that a user wants to stop the current operation. SIGTERM means "terminate," requesting a polite program shutdown. So it is correct to treat both as a graceful shutdown.
To prevent a panic with an error like "panic: http: invalid pattern", exit in case of an empty endpoint.
6a29090 to 292a9dd
Remove unused getDataSuccessStatus. Clarify include/exclude handling:
* DB present in both dbInclude and dbExclude - warn and emit gpbackup_exporter_status=0.
* DB only in dbExclude - skip, emit no metrics.
* dbInclude empty or DB in dbInclude - process and emit full backup metrics.
Add unit tests.
79f2e43 to 3ae938b
3ae938b to de77f35
Added `gpbackup_exporter`, a Prometheus exporter for collecting metrics from the gpbackup history database (`gpbackup_history.db`).

It is based on the original gpbackup_exporter project and has been adapted for integration into the cloudberry-backup repository.
Motivation
gpbackup does not expose built-in Prometheus metrics. `gpbackup_exporter` fills this gap. This allows integrating backup health monitoring into existing Prometheus/Grafana stacks.

Features

`gpbackup_exporter` exposes the following Prometheus metrics:

Backup metrics:
- `gpbackup_backup_status`: backup status (success / failure);
- `gpbackup_backup_deletion_status`: backup deletion status;
- `gpbackup_backup_info`: backup info (version, compression, plugin, etc.);
- `gpbackup_backup_duration_seconds`: backup duration in seconds.

Last backup metrics:
- `gpbackup_backup_since_last_completion_seconds`: seconds elapsed since the last completed backup per database and backup type.

Exporter self-metrics:
- `gpbackup_exporter_status`: status of the exporter's data retrieval;
- `gpbackup_exporter_build_info`: information about gpbackup exporter.

Filtering flags:
- `--gpbackup.db-include` / `--gpbackup.db-exclude`: limit collection to specific databases;
- `--gpbackup.backup-type`: limit collection to a specific backup type;
- `--gpbackup.collect-deleted` / `--gpbackup.collect-failed`: include deleted / failed backups;
- `--collect.depth`: collect metrics only for backups not older than N days;
- `--web.config.file`: TLS and basic authentication support via the Prometheus exporter toolkit.

Verification
Unit tests pass without errors:
Open Questions
The exporter uses `promslog` for logging, which outputs in `logfmt` or `json` format, the standard for Prometheus exporters. Other tools in this repository use `gplog` from `cloudberry-go-libs`. Should `gpbackup_exporter` be migrated to `gplog` for consistency, or is it acceptable to keep the Prometheus logger?

The exporter's `Makefile` targets use additional `-ldflags` (`GIT_REVISION`, `GIT_BRANCH`, `BUILD_DATE`) injected into `github.com/prometheus/common/version`. This is required by the Prometheus exporter convention to populate the `gpbackup_exporter_build_info` metric. Other tools in the repo use a simpler single `-X version=...` flag pattern. Is this acceptable?

Related links