
feat(toolkit): add --exclude-historical-balance for lite snapshot #6690

Open
halibobo1205 wants to merge 1 commit into tronprotocol:develop from halibobo1205:feat/toolkit_db_lite_opt


@halibobo1205
Collaborator

@halibobo1205 halibobo1205 commented Apr 16, 2026

What changed

Add a boolean flag --exclude-historical-balance (default false) to java -jar Toolkit.jar db lite -o split -t snapshot. When enabled, the lite snapshot omits the balance-trace and account-trace stores.

Default behavior: unchanged

With the default, both split and merge behave exactly as before this PR. The majority of operators (running with historyBalanceLookup=off, the project default) need not think about this flag and will see no difference.

Opt-in behavior: smaller lite snapshots

Passing --exclude-historical-balance to split -t snapshot produces a lite snapshot that does not carry balance-trace or account-trace. For source full nodes that ran with historyBalanceLookup=true (where the two stores can grow to ~900 GB), this is what makes lite-snapshot slicing operationally feasible.

split -t history and merge ignore this flag — the merge pipeline itself is not modified by this PR.

Explicit warning at split time

When the flag is enabled, both the runtime output and the --help text spell out the contract:

  • The flag has functional impact only when the source full node runs with historyBalanceLookup=true. Operators on the default configuration are unaffected either way.
  • If historyBalanceLookup was enabled, the loss is permanent: a lite node booted from such a snapshot cannot answer historical balance lookups (getBlockBalance / getAccountBalance), and running merge afterwards will not restore the feature.
  • Operators who need historical balance lookup on the resulting lite node must not enable this flag.

Design choices

  • Scope the flag strictly to split -t snapshot. The toolkit stays a thin physical-split tool; the merge pipeline is left intact, avoiding any need to reason about history-pack trace presence, bak replay, or cross-toolkit-version compatibility.
  • Keep the default at the legacy behavior. Existing workflows are not silently disrupted; operators who want the size saving opt in explicitly.
  • The loss only matters for historyBalanceLookup=true users. Under the default configuration, the trace stores are empty Spring-initialized directories, so excluding them changes nothing of substance.
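
The scoping described above can be sketched as a small selection function. This is a minimal illustration, not the code in DbLite.java: the class name, the snapshotDbs helper, and the sample database lists are all hypothetical, and only the two trace store names come from the PR description.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: NOT the actual DbLite implementation.
final class SnapshotDbSelection {
  // Store names as given in the PR description.
  static final List<String> TRACE_DBS = Arrays.asList("balance-trace", "account-trace");

  /**
   * Returns the databases that would be copied into the lite snapshot.
   * allDbs and archiveDbs are supplied by the caller (assumed inputs).
   */
  static Set<String> snapshotDbs(List<String> allDbs, List<String> archiveDbs,
      boolean excludeHistoricalBalance) {
    Set<String> result = new LinkedHashSet<>(allDbs);
    // Archive (history) stores are never part of the snapshot.
    result.removeAll(archiveDbs);
    if (excludeHistoricalBalance) {
      // Opt-in path: drop the two trace stores as well.
      result.removeAll(TRACE_DBS);
    }
    return result;
  }

  public static void main(String[] args) {
    // Illustrative database names only.
    List<String> all = Arrays.asList(
        "account", "block", "trans", "balance-trace", "account-trace");
    List<String> archive = Arrays.asList("block", "trans");
    System.out.println("default: " + snapshotDbs(all, archive, false));
    System.out.println("opt-in:  " + snapshotDbs(all, archive, true));
  }
}
```

Because the flag only narrows the snapshot set and never touches the archive set, split -t history and merge see the same inputs as before in this sketch.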

Tests

  • Existing default-path tests remain unchanged (zero regression): DbLiteRocksDbTest, DbLiteRocksDbV2Test.
  • New DbLiteExcludeTraceRocksDbTest covers the opt-in path, asserting that the snapshot produced with --exclude-historical-balance contains neither balance-trace nor account-trace.
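
The opt-in assertion amounts to checking that neither trace directory exists under the snapshot output. A minimal sketch of that check follows, assuming each store is a directory named after the store directly under the snapshot root; the class and method names are hypothetical and are not taken from the test suite.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: not the actual test code in the PR.
final class TraceStoreCheck {
  /** Returns true if the snapshot directory holds either trace store directory. */
  static boolean containsTraceStores(Path snapshotDir) {
    return Files.isDirectory(snapshotDir.resolve("balance-trace"))
        || Files.isDirectory(snapshotDir.resolve("account-trace"));
  }

  public static void main(String[] args) throws IOException {
    // Build a throwaway directory that stands in for a snapshot.
    Path snapshot = Files.createTempDirectory("lite-snapshot");
    Files.createDirectory(snapshot.resolve("account"));
    System.out.println("trace stores present: " + containsTraceStores(snapshot));
  }
}
```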

close #6597

@waynercheung
Collaborator

[NIT] A few minor cleanup items:

  • Please move the unrelated .gitignore hunk out of this PR so the review stays scoped.
  • The // delete account trace comment in DbLite.java reads as if it belongs to long blockNum = ...; please move/reword it so it describes the stream below more clearly.
  • The nested ternary in DbLiteTest.java is hard to parse; please extract it into a local variable before calling generateSomeTransactions(...).

Collaborator

@xxo1shine xxo1shine left a comment


approved!

@halibobo1205
Collaborator Author

halibobo1205 commented Apr 24, 2026

Problem summary

A · Pre-existing lite-mechanism defects (unrelated to this PR)

A1. transactionRetStore is not aligned with the block window
A lite node retains the most recent 65,535 blocks, but transactionRetStore does not hold a complete set of receipts for that same window — there are gaps.

A2. section-bloom and block body are asymmetric
section-bloom is kept in full, while block bodies are trimmed to the last 65,535. Queries can match the bloom filter but find no corresponding block/transaction.


B · Semantic nature of historical balance lookup

B1. State-tree semantics, not log semantics
balance-trace / account-trace must be accumulated continuously from genesis to correctly answer "what was the balance at any historical height." Any gap is a correctness bug.

B2. Only valid on a full node that had the feature enabled from genesis
A lite node has no legitimate "partially enabled" state — even if the data is present, a lite node cannot answer the query independently.

B3. The switch can be flipped at runtime
Because historyBalanceLookup can be toggled on/off at will, the same data directory may contain interleaved enabled/disabled segments. The original full node's store may itself already be corrupt by this definition.

B4. No persisted "enabled window" metadata on the node side
Nothing like firstEnabledBlockNum is written to disk, so neither the toolkit nor the runtime can determine which block range the data legitimately covers.


C · Current state of the code (important correction)

C1. The two trace DBs are currently treated as state stores
Before this PR, balance-trace / account-trace are classified alongside account and get packaged into the lite snapshot — they are not archive DBs.

C2. These two stores are close to 900 GB in size
On a full node with lookup enabled, the snapshot drags along that 900 GB, which makes the resulting lite snapshot operationally unusable (distribution, provisioning, and startup costs all become unacceptable).

C3. This is the actual motivation behind this PR
Moving these DBs from snapshot into archiveDbs is what makes "slice a usable lite snapshot out of a lookup-enabled full node" actually feasible. The space benefit is not "≈0" — that only holds for the small case where lookup is off on both ends. In the real-world lookup-enabled scenario, the benefit is on the order of −900 GB.

C4. Illegal merge combinations still exist
Of the four (lite off/on × history off/on) combinations, only (on, on) from the same source is semantically valid; the other three can still silently happen. But given B3 + B4, any toolkit-level check can only catch the obvious misconfigurations, not the semantic ones — and a half-check offers false assurance.


D · Re-evaluation of this PR's value

D1. What the PR does
Moves the two trace DBs from the snapshot set into archiveDbs (so split -t history packages them, and merge restores them).

D2. Impact on the default user (lookup off on both ends — the majority)
The two DBs are Spring-initialized empty directories and never occupied meaningful space in the snapshot to begin with. The PR brings no net gain here, but also no cost — just two empty dirs travel via history instead.

D3. Impact on lookup-enabled full nodes (niche but real)
Removes 900 GB from the snapshot, turning lite-snapshot slicing from "infeasible" into "feasible." This is the core value of the PR.

D4. The functional loss is not something this PR introduces
A lite node cannot serve historical balance lookup — that is a consequence of B2, independent of this PR. Whether or not we merge this PR, the lite node cannot answer the query. The PR neither causes nor worsens this; it only makes lite-snapshotting a lookup-enabled node physically viable.

D5. The PR adds no warnings or validation
split does not notify the operator; merge does not check for illegal combinations.


E · Guiding principle

The toolkit is just a tool. The correctness of the node's own data is the operator's responsibility, and that responsibility should not be pushed onto the tool.

  • DbLite physically moves files between directories; it has no authoritative view of whether the source data is semantically valid (see B3, B4).
  • Any validation logic the toolkit layers on top can only reject obvious misconfigurations. It cannot — and should not be expected to — detect semantic corruption caused by runtime switch flipping, interleaved enabled windows, cross-source mixing, etc.
  • Adding heavy checks inside the toolkit therefore trades real engineering cost for a false sense of safety: users who skip node-side hygiene will still produce broken data; users who do node-side hygiene do not need the toolkit check.
  • The right place to enforce continuity guarantees is the node itself (persisted enabled-window metadata, immutable-once-enabled semantics, etc. — see B4 as a follow-up). Until that exists, the toolkit should stay thin, do exactly one physical job well, and document the semantic contract clearly rather than pretend to police it.

Corollary for this PR: keep the scope to the physical split; do not introduce validation or compatibility scaffolding that we cannot honor consistently. On the above reasoning, I consider this PR to have already absorbed enough compatibility work — it delivers the real, concrete need ("let a lookup-enabled node produce a usable lite snapshot") with the minimal code change needed to physically separate the two stores, without pulling semantic responsibilities into the toolkit that do not belong there.

cc @waynercheung @xxo1shine @Sunny6889

@waynercheung
Collaborator

Reconsidered after @halibobo1205's A-E analysis. The operational motivation is valid, and "operator responsibility" is a reasonable product stance here, provided the limitation is documented clearly.

Withdrawing my earlier [MUST] on the snapshot-range query regression.
Given that the silent-0 behavior in Wallet.getAccountBalance is pre-existing (for historyBalanceLookup=false nodes), this PR enlarges rather than introduces that failure surface, so I no longer consider that point blocking for this PR.

Code changes LGTM. Ready to approve once one doc item lands; the other two are lightweight follow-ups.

[MUST] Update the PR description to state explicitly that lite snapshots no longer contain balance-trace / account-trace for retained blocks, and that getBlockBalance / getAccountBalance must not be relied on until merge has completed. If applicable, please also mention this in release notes / upgrade notes. Operator responsibility only works when operators are told about the changed contract up front.

[SHOULD] Please file a follow-up issue - this one is independent of the key-format question below, and is not covered by the "never happened / breaking change" argument. AccountTraceStore#getPrevBalance returning Pair.of(n, 0L) on empty lookup means Wallet.getAccountBalance surfaces balance=0 with no error signal. Because 0 is a legitimate balance value, clients cannot distinguish "account truly has 0" from "index isn't populated". This PR enlarges the set of configurations that trigger this. Just a tracking issue - no fix needed in this PR.
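
The ambiguity described above can be reproduced with a simplified stand-in for the index. This is a sketch under assumptions: the TreeMap replaces the real store, the class and method names are hypothetical, and the Optional-returning variant is only one possible way to keep the two cases apart, not a proposal from this thread.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap stands in for the account-trace index of one account.
final class PrevBalanceLookup {
  private final TreeMap<Long, Long> trace = new TreeMap<>();

  void record(long blockNum, long balance) {
    trace.put(blockNum, balance);
  }

  /** Mirrors the reported behavior: an empty lookup collapses to balance 0. */
  long prevBalanceOrZero(long blockNum) {
    Map.Entry<Long, Long> entry = trace.floorEntry(blockNum);
    return entry == null ? 0L : entry.getValue();
  }

  /** Alternative shape: an empty lookup stays distinguishable from a real 0. */
  Optional<Long> prevBalance(long blockNum) {
    Map.Entry<Long, Long> entry = trace.floorEntry(blockNum);
    return entry == null ? Optional.empty() : Optional.of(entry.getValue());
  }

  public static void main(String[] args) {
    PrevBalanceLookup unpopulated = new PrevBalanceLookup();
    PrevBalanceLookup trulyZero = new PrevBalanceLookup();
    trulyZero.record(5L, 0L);
    // Both calls return 0, so a client cannot tell the two situations apart.
    System.out.println(unpopulated.prevBalanceOrZero(10L) == trulyZero.prevBalanceOrZero(10L));
    // With Optional, the missing index is visible as an empty value.
    System.out.println(unpopulated.prevBalance(10L).isPresent());
    System.out.println(trulyZero.prevBalance(10L).isPresent());
  }
}
```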

[SHOULD] Given that the on-disk key format is stable in practice, I'll drop the full helper/contract-test ask. But please add reciprocal cross-reference comments so the DbLite <-> AccountTraceStore coupling is discoverable:

chainbase/.../AccountTraceStore.java, above recordBalanceWithBlock:

      // NOTE: the composite key layout (address || Longs.toByteArray(
      // number ^ Long.MAX_VALUE)) is also reconstructed in plugins
      // DbLite#trimExtraHistory. Keep them in sync.

plugins/.../DbLite.java, above the composite key build:

      // Must match AccountTraceStore#recordBalanceWithBlock key layout.
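
For readers unfamiliar with the layout named in these comments, the following sketch builds a key of the form address || big-endian bytes of (number ^ Long.MAX_VALUE). It uses ByteBuffer in place of Guava's Longs.toByteArray (both produce big-endian bytes); the class and method names are hypothetical, the address bytes are placeholders, and the ordering remark assumes an unsigned bytewise key comparator.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the key layout quoted in the review comment.
final class TraceKeyLayout {
  /** Builds address || big-endian bytes of (number ^ Long.MAX_VALUE). */
  static byte[] compositeKey(byte[] address, long number) {
    return ByteBuffer.allocate(address.length + Long.BYTES)
        .put(address)
        .putLong(number ^ Long.MAX_VALUE) // ByteBuffer is big-endian by default
        .array();
  }

  /** Unsigned lexicographic comparison, as a bytewise comparator would order keys. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    byte[] address = {0x41, 0x01, 0x02}; // placeholder bytes, not a real address
    byte[] older = compositeKey(address, 100L);
    byte[] newer = compositeKey(address, 200L);
    // For non-negative block numbers the XOR inverts the order,
    // so the newer block's key sorts before the older one.
    System.out.println(compareUnsigned(newer, older) < 0);
  }
}
```

Any code that rebuilds this key outside the store has to reproduce both the XOR and the byte order exactly, which is why the cross-reference comments are useful.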

@halibobo1205
Collaborator Author

@waynercheung done.

Default behavior is unchanged: balance-trace and account-trace stay in
the lite snapshot as before, so default operators
(historyBalanceLookup=off) see no difference.

Opt-in via `--exclude-historical-balance=true` on `split -t snapshot`
excludes the two trace stores from the snapshot for size-conscious
operators. A loud warning is printed at split time noting that this
loss is permanent for nodes that had historyBalanceLookup=true (merge
cannot restore the feature) and that operators who need historical
balance lookup on the resulting lite node must NOT enable this flag.

`split -t history` and `merge` ignore the flag and continue using the
legacy 5-db archive set, so merge logic stays untouched.

Includes:
- DbLite: new CLI option, helper method, runtime warning.
- README: parameter documentation and worked example.
- DbLiteTest: 3-arg testTools overload and packaging-contract assertion.
- DbLiteExcludeHistoricalBalanceRocksDbTest: opt-in path coverage.

close tronprotocol#6597
@halibobo1205 halibobo1205 force-pushed the feat/toolkit_db_lite_opt branch from 3eeff09 to 5fad964 Compare April 27, 2026 12:51
@halibobo1205 halibobo1205 changed the title from "feat(toolkit): exclude historical balance DBs from lite snapshot" to "feat(toolkit): add --exclude-historical-balance for lite snapshot" Apr 27, 2026

Development

Successfully merging this pull request may close these issues.

[Feature] Exclude historical balance DBs from lite snapshot
