Skip to content

Explicitly dump symbol cache after generating the heap dump#4452

Merged
MartinquaXD merged 5 commits into
mainfrom
free-symbol-cache-after-dump
May 28, 2026
Merged

Explicitly dump symbol cache after generating the heap dump#4452
MartinquaXD merged 5 commits into
mainfrom
free-symbol-cache-after-dump

Conversation

@MartinquaXD
Copy link
Copy Markdown
Contributor

@MartinquaXD MartinquaXD commented May 28, 2026

Description

After the release we started to see OOM crashes of the baseline solver. One thing that stood out to me was that pods had a significant jump in memory consumption 1h after they got started.
This aligns with the side car that occasionally generates a heap dump. It first waits for 1h to allow a baseline memory usage to establish itself and then triggers the memory dump.
The memory dump gets generated in the target process (e.g. baseline solver, autopilot, etc.). When the dump happens a huge chunk of memory gets allocated and subsequently never freed.
After going through the internals of the crate we use for that the issue became apparent. When generating the heap dump the code generates a backtrace which in turn requires a bunch of identifiers to be resolved. Those resolved identifiers get put into a cache and don't get freed. (1, 2, 3)

Changes

After generating a heap dump we explicitly call backtrace::clear_symbol_cache() to free all the memory that was only allocated while dumping the memory.
This lowers the baseline memory consumption of our pods significantly and has the nice side effect that we eliminate that now freed memory from the heap dump itself. So specifically we no longer see the ~190MB of "overhead" and really only see the memory used by the process we care about.

Note while this PR is definitely an improvement we should consider it's still not entirely clear to me why the baseline solver pods suddenly ran into this issue. The code related to this didn't change for a while now.

How to test

Build a custom image from this PR, deployed to staging, and manually triggered a couple of memory dumps.
Screenshot 2026-05-28 at 12 28 49

@MartinquaXD MartinquaXD requested a review from a team as a code owner May 28, 2026 12:38
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the backtrace dependency and updates the heap dump generation in crates/observe/src/heap_dump_handler.rs to explicitly clear the global symbol cache via backtrace::clear_symbol_cache() after a dump is generated. This prevents memory bloat from accumulated resolved identifiers. No critical issues were found, and there is no feedback to provide.

@MartinquaXD
Copy link
Copy Markdown
Contributor Author

MartinquaXD commented May 28, 2026

Had to format a bunch of toml files since a new version of tombi came out.

Copy link
Copy Markdown
Contributor

@squadgazzz squadgazzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

@MartinquaXD MartinquaXD added this pull request to the merge queue May 28, 2026
Merged via the queue into main with commit b54766d May 28, 2026
3 checks passed
@MartinquaXD MartinquaXD deleted the free-symbol-cache-after-dump branch May 28, 2026 13:28
@github-actions github-actions Bot locked and limited conversation to collaborators May 28, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants