Explicitly dump symbol cache after generating the heap dump#4452
Merged
Conversation
Contributor
There was a problem hiding this comment.
Code Review
This pull request introduces the backtrace dependency and updates the heap dump generation in crates/observe/src/heap_dump_handler.rs to explicitly clear the global symbol cache via backtrace::clear_symbol_cache() after a dump is generated. This prevents memory bloat from accumulated resolved identifiers. No critical issues were found, and there is no feedback to provide.
Contributor
Author
|
Had to format a bunch of |
jmg-duarte
approved these changes
May 28, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
After the release we started to see OOM crashes of the baseline solver. One thing that stood out to me was that pods had a significant jump in memory consumption 1h after they got started.
This aligns with the side car that occasionally generates a heap dump. It first waits for 1h to allow a baseline memory usage to establish itself and then triggers the memory dump.
The memory dump gets generated in the target process (e.g. baseline solver, autopilot, etc.). When the dump happens a huge chunk of memory gets allocated and subsequently never freed.
After going through the internals of the crate we use for that the issue became apparent. When generating the heap dump the code generates a backtrace which in turn requires a bunch of identifiers to be resolved. Those resolved identifiers get put into a cache and don't get freed. (1, 2, 3)
Changes
After generating a heap dump we explicitly call
backtrace::clear_symbol_cache()to free all the memory that was only allocated while dumping the memory.This lowers the baseline memory consumption of our pods significantly and has the nice side effect that we eliminate that now freed memory from the heap dump itself. So specifically we no longer see the ~190MB of "overhead" and really only see the memory used by the process we care about.
Note while this PR is definitely an improvement we should consider it's still not entirely clear to me why the baseline solver pods suddenly ran into this issue. The code related to this didn't change for a while now.
How to test
Build a custom image from this PR, deployed to staging, and manually triggered a couple of memory dumps.
