
Work around slow thread renaming in ClickHouse.#33

Merged
jmcarp merged 1 commit into master from jmcarp/clickhouse-pthread on May 13, 2026


@jmcarp jmcarp commented May 6, 2026

We observed that ClickHouse can spend upwards of 80% of its cpu time in pthread_setname_np. This happens because (1) ClickHouse constantly renames its threads for debugging purposes, and (2) thread renaming is relatively expensive on illumos. Since we don't make use of this debugging path, we can work around the performance issue by bailing out of ClickHouse's setThreadName helper early.
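
As a rough illustration of what "bailing out early" can look like, here is a minimal sketch in C. It is not the actual patch: ClickHouse's real helper is C++, and the function name `set_thread_name` and the `SKIP_OS_THREAD_RENAME` switch are hypothetical names invented for this example. It assumes the two-argument `pthread_setname_np(pthread_t, const char *)` form found on illumos and Linux.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes pthread_setname_np on glibc; not needed on illumos */
#endif
#include <pthread.h>

/* Hypothetical build-time switch, not a real ClickHouse option. */
#ifndef SKIP_OS_THREAD_RENAME
#define SKIP_OS_THREAD_RENAME 1
#endif

/*
 * Hypothetical stand-in for a thread-naming helper.
 * Returns 0 on success, or an error number from the OS call.
 */
int set_thread_name(const char *name)
{
#if SKIP_OS_THREAD_RENAME
    /* Bail out before touching the OS: the rename is skipped entirely. */
    (void)name;
    return 0;
#else
    /* Original behavior: ask the OS to rename the calling thread. */
    return pthread_setname_np(pthread_self(), name);
#endif
}
```

With the switch enabled, callers still see a successful return, so no call sites need to change; the only thing lost is the OS-visible thread name.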

Note: this patch can't be upstreamed, so a proper long-term fix would involve adding a faster thread rename facility in illumos.

Fixes https://github.com/oxidecomputer/customer-support/issues/1101. h/t @wfchandler and @JustinAzoff, who found the bug and did the research.

@bnaecker bnaecker left a comment

I think this seems ok for us, since we don't really use the name at this point. It's probably worth filing a host OS issue for the underlying problem, namely that setting the name appears to be much slower than on other OSes.

Thanks to you, @wfchandler and @JustinAzoff for tracking this down!

@dancrossnyc

I wonder if you could avoid the patch entirely by preloading a small shared object that overrides the pthread_setname_np symbol with a no-op function. It's probably not worth it.
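
For readers unfamiliar with the preload idea, a shim of this kind might look roughly like the sketch below. It assumes the two-argument `pthread_setname_np(pthread_t, const char *)` signature used on illumos and Linux (macOS uses a different, one-argument form), and it has not been tested against ClickHouse. As noted in the reply that follows, ClickHouse unsets LD_PRELOAD on start, so this approach would not work there without a further patch.

```c
#include <pthread.h>

/*
 * No-op replacement for pthread_setname_np. When this object is loaded
 * ahead of the system library, calls resolve here instead, so the
 * expensive rename never happens. Always reports success.
 */
int pthread_setname_np(pthread_t thread, const char *name)
{
    (void)thread; /* unused: the thread is left untouched */
    (void)name;   /* unused: the requested name is discarded */
    return 0;
}
```

Such a file would typically be built as a shared object (for example with `cc -shared -fPIC`) and then named in `LD_PRELOAD` when launching the target program; the file and library names are left to the reader.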

jmcarp commented May 6, 2026

> I wonder if you could avoid the patch entirely by preloading a small shared object that overrides the pthread_setname_np symbol with a no-op function.

Yes, but that requires another patch. It turns out that clickhouse unsets LD_PRELOAD on start, so we'd also have to patch it for that approach anyway.

@dancrossnyc

Oh, the irony. :-/

jmcarp added a commit to oxidecomputer/omicron that referenced this pull request May 13, 2026
We found in
oxidecomputer/customer-support#1101 that
clickhouse runs a huge number of thread renames, which consume the
majority of cpu time on some workloads. Since we don't need the
debugging feature that this renaming supports, we patched out os-level
thread renames in
oxidecomputer/garbage-compactor#33. This patch
updates omicron to fetch the latest clickhouse archive from s3. Note
that we also introduce a hash-based suffix to the clickhouse version to
represent changes in oxide-specific patches applied to the clickhouse
source version.

Note that we also merged a more principled fix into upstream clickhouse
for this issue: ClickHouse/ClickHouse#104410.
Once we upgrade to a current clickhouse version, we'll drop the patch in
garbage-compactor.

h/t as usual @wfchandler and @JustinAzoff for the investigation.
@jmcarp jmcarp merged commit 6ccac24 into master May 13, 2026