Proposal:
There's room for tiny refactorings in the Objects/dictobject.c that may allow speedups, for example via better CPU cache behavior, compiler hints, cheap checks & fast exits for common cases. In my guess-timation, they may bring about upto 10% overall lookup/insert performance gains. That's more of an un-scientific intuition at best right now, but I'm laying some ideas out here:
-
compare_unicode_unicode(): can benefit from pointer-equality and branch prediction, and reducing unnecessary function calls
-
compare_generic(): can use early rejections on hash-mismatch, avoiding RichCompareBool calls
-
unicodekeys_lookup_unicode(): pointer-equality for the common case can precede do_lookup which today is invoked everytime for all cases
-
find_empty_slot(): early termination / separate path for first iteration can speed up insertions, especially for lightly-populated dicts
-
build_indices_generic / build_indices_unicode (): breaking out the first-probe success case may speed up dict resize operations
I welcome feedback on how to raise this proposal differently, but otherwise I'd like to work on these ideas and quantify them. I feel modern CPUs have deep pipelines and these optimizations may help us avoid some cache-thrashing. These changes are purely performance oriented and don't change functionality / API / ABI and should be thread-safe too.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response
Proposal:
There's room for tiny refactorings in the
Objects/dictobject.cthat may allow speedups, for example via better CPU cache behavior, compiler hints, cheap checks & fast exits for common cases. In my guess-timation, they may bring about upto 10% overall lookup/insert performance gains. That's more of an un-scientific intuition at best right now, but I'm laying some ideas out here:compare_unicode_unicode(): can benefit from pointer-equality and branch prediction, and reducing unnecessary function callscompare_generic(): can use early rejections on hash-mismatch, avoidingRichCompareBoolcallsunicodekeys_lookup_unicode(): pointer-equality for the common case can precededo_lookupwhich today is invoked everytime for all casesfind_empty_slot(): early termination / separate path for first iteration can speed up insertions, especially for lightly-populated dictsbuild_indices_generic / build_indices_unicode (): breaking out the first-probe success case may speed up dict resize operationsI welcome feedback on how to raise this proposal differently, but otherwise I'd like to work on these ideas and quantify them. I feel modern CPUs have deep pipelines and these optimizations may help us avoid some cache-thrashing. These changes are purely performance oriented and don't change functionality / API / ABI and should be thread-safe too.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response