Fix handling of surrogate pairs in length calculation#4462
Fix handling of surrogate pairs in length calculation#4462totkeks wants to merge 4 commits intoPowerShell:masterfrom
Conversation
|
@microsoft-github-policy-service agree |
|
I think this pull request needs a unit-test to demonstrate the requirements, but a couple unit tests would be nice. |
|
Oh I'm so sorry, we should've gotten to this sooner. Looks like the correct fix the identified bug to me. |
I reviewed and they seem to correctly cover the reported (and solved) issues with glyphs / surrogate pairs.
|
Hey @totkeks I merged the main branch in (it's sadly still called "master") and added some Claude generated unit tests which looked correct to me. Can you take a look and confirm the assertions are covering the bugfix? |
And short circuits faster if at end of the string.
andyleejordan
left a comment
There was a problem hiding this comment.
Looks good to me. @daxian-dbw?
| if ((i + 1) < end && char.IsSurrogatePair(c, str[i + 1])) | ||
| { | ||
| sum++; | ||
| i++; // Skip the low surrogate | ||
| continue; | ||
| } |
There was a problem hiding this comment.
This doesn't look right. It effectively counts every surrogate pair as 1 cell in terminal display, but that's not the case for almost all surrogate pairs. For example, 😀 and 👉 are both 2-cell width.
I intended to rewrite the length calculation using the lib Wcwidth and rewrite the cursor movement code using System.Globalization.StringInfo, but just didn't get the time to invest into this area ...
There was a problem hiding this comment.
The root cause lies in LengthInBufferCells, but for the reported issue, it's because LengthInBufferCells returns a smaller number than the actual buffer cells taken by the '❌' emoji.
The '❌' emoji is the Unicode point U+274C. It's a single 16-byte Unicode character, not a surrogate pair. However, it takes 2 buffer cells (1 char -> 2 cells in this case). So, for a prompt like '❌ ', the buffer length should be 3, but LengthInBufferCells returns 2 incorrectly.
PR Summary
Adds new conditions to handle surrogate pairs when calculating the buffer length of a string.
Fixes #3857.
PR Checklist
Microsoft Reviewers: Open in CodeFlow