sped up Tokenizer::dump()#5009
Conversation
c58378b to
e4f46ac
Compare
|
It looks like the pointer address is just used as a unique ID for the objects in the dump. @danmar? If that is the case (and as the pointer string representation is different on Windows and Linux) we could use something more light-weight, simple and portable instead. Or simply use just one representation. |
fb4bbd7 to
c3058d2
Compare
|
Scanning with
|
d95f72c to
e184436
Compare
|
I know that the new code is horrible but that's what we have to pay for having overengineered, unusable garbage in the standard (also looking at you I am aware that we might also be able use the |
66172f1 to
63da8d1
Compare
Yes it is a unique ID. In python it can be any arbitrary string, doesn't have to be numeric. But the premium addon expects that it's a hexadecimal value. I am not against that you change it if you think there is faster approach. |
As you see the value is different on several platforms and might not even be a hexadecimal value ( So I would keep the current format for now and prepare another PR which generates a consistent output across all platform. You could then test that with premium as well. Also if we run into issues we have a separate commit to revert which only contains the changes in the identifier. |
Personally I don't think it's horrible. The ostream never made me feel excited anyway. |
Yes it does. If there is a "0x" or not does not matter. It always uses base 16 when converting the string. Even if the string says |
|
As far as I see there are many opportunities to "fold" newlines.. if you do that I don't have any negative opinion about this. looks OK to merge then. |
| // a-f / A-F | ||
| c = 55 + temp; | ||
| #if !defined(_WIN32) || defined(__MINGW32__) | ||
| c += 32; // add 32 for lowercase |
There was a problem hiding this comment.
there is no technical reason to use lower case as far as I know.
There was a problem hiding this comment.
It is necessary to match the result of std::ostringstream. As mentioned before I will do the portable approach in another PR.
There was a problem hiding this comment.
if we really want to lowercase, I think that either c = 'a' + temp; or c += 'a' - 'A'; would be more elegant.
There was a problem hiding this comment.
I don't see why it's important to match std::ostringstream though. the only possible usage I can think of for this function is when generating the dump file or debug output.
There was a problem hiding this comment.
People might depend on the format. Also I want that change separate so it can easily be reverted if necessary. I will also put it behind a define for a release or two.
There was a problem hiding this comment.
if we really want to lowercase, I think that either
c = 'a' + temp;orc += 'a' - 'A';would be more elegant.
I think the - 10 is better as it highlights that we are dealing with a "base 16"/hex value as input.
| return value.valueKind == valueKind; | ||
| }); | ||
| out << " " << tok->str() << " "; | ||
| outs += " "; |
There was a problem hiding this comment.
Probably - if it were a single character... but I guess that is just a typo.
There was a problem hiding this comment.
It seems the code using a char is much slower similar to the stream insertion case I mentioned in another comment. Will investigate but that is out-of-scope of this PR.
There was a problem hiding this comment.
yes it should have been a single character that was a typo 👍
|
Regarding single character char vs. string literal in stream insertion I encountered something weird - a char is slower: llvm/llvm-project#65040 |
…al` to `std::unordered_map`
ok |
Scanning the
clifolder withDISABLE_VALUEFLOW=1Tokenizer::dump()will consume almost 25% of the total Ir count when an addon is specified. This is mainly caused by the usage ofstd::ostream.Encountered while profiling #4958.