Skip to content

chmatchdup() now handles non-ascii strings consistently#3850

Merged
mattdowle merged 10 commits into
masterfrom
fix3844
Sep 14, 2019
Merged

chmatchdup() now handles non-ascii strings consistently#3850
mattdowle merged 10 commits into
masterfrom
fix3844

Conversation

@shrektan
Copy link
Copy Markdown
Member

@shrektan shrektan commented Sep 9, 2019

Closes #3844

chmatchdup() now handles non-ASCII strings correctly.

More specifically, chmatchdup () calls the C code chmatchMain(), which would fall back to Rf_match() when non-ASCII strings were detected. However, the implementation was correct only before the param bool chmatchdup got introduced. Since Rf_match() can't handle the param chmatchdup, it would return undesired results for non-ASCII strings.

This PR will address this issue.

@shrektan shrektan added the encoding issues related to Encoding label Sep 9, 2019
@shrektan shrektan added this to the 1.12.4 milestone Sep 9, 2019
@shrektan shrektan requested a review from mattdowle September 9, 2019 17:31
@codecov
Copy link
Copy Markdown

codecov Bot commented Sep 9, 2019

Codecov Report

Merging #3850 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3850      +/-   ##
==========================================
+ Coverage   99.42%   99.42%   +<.01%     
==========================================
  Files          71       71              
  Lines       13441    13443       +2     
==========================================
+ Hits        13364    13366       +2     
  Misses         77       77
Impacted Files Coverage Δ
src/chmatch.c 100% <100%> (ø) ⬆️
src/utils.c 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5f284cf...ac563a3. Read the comment docs.

Comment thread src/chmatch.c
@shrektan
Copy link
Copy Markdown
Member Author

shrektan commented Sep 10, 2019

Note match_logical() is no longer used by any function. However, I'm not sure deleting the function is desired or not. So I just leave it there.


From Matt: Thanks. Yep let's just remove it then (will do). Great.

Comment thread src/chmatch.c Outdated
Comment thread src/chmatch.c
Comment thread src/chmatch.c Outdated
@shrektan
Copy link
Copy Markdown
Member Author

shrektan commented Sep 10, 2019

One question, as of SEXP *xd = STRING_PTR(coerceUtf8IfNeeded(x));, do I need to PROTECT() or not? I mean xd is pointing to coerceUtf8IfNeeded(x), which will be collected if GC being trigger or R is smart enough to know it's in use and won't collect it?


From Matt: yes the value returned from coerceUtf8IfNeeded() needs to be PROTECT-ed, will do. I did run this PR through rchk (see commands and links in CRAN_Release.cmd, and #3867 for recent work) which I expected to catch the missing PROTECT. But it didn't. That could be because rchk is even smarter to know that in this case there is actually no R API call before the last use of xd, so there is nowhere in this case that a GC could be triggered. It's very possible that it is that smart because the messages I've seen it report elsewhere suggest that. And the nature of this section of code (truelength-clobber) is that we do avoid R API usage here. rchk is a static analysis tool so it doesn't depend on coverage or tests to work, it works by analyzing the source code. Or it could be that the STRING_PTR wrapper hides the usage of coerceUtf8IfNeeded()'s return value from rchk's tracking. Not sure. But rchk didn't have any trouble with chmatch.c at all so it's not like the nesting was too deep and its coverage was incomplete; i.e. it didn't report any errors or warning on chmatch.c in this PR like it does with some other files. We should add the PROTECT anyway for completeness, will do.


From Shrektan: Thanks. For future readers, you may find https://github.com/kalibera/cran-checks/blob/master/rchk/PROTECT.md useful.

Comment thread src/utils.c Outdated
Comment thread src/utils.c Outdated
Comment thread src/utils.c Outdated
@shrektan shrektan mentioned this pull request Sep 13, 2019
25 tasks
@mattdowle mattdowle changed the title Closes #3844: chmatchdup() now handles non-ascii strings consistently chmatchdup() now handles non-ascii strings consistently Sep 14, 2019
Copy link
Copy Markdown
Member

@mattdowle mattdowle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! Much simpler and better!

@mattdowle mattdowle merged commit b10dc94 into master Sep 14, 2019
@mattdowle mattdowle deleted the fix3844 branch September 14, 2019 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

encoding issues related to Encoding

Projects

None yet

Development

Successfully merging this pull request may close these issues.

setnames() can't set dt with non-ascii names

4 participants