SIGSEGV in getCdbComponentInfo() when standby coordinator is on dedicated host#1702
jangjang0401 wants to merge 5 commits into apache:main from
Hi, @jangjang0401 welcome!🎊 Thanks for taking the effort to make our project better! 🙌 Keep making such awesome contributions!
my-ship-it left a comment
LGTM with minor nits. Suggest:
- Add the Assert(found || IS_HOT_STANDBY_QD()) to keep the primary-path invariant.
- Mirror the explanatory comment on the second call site (or drop it from the first).
a650064 to af7447c
getCdbComponentInfo() populates hostPrimaryCountHash with primary hosts only. When IS_HOT_STANDBY_QD() is true, mirror and standby hosts are also looked up in the hash, but on dedicated standby nodes that host no primary segments the lookup returns NULL. Replace Assert(found) with a null-safe check to prevent the SIGSEGV.
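To make the failure mode concrete, here is a simplified C model of the lookup. It is a sketch under stated assumptions, not the actual getCdbComponentInfo() code: SegInfo, HostPrimaryCount, find_host() and build_primary_counts() are hypothetical names, and a linear scan stands in for the real dynahash probe (hash_search() with HASH_FIND). Only primaries are counted, so a host that runs nothing but a mirror or the standby coordinator never gets an entry, and looking it up returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model: the real code keys hostPrimaryCountHash
 * by hostname and probes it with hash_search(..., HASH_FIND, &found). */
typedef enum { ROLE_PRIMARY, ROLE_MIRROR } SegRole;

typedef struct {
    const char *hostname;
    SegRole     role;
} SegInfo;

typedef struct {
    const char *hostname;
    int         count;
} HostPrimaryCount;

#define MAX_HOSTS 16

static HostPrimaryCount host_table[MAX_HOSTS];
static int n_hosts = 0;

/* Linear scan standing in for a HASH_FIND probe: NULL when absent. */
static HostPrimaryCount *find_host(const char *hostname)
{
    for (int i = 0; i < n_hosts; i++)
        if (strcmp(host_table[i].hostname, hostname) == 0)
            return &host_table[i];
    return NULL;
}

/* Count primaries per host; mirror/standby entries are skipped, so a
 * host that runs no primary never gets an entry. */
static void build_primary_counts(const SegInfo *segs, int nsegs)
{
    for (int i = 0; i < nsegs; i++) {
        if (segs[i].role != ROLE_PRIMARY)
            continue;
        HostPrimaryCount *e = find_host(segs[i].hostname);
        if (e == NULL) {
            assert(n_hosts < MAX_HOSTS);
            e = &host_table[n_hosts++];
            e->hostname = segs[i].hostname;
            e->count = 0;
        }
        e->count++;
    }
}
```

With a hypothetical topology of a primary on sdw1, a primary plus a mirror on sdw2, and the standby coordinator alone on smdw, find_host("sdw2") succeeds even when probed for the mirror entry (the masking case), while find_host("smdw") returns NULL, which is the case the unpatched code would go on to dereference.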
af7447c to 446b0ab
Thanks for the review!
Let me know if anything else should be adjusted.
@yjhjstz @my-ship-it It appears that the vacuum_gp failure is an indirect effect of this patch. I would appreciate it if you could confirm whether my understanding is correct.
Right direction, but I'd scope it more tightly:
…sync replication interference
@my-ship-it I haven't updated vacuum_gp.out yet because I'm not entirely sure what output the commands inside the aborted transaction will produce. Would it be okay to update vacuum_gp.out after checking the actual CI output, or could you provide some guidance on the expected output?
@yjhjstz @my-ship-it Hi, could someone please approve the workflow runs when you get a chance? Thank you!
What does this PR do?
When a hot standby coordinator runs on a dedicated host with no primary
segments, getCdbComponentInfo() crashes with SIGSEGV. hostPrimaryCountHash is built from primary segment hosts only. When IS_HOT_STANDBY_QD() is true, the loops over segment_db_info and entry_db_info also process mirror/standby entries. On a dedicated standby host that owns no primary segments, HASH_FIND returns NULL. In assert-enabled builds the subsequent Assert(found) fails; in production builds, where Assert() is compiled out, the direct dereference of hsEntry causes a SIGSEGV. Either way, every backend that attempts to connect crashes.
The bug is masked when the standby coordinator shares a host with primary
segments, because the hostname already exists in the hash by coincidence.
Fix by treating a missing hash entry as a count of zero.
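A minimal sketch of that fix, folding in the reviewer's suggested Assert(found || IS_HOT_STANDBY_QD()) so the primary-path invariant is still checked. Assumptions: HostPrimaryCount, lookup_host(), primary_count_for_host() and the is_hot_standby_qd parameter are hypothetical stand-ins for the real hash entry, the HASH_FIND probe and the IS_HOT_STANDBY_QD() macro, and the table contents are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a hostPrimaryCountHash entry. */
typedef struct {
    const char *hostname;
    int         count;
} HostPrimaryCount;

/* Hypothetical table, already built from primary segments only. */
static const HostPrimaryCount primary_counts[] = {
    {"sdw1", 2},
    {"sdw2", 2},
};

/* Stand-in for hash_search(..., HASH_FIND, &found). */
static const HostPrimaryCount *lookup_host(const char *hostname, bool *found)
{
    size_t n = sizeof(primary_counts) / sizeof(primary_counts[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(primary_counts[i].hostname, hostname) == 0) {
            *found = true;
            return &primary_counts[i];
        }
    }
    *found = false;
    return NULL;
}

/*
 * Null-safe count: a missing entry means the host runs no primaries,
 * so report zero instead of dereferencing NULL. The assert mirrors the
 * suggested Assert(found || IS_HOT_STANDBY_QD()): outside a hot standby
 * QD, every looked-up host is expected to have an entry.
 */
static int primary_count_for_host(const char *hostname, bool is_hot_standby_qd)
{
    bool found;
    const HostPrimaryCount *entry = lookup_host(hostname, &found);

    assert(found || is_hot_standby_qd);
    (void) is_hot_standby_qd;   /* unused when NDEBUG disables assert */
    return found ? entry->count : 0;
}
```

In this sketch a standby-only host such as the hypothetical "smdw" simply contributes zero primaries on a hot standby QD, while a missing entry on the normal coordinator path still trips the assertion in debug builds instead of being silently tolerated.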
Type of Change
Test Plan
host (no primary segments).
psql connections to the standby crashed with SIGSEGV before this fix and succeed after it.
Impact
User-facing changes:
Connections to a hot standby coordinator no longer crash when the standby
host runs no primary segments.