We've been hitting intermittent CI failures due to ConnectionFailed errors for a while, but they have mostly appeared when GitHub CI was relatively busy (e.g., when users in US time zones start their day and presumably start spinning up CI jobs).
Recently, however, our CI has become ~unusable because of how often ConnectionFailed errors occur. We should really explore how we can avoid this flakiness.
This may even be a good opportunity to make our connection handling more robust in general, e.g., by recalibrating some of the timeout parameters and/or implementing retry logic.
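
To make the retry idea concrete, here is a minimal sketch of retrying with exponential backoff and jitter. It is only an illustration under stated assumptions: the `ConnectionFailed` class below is a stand-in for our real error type, and `retry_on_connection_failed` with its parameters (`attempts`, `base_delay`, `max_delay`, `sleep`) are hypothetical names, not existing code in this project.

```python
import random
import time


class ConnectionFailed(Exception):
    """Hypothetical stand-in for the project's ConnectionFailed error."""


def retry_on_connection_failed(func, *, attempts=5, base_delay=0.5,
                               max_delay=8.0, sleep=time.sleep):
    """Call func(), retrying on ConnectionFailed with exponential backoff.

    All names and default values here are assumptions for illustration.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ConnectionFailed:
            if attempt == attempts:
                # Out of attempts: surface the original error to the caller.
                raise
            # Double the delay on each failed attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out retries from jobs that failed at the same time.
            sleep(delay * random.uniform(0.5, 1.0))
```

A caller would wrap the connecting call, e.g. `retry_on_connection_failed(lambda: connect(address))`, where `connect` and `address` are placeholders for whatever establishes the connection. Whether the defaults above are sensible would depend on the timeouts we settle on; the `sleep` parameter exists so tests can avoid real waiting.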