Skip to content

fix(sync): Prevent indefinite hang on credential prompts and slow fetches#544

Draft
tony wants to merge 31 commits intomasterfrom
fix-sync-hang-on-credential-prompts
Draft

fix(sync): Prevent indefinite hang on credential prompts and slow fetches#544
tony wants to merge 31 commits intomasterfrom
fix-sync-hang-on-credential-prompts

Conversation

@tony
Copy link
Copy Markdown
Member

@tony tony commented Apr 7, 2026

Summary

  • Set GIT_TERMINAL_PROMPT=0 at sync() entry so git fails fast instead of blocking on stdin when credentials are needed
  • Add 120s timeout to _maybe_fetch() subprocess call to prevent indefinite blocking on slow/large repo fetches
  • Handle subprocess.TimeoutExpired with a descriptive error message

Fixes #543

Context

vcspull sync --workspace <path> --all hangs indefinitely when any repo in the batch triggers a git credential prompt (SSH passphrase, HTTPS password) or when fetching a very large repo (e.g. 15GB rust-lang/rust). The sync loop is sequential, so one stuck git fetch blocks all remaining repos with no timeout or recovery.

Test plan

  • New parametrized test for TimeoutExpired handling in _maybe_fetch
  • Spy test verifying GIT_TERMINAL_PROMPT=0 is passed in subprocess env
  • Spy test verifying timeout kwarg is passed to subprocess
  • Test verifying sync() sets GIT_TERMINAL_PROMPT=0 via os.environ.setdefault
  • All 961 existing + new tests pass
  • ruff, mypy clean

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 74.70588% with 86 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.61%. Comparing base (419b94d) to head (c69879e).

Files with missing lines Patch % Lines
src/vcspull/cli/sync.py 71.91% 66 Missing and 16 partials ⚠️
src/vcspull/log.py 91.66% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #544      +/-   ##
==========================================
- Coverage   84.27%   83.61%   -0.66%     
==========================================
  Files          29       29              
  Lines        3809     4114     +305     
  Branches      758      813      +55     
==========================================
+ Hits         3210     3440     +230     
- Misses        377      440      +63     
- Partials      222      234      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

… slow fetches

why: vcspull sync hangs indefinitely when a repo needs credentials (git
waits on stdin) or when fetching very large repos (e.g. 15GB rust-lang/rust).
Sequential processing means one stuck repo blocks the entire batch.

what:
- Set GIT_TERMINAL_PROMPT=0 at sync() entry to make git fail fast on auth
- Add 120s timeout to _maybe_fetch subprocess.run() call
- Handle subprocess.TimeoutExpired with descriptive error message
- Pass no-prompt env to _maybe_fetch subprocess calls
- Add tests for timeout handling, env propagation, and prompt suppression

Refs: #543
@tony tony force-pushed the fix-sync-hang-on-credential-prompts branch from 9fe9c4f to 3d5bafa Compare April 24, 2026 10:12
tony added 27 commits April 24, 2026 17:24
why: This branch needs both of the libvcs 0.40.0a0 sync-hang fixes to do its
job: the `timeout=` kwarg on `libvcs._internal.run.run` (for per-subprocess
deadlines) and the faster `GitSync.remote()` path that no longer enumerates
every remote ref via `git remote show`. The pin is widened to allow any
0.40.x pre-release so test environments can pick up further patches without
a follow-up bump.

what:
- Widen libvcs pin from `~=0.39.0` to `>=0.40.0a0,<0.41.0`.
- Rename the libvcs pytest fixture type from `CreateRepoPytestFixtureFn` to
  the new `CreateRepoFn` alias across tests/test_sync.py and tests/test_cli.py.
- Add a CHANGES entry describing the dependency bump and why it matters.
why: Before this change, a single stuck repository -- e.g. openai/codex under
~/study/rust/, which pipes thousands of remote-show lines through libvcs's
busy-poll loop -- would block the entire batch of `vcspull sync` commands.
The prior branch commit only armed git's credential-prompt env vars and
timed out the *planning* fetch; the main `update_repo()` call still ran
without a deadline. This wires a wall-clock watchdog around every repository
so the batch keeps moving when one repo is slow or wedged, and surfaces an
actionable rerun recipe so the user can diagnose without hunting for CLI
flags.

what:
- Add `_SyncOutcome` + `_TimedOutRepo` dataclasses and a
  `_sync_repo_with_watchdog` primitive that runs `update_repo` on a single
  worker and enforces the timeout via `future.result(timeout=...)`. The
  executor is shut down with `wait=False` on every return path so a wedged
  libvcs subprocess is not allowed to block the main thread.
- Add `--timeout SECONDS` CLI flag and `VCSPULL_SYNC_TIMEOUT_SECONDS` env
  override. Default is 10s, intentionally aggressive so hangs are surfaced
  fast.
- Track timed-out repositories separately in the summary and emit a
  copy-pasteable rerun recipe grouped by workspace root. The suggested
  replacement timeout is `max(120, current * 10)` so a user who keeps the
  default gets a meaningful headroom on retry.
- Plumb the new kwarg through `vcspull.cli.__init__:sync(...)`.
- New `tests/cli/test_sync_watchdog.py` covering timeout resolution
  precedence, the synced/failed/timed-out outcome matrix, the rerun-recipe
  contents (workspace grouping, suggested timeout, verbose variant, manual
  git probe, log-file path), and the no-op branch when nothing timed out.
- CHANGES entry under 1.59.x.
…facing

why: When vcspull sync stalled on a repo and the watchdog abandoned it, the
user had no post-mortem trail -- the only signal was the terminal line that
said "timed out". npm/pnpm/yarn solve this by always writing a debug log to
the temp dir and only surfacing the path on failure, so clean runs stay
quiet but failed ones give the user something to grep. This wires the same
pattern into vcspull sync.

what:
- Add `default_debug_log_path`, `setup_file_logger`, and
  `teardown_file_logger` in vcspull.log. The file handler attaches to both
  the vcspull and libvcs loggers at DEBUG level, so libvcs's per-repo
  progress output is captured alongside vcspull's own structured events.
  Duplicate registration is deduped by baseFilename so multiple setups for
  the same path do not write every record twice.
- Default path is `$TMPDIR/vcspull-debug-<timestamp>-<pid>.log`, matching
  the npm/pnpm/yarn convention.
- Add `--log-file PATH` and `--no-log-file` CLI flags. Thread both through
  from `vcspull.cli.__init__:sync(...)`.
- Extract the existing body of `sync()` into `_sync_impl` so the file
  handler setup/teardown can wrap the whole sync in a `try/finally` and
  still cover every exit path (including argparse's SystemExit).
- Surface the log path to the terminal on any failure (timeout block via
  `_emit_rerun_recipe`, and a new "Full debug log: ..." line for plain
  failures). Clean runs leave the file on disk but never mention it.
- Tests in tests/cli/test_sync_log_file.py covering default path shape,
  attachment at DEBUG, end-to-end record capture, idempotent attachment,
  and clean teardown.
- CHANGES entry under 1.59.x.
why: Before this change, users running `vcspull sync` had no live signal of
which repository was currently processing -- the only output was a
post-factum success/failure line per repo. When a slow or wedged repo was
running, stdout just appeared frozen. That pattern is what led to the
"codex is hanging, is it hung?" confusion that motivated this whole branch.
Providing a lightweight indicator resolves that without reaching for a
heavyweight dep like `rich`.

what:
- Add src/vcspull/cli/_progress.py with a threaded SyncStatusIndicator that
  uses only stdlib + ANSI escape codes (pattern borrowed from tmuxp's
  cli/_progress.py): a 100ms braille-frame spinner in a TTY, a single
  "· syncing <name>" line on start plus periodic "still syncing" lines in
  pipes.
- atexit hook restores the cursor when the interpreter shuts down mid-sync
  so kitty/tmux don't leak a hidden cursor after a crash.
- `build_indicator` disables the spinner for --json / --ndjson and for
  --color=never so machine-readable output stays clean.
- Extract the inner repo loop of _sync_impl into _run_sync_loop so the
  indicator can be cleanly started, torn down via finally, and not leak
  into the summary emission.
- Register the new module in tests/test_log.py expected CLI-logger list
  so get_cli_logger_names still matches.
- Add tests/cli/test_sync_progress.py covering: disabled/no-op, start+
  heartbeat, throttled heartbeat under interval, TTY spinner frames +
  cursor-hide/show, idempotent close, context-manager form, and
  build_indicator's mode-aware gating.
- CHANGES entry under 1.59.x.
why: Two regressions in the post-summary output reported by the user:

1. The "Full debug log: ..." line printed twice whenever a batch had both
   timed-out repos AND plain failures -- _emit_rerun_recipe emitted it, and
   the separate "surface-on-failure" block after the summary emitted it
   again. Single source of truth beats two coordinated emitters.

2. The new timeout / rerun-recipe / spinner output used emoji-like glyphs
   (⏱ stopwatch, · middle dot, … ellipsis, Braille spinner frames). The
   user asked to avoid emojis while keeping the semantic colours, which
   already carry the warning signal on their own.

what:
- Remove the log-file emit from _emit_rerun_recipe. The sole emit site is
  now after _emit_summary, gated on summary['failed'] > 0 -- timeouts are
  already counted there so the hint fires exactly once on any failure.
- Drop the now-unused log_file_path kwarg from _emit_rerun_recipe. Mirror
  the callers (exit-on-error branch in the sync loop + post-summary in
  _sync_impl).
- Replace emoji/glyph decorations in the rerun recipe with ASCII:
  ⏱ -> plain "Timed out:" label (warning colour), -, ->.
- Replace the in-loop timeout row prefix ⏱ -> - and → -> ->.
- Progress indicator: swap Braille spinner frames (⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏) for
  ASCII |/-\; drop the · start-line prefix; replace … with ... in the
  spinner line and the non-TTY heartbeat.
- Tests: drop the log-file-path assertion from the rerun-recipe test (the
  helper no longer takes the kwarg); add a regression guarding against
  the ⏱ glyph leaking back into the recipe; update the spinner test to
  assert ASCII frames and to forbid the previous Unicode decorations
  (Braille frames, ellipsis, middle dot).
why: Three regressions reported against the prior polish commit during a
real `vcspull sync --workspace ~/study/rust --all` run:

1. Braille spinner was the right glyph set; `|/-\` was an over-correction
   of the "avoid emojis" feedback. The user pointed to tmuxp -- which also
   uses Braille -- as the model.

2. The spinner's \r redraws collided with libvcs's concurrent stdout writes
   (log lines via the `libvcs` StreamHandler, subprocess output via
   `progress_cb`). The cursor sat mid-spinner when those writes landed and
   produced artefacts like `Already on 'main'ec ...  1.1s`.

3. Ctrl-C during the batch was silently downgraded to a per-repo "failed"
   outcome (the watchdog caught `BaseException`), so the loop moved on to
   the next repo. Worse, `ThreadPoolExecutor` registers its workers in the
   atexit join list, so Python's interpreter shutdown blocked on the wedged
   libvcs subprocess thread -- a second Ctrl-C was needed and a
   `concurrent/futures/thread.py` traceback printed.

what:
- Restore `_SPINNER_FRAMES = "⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏"`.
- Add `SyncStatusIndicator.write(text)` which, under the same lock the
  redraw loop holds, emits `\r\033[2K` before the caller's text when a
  spinner frame is in flight, then flushes. Zero `_last_line_len` so the
  next tick redraws cleanly.
- Wrap each TTY frame in ANSI synchronized-output markers (`\x1b[?2026h`
  / `\x1b[?2026l`) so terminals flip the spinner line atomically; the
  bracket is ignored by terminals that don't implement the sequence.
- Install a narrow stdout diverter while the spinner is enabled:
  `_IndicatorStreamProxy` adapts the `libvcs` and `vcspull` StreamHandlers
  to call `indicator.write(...)`. FileHandlers (the debug log) are left
  alone so `/tmp/vcspull-debug-*.log` keeps its full trace regardless.
  libvcs's `progress_callback` is rewired to `indicator.write(output)` for
  the duration of the loop.
- Replace `concurrent.futures.ThreadPoolExecutor` with a raw daemon
  `threading.Thread` + `threading.Event` in `_sync_repo_with_watchdog`.
  Daemon threads are forcibly terminated at interpreter shutdown; no
  `_python_exit` join, no second-Ctrl-C traceback. Narrow the catch to
  `Exception` so `KeyboardInterrupt` and `SystemExit` propagate; the
  worker stashes a `BaseException` and the main thread re-raises it.
- Top-level `KeyboardInterrupt` handler in `_sync_impl` prints
  "Interrupted by user.", emits the partial summary, surfaces the debug
  log path, and raises `SystemExit(130)` (conventional SIGINT exit code).
- Tests: re-assert Braille frames and the synchronized-output escape
  sequences; add `test_write_clears_spinner_line_before_emitting_in_tty`
  as a regression for artefact 2; add
  `test_watchdog_propagates_keyboard_interrupt_from_worker` for the
  narrowed catch.
why: tmuxp's spinner renders each Braille frame through
``self.colors.info(frame)``, which reads brightly against the surrounding
dim log stream without tinting the repo name / elapsed suffix. The user
called that out as the model and the plain-ANSI version we'd shipped
looked visually flat next to libvcs's coloured ``|git| (repo)`` lines.

what:
- Accept a ``Colors`` instance on ``SyncStatusIndicator`` (defaulting to a
  NEVER-colour palette so ad-hoc callers and tests don't leak ANSI codes).
- Wrap only the Braille cell in ``self._colors.info(frame)`` inside
  ``_render_tty``; keep the ``syncing <name> ... 1.1s`` suffix in the
  terminal default so it doesn't compete with the permanent-line colours
  (✓ Synced green, - Timed out yellow).
- Track visible-cell length (not escape-inclusive string length) in
  ``_last_line_len`` so the pad-clear arithmetic still aligns the next
  redraw even though the line has grown with ANSI bytes.
- Thread the shared ``Colors`` from ``_sync_impl`` into ``build_indicator``
  so the spinner honours ``--color=always/auto/never`` exactly like the
  rest of the CLI.
- Test: ``test_tty_spinner_colours_the_frame_cell`` regex-matches an ANSI
  colour sequence immediately followed by a Braille frame, locking in the
  tmuxp-parity and guarding against future "colour the whole line" drift.
why: The prior interrupt handler lived inside ``_sync_impl`` wrapped
around ``_run_sync_loop``. Ctrl-C landing BEFORE the loop -- notably
during ``load_configs`` (libyaml parse on a large ~/.vcspull.yaml), or
inside ``find_config_files``, or during the dry-run branch -- propagated
all the way out as an unhandled ``KeyboardInterrupt`` and the user saw
the full yaml parser stack dumped to the terminal. Observed in the
reporter's chained ``vcspull sync; vcspull sync; ...`` invocation.

what:
- Add an outer ``except KeyboardInterrupt`` in ``sync()`` wrapping the
  existing ``try: _sync_impl(...) finally: teardown_file_logger(...)``.
  This handler fires only for interrupts outside the repo loop (the
  in-loop handler inside ``_sync_impl`` still runs first when applicable
  and emits the partial summary via its ``SystemExit(130)``, which is not
  caught here).
- Handler writes "Interrupted by user." and the debug log path straight
  to ``sys.stderr`` -- the formatter may not yet exist, or may have been
  partially emitted, so a plain message is the safe contract.
- Exits with ``SystemExit(130)`` (SIGINT convention) so shells / CI
  distinguish "user stopped this" from "crashed".
- Regression test ``test_sync_handles_keyboard_interrupt_during_config_load``
  monkeypatches ``load_configs`` to raise ``KeyboardInterrupt`` and
  asserts exit code 130 + stderr message.
why: bash/zsh abort ``cmd1; cmd2; cmd3`` sequential lists only when the
child was killed by a signal (WIFSIGNALED(SIGINT)). A clean
SystemExit(130) leaves the shell no reason to stop, so chained
``vcspull sync ...; vcspull sync ...`` required per-command Ctrl-C --
the reporter hit Ctrl-C four times in a four-invocation chain. Git
solves this with ``sigchain_pop(sig); raise(sig)`` (sigchain.h:20-34,
builtin/clone.c:416). Translate the pattern to Python so vcspull
composes with ``;``-chains like git clone / git fetch.

what:
- Add ``_exit_on_sigint()``: POSIX does ``SIG_IGN`` (guard the gap) →
  ``SIG_DFL`` → ``signal.raise_signal(SIGINT)`` so the kernel exits
  with WIFSIGNALED(SIGINT); Windows falls back to ``SystemExit(130)``.
  Precondition: main thread (documented; the ValueError fails loud
  rather than silently defeating the change off-thread).
- Add ``_SyncInterruptedAfterSummary(KeyboardInterrupt)`` marker so
  the outer ``sync()`` catch skips the plain stderr fallback when the
  inner ``_sync_impl`` already printed a coloured partial summary.
- Inline log-file teardown in the outer ``except`` (atexit and
  ``finally`` do not run after ``SIG_DFL`` terminates at C level) and
  null the handle so the surrounding ``finally`` is a single-close
  no-op on the Windows fallback path.
- Shield the inner ``if interrupted:`` summary emission against
  ``OSError`` / ``ValueError`` (e.g. ``BrokenPipeError`` when stdout
  is piped to ``head``) so the marker exception *always* propagates --
  otherwise bash sees exit 1 and the ``;`` chain continues.
- Tests: opt-in ``_fake_sigint_escalation`` fixture for in-process
  ``SystemExit(130)`` assertions (applied to the existing
  ``test_sync_handles_keyboard_interrupt_during_config_load``);
  mock-based unit tests for ``_exit_on_sigint`` POSIX ordering and
  Windows fallback; new subprocess-based
  ``test_exit_on_sigint_produces_wifsignaled_sigint`` asserting
  ``proc.returncode == -signal.SIGINT`` so the real
  WIFSIGNALED path is pinned down (POSIX only).
- CHANGES entry under 1.59.x.

Matches the design pass Claude/Gemini/GPT converged on via two rounds
of weave:ask (see
``$XDG_STATE_HOME/ai-aip/repos/vcspull--*/sessions/ask/latest/``).
…ndler

why: After shipping the npm-style debug log file, users saw libvcs's
chatty ``|git| (repo) head_sha: ...`` / ``show_ref_output: ...`` /
``git_remote_name: ...`` lines on the terminal even at the default
``--log-level=INFO``. Root cause: ``setup_file_logger`` raises the
``vcspull`` and ``libvcs`` loggers to DEBUG so its FileHandler captures
the full trace, but ``setup_logger`` left the terminal StreamHandlers
with no per-handler level filter -- so a logger-level bump opened the
floodgate to the terminal too. Reporter confirmed the same noise across
4 chained ``vcspull sync`` invocations today.

what:
- Add ``verbosity: int = 0`` kwarg to ``setup_logger`` (sync subcommand
  passes ``args.verbosity``; everything else gets 0). Drives the libvcs
  StreamHandler level via ``_libvcs_stream_level``: 0 → WARNING (chatty
  INFO + DEBUG suppressed); 1 (``-v``) → INFO; 2+ (``-vv``) → DEBUG.
- Pin per-handler levels on every vcspull and libvcs StreamHandler. The
  per-handler filter is what protects the terminal from
  ``setup_file_logger`` raising the logger level to DEBUG; the file
  handler still receives DEBUG records because libvcs's *logger* level
  is DEBUG, but the terminal handler drops them via its own filter.
- ``cli/__init__.py``: thread ``args.verbosity`` (defaults to 0 when the
  subcommand doesn't define it) into ``setup_logger``.
- Tests: 5 new in ``tests/test_log.py``: default WARNING; ``-v`` INFO;
  ``-vv`` DEBUG; vcspull StreamHandler pinned to resolved level;
  ``setup_file_logger`` does NOT change StreamHandler levels (the
  floodgate regression guard).
why: Reporter wanted the *interesting* git stderr (``From <url>``,
``* [new branch]``, ``Already on 'main'``, ``Successfully rebased``)
visible while a repo is syncing, but NOT lingering in scrollback once
the repo completes. Today those lines stack as plain rows that the
``✓ Synced`` summary then has to compete with. tmuxp's
``before_script`` panel gives exactly the desired shape: a fixed-height
region above the spinner that rolls under as fresh lines arrive and
collapses on completion. Translate that pattern into vcspull's
``SyncStatusIndicator``.

what:
- Extend ``SyncStatusIndicator`` (``cli/_progress.py``) with an
  ``output_lines: int = 3`` constructor arg, a
  ``collections.deque(maxlen=output_lines)`` panel buffer, a
  ``_panel_visible_lines`` counter for redraw geometry, and a public
  ``add_output_line(text)`` API. ``0`` hides the panel and falls back
  to plain ``write()``; ``-1`` is unbounded.
- Extend ``_render_tty`` to draw panel rows above the spinner inside
  one synchronized-output bracket: walk the cursor up by
  ``_panel_visible_lines`` rows from the previous frame, erase + emit
  each panel line, then write the spinner cell -- the whole frame is
  one atomic write so concurrent ``add_output_line`` calls can't tear it.
- Add ``_clear_block`` for the spinner+panel teardown path used by
  ``stop_repo`` and ``close``: walks back up the rendered rows and
  erases each, leaving the cursor on a clean row for the next
  ``formatter.emit_text("✓ Synced ...")``. ``write()`` (the log
  diverter path) gains the same up-walk so a logger record clears the
  panel along with the spinner before printing.
- ``start_repo`` clears the panel buffer (fresh trail per repo);
  ``stop_repo`` clears + collapses; ``close`` falls back to
  ``_clear_block`` so a Ctrl-C mid-repo doesn't leave the panel
  sticky.
- ``add_output_line`` gates the panel path on ``self._tty``: in
  non-TTY mode the spinner thread never starts to render the deque,
  so the method falls back to plain ``write()``. Without this gate
  the bytes would silently disappear under capsys / pipes / CI.
- ``cli/sync.py``: new ``--panel-lines N`` flag with metavar/help;
  new ``_resolve_panel_lines`` helper (CLI > env > default), mirroring
  ``_resolve_repo_timeout``; thread the resolved value through
  ``sync()`` -> ``_sync_impl()`` -> ``build_indicator()``. Switch
  ``_indicator_progress`` from ``indicator.write(output)`` to
  ``indicator.add_output_line(output)`` so libvcs's chunked progress
  callback feeds the deque instead of the raw stream.
- ``cli/__init__.py``: forward ``args.panel_lines`` (defaulting to
  ``None``) into ``sync(...)``.
- 11 new tests across ``tests/cli/test_sync_progress.py`` (deque
  bounds, ``output_lines=0`` fallback, non-TTY fallback, atomic
  panel render with synchronized-output bracket, ``stop_repo``
  collapse, fresh-trail-per-repo) and ``tests/cli/test_sync_watchdog.py``
  (5 ``_resolve_panel_lines`` precedence tests).
- ``CHANGES`` entry under 1.59.x.
…nner

why: Reporter showed two adjacent lines

    ✓ Synced fish-shell → ~/study/rust/fish-shell
    ⠸ syncing flume ...  0.9s

and called out two inconsistencies: the in-flight repo name (``flume``)
is plain text while the permanent line's name (``fish-shell``) is
``colors.info`` cyan, and the verb is lowercase ``syncing`` while the
permanent line uses uppercase ``Synced`` / ``Timed out``. Both lines are
status badges and should read as a family.

what:
- ``_render_tty``: wrap ``name`` in ``self._colors.info(name)`` so the
  in-flight repo name matches the cyan ``colors.info(repo_name)`` used
  on the permanent ``✓ Synced <name>`` line in ``cli/sync.py``.
- Capitalise the verb: ``Syncing`` (not ``syncing``) in the spinner
  line, mirroring ``Synced``/``Timed out``.
- Non-TTY ``start_repo`` line gets the same capitalisation
  (``Syncing <name>``). The ``... still syncing <name>`` heartbeat
  stays lowercase since it's a sentence continuation after the
  ellipsis, not a status badge.
- Tests updated accordingly: assert ``Syncing codex`` (capital) on
  the start line; case-insensitive count for the heartbeat-throttle
  test (``Syncing`` start + ``... still syncing`` heartbeat).
  ``test_tty_spinner_colours_the_frame_cell`` is renamed to
  ``test_tty_spinner_colours_the_frame_cell_and_name`` and now also
  asserts a colour escape immediately precedes the repo name.
…r race

why: Reporter saw a flicker (and occasionally a doubled
``✓ Synced clap`` line) when a repo finished. Two interacting causes:

1. ``_render_tty`` snapshotted state under ``_lock`` then released the
   lock before the actual ``stream.write``. A concurrent ``stop_repo``
   could clear the screen between the snapshot and the write, letting
   the spinner thread paint a stale frame on top of the new state.
   Rich avoids this by holding ``self._lock`` across every refresh tick
   *and* re-checking ``done.is_set()`` inside the lock
   (``rich/live.py:_RefreshThread.run``); the same pattern applies here.
2. ``stop_repo`` cleared the spinner, then the caller printed the
   permanent line in a *separate* stream call. Even on a fast terminal
   the spinner row blinks to blank between the two writes, which
   reads as flicker. The atomic transition pattern (single
   ``\x1b[?2026h ... \x1b[?2026l`` block that erases the panel and
   replaces the spinner row with the permanent line) eliminates the
   flash.

what:
- ``_render_tty``: hold ``_lock`` from snapshot through the stream
  write. Re-check ``self._active_repo`` under the lock at the top --
  if it changed (``stop_repo`` ran), skip the tick entirely so a
  stale frame never lands on screen.
- ``stop_repo`` gains an optional ``final_line: str | None`` argument
  and now returns ``bool``. When the indicator is actively rendering
  on a TTY *and* a ``final_line`` is given, build one atomic ANSI
  block under the lock that walks up the panel rows, erases each
  going down, and replaces the spinner row with ``final_line + \n``.
  Returns ``True`` to tell the caller "I owned the print; skip your
  ``formatter.emit_text``" -- which is what kills the doubled
  ``✓ Synced clap`` artefact.
- Headless / disabled-indicator paths still return ``False`` so the
  caller's ``formatter.emit_text`` keeps running unchanged.
- ``cli/sync.py``: replace the ``with indicator.repo(...)`` context
  manager (which fired ``stop_repo()`` with no args before the
  outcome was known) with manual ``start_repo`` + post-outcome
  ``stop_repo(final_line=permanent if is_human else None)``. Skip
  the separate ``formatter.emit_text(permanent)`` when ``stop_repo``
  reported it owned the print. Applies uniformly to the ``synced``,
  ``failed``, and ``timed_out`` branches.
- ``cli/sync.py`` exception path: any ``BaseException`` (incl. the
  KeyboardInterrupt the SIGINT handler relies on) tears the indicator
  down with ``stop_repo()`` (no replacement line) and re-raises so
  the surrounding ``except KeyboardInterrupt`` in ``_sync_impl`` still
  owns the partial-summary print.
- 4 new tests in ``tests/cli/test_sync_progress.py``:
  ``stop_repo(final_line=...)`` writes inside one synchronized-output
  bracket and returns ``True``; the no-arg path returns ``False``;
  non-TTY stays caller-owned even with a ``final_line``;
  ``_render_tty`` skips writes when the active repo was cleared
  (the lock-protected stale-tick guard).
why: Reporter saw blank-line gaps between successive ``✓ Synced ...``
lines when chaining ``vcspull sync --workspace ~/study/otel/ --all;
vcspull sync --workspace ~/study/rust/ --all``. The panel feature added
in 79fd4c3 promised "collapses when each repo finishes -- leaving only
the permanent ``✓ Synced <name>`` line", but the implementation in
d650c97 only erased panel-row content with ``\x1b[2K\n``. ``\x1b[2K``
clears the row's visible characters; the trailing ``\n`` advances the
cursor down. After ``panel_visible`` iterations the cursor is back at
the spinner row, but the panel rows above are blank-but-present in
scrollback. Replacing the spinner with the permanent line then leaves
orphan blank rows above it; the count varies per repo (whatever the
previous repo's panel height was at completion time), exactly matching
the reporter's "variable gaps" symptom.

Fix: use DL (``\x1b[Pn M``, Delete Line) instead. After walking up to
the first panel row, ``\x1b[{panel_visible}M`` deletes those rows from
the buffer; subsequent rows scroll up to fill, so the spinner row's
content lands at the cursor row. The follow-up
``erase + final_line + \n`` morphs that row into the permanent line in
place. No orphan blanks. DL is a VT100 sequence supported by every
terminal that implements the synchronized-output bracket already in
use here.

what:
- ``stop_repo`` atomic transition (``_progress.py``): replace the
  ``\x1b[2K\n`` x panel_visible loop with a single
  ``\x1b[{panel_visible}M`` after the upward walk. Restructure the
  conditional so DL only fires when ``panel_visible > 0``.
- ``_clear_block`` (``_progress.py``): same defect in the ``close()``
  teardown path -- walked up + erased per row, leaving blanks. Replace
  with a single DL of ``panel + 1`` rows after walking up to the first
  panel row. The close path is preventive (normal operation hits
  ``stop_repo`` first and zeroes ``_panel_visible_lines``) but the
  parallel pattern was wrong for the same reason and would surface
  under a KeyboardInterrupt close-before-stop_repo.
- 4 new regression tests in ``tests/cli/test_sync_progress.py``:
  DL escape lands when ``_panel_visible_lines > 0`` (with negative
  assertion that the regressed ``\x1b[2K\n`` per-row pattern is NOT
  present); DL omitted when zero panel rows; ``stop_repo()`` without
  ``final_line`` still DL's the panel; ``_clear_block`` deletes
  panel + spinner via one DL.
why: libvcs 0.40.0 was tagged and published to PyPI
(2026-04-25T18:59Z, sha256 5f3340b6...). The earlier ``>=0.40.0a0``
pin was a placeholder while the prerelease was being prepared; with
the stable release available, drop the ``a0`` suffix so the lower
bound matches the actual minimum-supported version.

what:
- ``pyproject.toml``: ``libvcs>=0.40.0a0,<0.41.0`` ->
  ``libvcs>=0.40.0,<0.41.0``.
- ``uv.lock``: relock to libvcs 0.40.0 from PyPI (sdist + wheel
  hashes match the published release). The lock carries
  ``[options.exclude-newer-package] libvcs = "2026-04-26..."`` to
  override the global 3-day cooldown configured in
  ``~/.config/uv/uv.toml`` -- libvcs 0.40.0 was published hours
  before this lock and would otherwise have been filtered out.
why: The ``### Dependencies`` entry in the unreleased section was
written when the pin was ``>=0.40.0a0,<0.41.0``. With libvcs 0.40.0
released and the pin tightened to ``>=0.40.0,<0.41.0``, the
changelog text now describes a stale bound; align it with reality
so users reading the next release notes see the actual lower bound.

what:
- ``CHANGES``: ``Bump libvcs to >=0.40.0a0,<0.41.0...`` ->
  ``Bump libvcs to >=0.40.0,<0.41.0...``.
why: ``log.debug(NO_REPOS_FOR_TERM_MSG.format(name=search_term))``
runs ``str.format`` even when DEBUG is filtered out -- wasted work on
the hot ``vcspull sync`` path that resolves search terms for every
repo pattern. CLAUDE.md mandates lazy ``%``-style log args precisely
because eager formatting defeats the level filter.

what:
- ``cli/sync.py``: rewrite the debug call as
  ``log.debug('No repo found in config(s) for "%s"', search_term)``.
- The user-facing ``formatter.emit_text`` f-string a few lines below
  is *not* a log call; leave it alone.
why: ``_install_indicator_log_diverter`` rebound ``handler.stream``
with a plain attribute assignment while the spinner thread and the
worker thread were both emitting libvcs records through the same
handler. ``Handler.handle`` (stdlib ``logging/__init__.py:1011``)
takes ``self.lock`` around every ``emit()`` precisely so that
modifications to the handler's mutable state can serialise with
in-flight writes; an unlocked swap technically violates that contract.
On CPython the GIL hides the race in practice (single STORE_ATTR is
atomic, no torn pointer), but the contract is still wrong and would
matter on a non-GIL'd build.

what:
- Acquire ``handler.lock`` (via ``handler.acquire()/release()``) around
  the install swap and the restore swap. Same lock the emit path holds.
- Don't go through ``StreamHandler.setStream``: its first action is to
  ``flush()`` the existing stream, which raises ``ValueError`` when the
  current stream is a finalized pytest ``capsys``/``CaptureIO`` (a
  prior test ran while ``setup_logger`` captured ``sys.stdout``, the
  test ended, the CaptureIO closed, the handler still holds the dead
  reference, then this test calls ``setStream``). Skipping the flush
  is safe -- we aren't closing the stream, only redirecting future
  writes -- and matches the handler-internal locking the contract
  actually requires.
- Inline the previous-stream capture (no longer needs the
  ``setStream`` return-value plumbing).
why: ``_NO_PROMPT_ENV`` was a module-global cache that snapshotted
``os.environ`` on first call. Tests using ``monkeypatch.setenv`` after
the cache populated never saw the change in ``_get_no_prompt_env`` --
the subprocess inherited a stale env. The cache also has no
performance justification: ``_maybe_fetch`` already runs
``subprocess.run`` (orders of magnitude more expensive than a dict
copy).

what:
- ``cli/sync.py``: remove the ``_NO_PROMPT_ENV`` global and the
  ``global``-guard branch; build the env dict fresh on each
  ``_get_no_prompt_env`` call.
- Updated docstring to call out the rationale (test isolation > 1us
  saved per fetch).
why: The help text said "increase plan verbosity (-vv for maximum
detail)", which was accurate for the original dry-run plan flag but
became misleading when ``setup_logger`` started using ``args.verbosity``
to drive the libvcs ``StreamHandler`` level (WARNING -> INFO -> DEBUG).
A user reading ``--help`` to figure out why their terminal got chatty
on ``-v`` had no breadcrumb pointing at the verbosity ladder.

what:
- ``cli/sync.py``: rewrite the ``--verbose`` help string to spell out
  the actual ladder (default WARNING; -v INFO; -vv DEBUG) and to keep
  the dry-run plan-detail behaviour mentioned for callers who relied on
  the old wording.
why: The rerun-recipe banners (``-> Rerun...``, ``-> Or with verbose...``,
``-> Probe each...``), the per-repo timed-out path arrow, and the
"Full debug log" hint all used ASCII ``->``, while the rest of the
file (``✓ Synced clap → ~/study/...``, ``Timed out fish → ~/...``)
uses the Unicode ``→``. Two ornaments coexisted in the same emit
stream, which read as visual drift.

what:
- ``cli/sync.py``: replace ``colors.info('->')`` and
  ``colors.muted('->')`` with the ``→`` form across all 7 surfaces
  (4 in ``_emit_rerun_recipe``, 2 in the ``_sync_impl`` "Full debug
  log" hints, 1 in the ``timed_out`` permanent line of
  ``_run_sync_loop``).
- The non-arrow ``colors.warning('-')`` bullet stays as-is -- it's a
  list bullet, not a path-or-flow ornament.
why: ``add_output_line`` took ``self._lock`` and called
``text.splitlines()`` for every chunk libvcs delivered through the
progress callback, including whitespace-only fragments (a stray
``\n`` between progress frames is common). The inner ``if stripped``
loop body would skip every fragment and the deque wouldn't change,
so the lock acquire was pure overhead on a hot path that fires
hundreds of times per repo.

what:
- ``cli/_progress.py``: after the panel-disabled fallback (which
  *should* still surface whitespace via ``self.write``), early-return
  when ``text.strip()`` is empty. The fallback path comes first so
  panel=0 / non-TTY / JSON callers still see the bytes.
why: CLAUDE.md requires working doctests on every function and method,
both as documentation and as runnable examples. The sync UX overhaul
introduced public-ish helpers without doctest coverage, leaving the
contract for ``--panel-lines`` / ``--timeout`` resolution and the
``SyncStatusIndicator`` lifecycle described only in prose.

what:
- ``cli/sync.py``: add doctests on ``_resolve_repo_timeout`` (default
  fallback when CLI value is ``None``/``0``/negative; CLI override
  passes through) and ``_resolve_panel_lines`` (default 3; ``0`` hides;
  ``-1`` unbounded; explicit values pass through). Each doctest pops
  the relevant ``VCSPULL_*`` env var first so a stray export in the
  caller's shell doesn't flake the example.
- ``log.py``: add a doctest on ``default_debug_log_path`` asserting the
  ``vcspull-debug-`` prefix, ``.log`` suffix, and that the parent is
  the system tempdir.
- ``cli/_progress.py``: doctests on ``SyncStatusIndicator.start_repo``,
  ``stop_repo``, and ``add_output_line``. Each builds a non-TTY (or
  TTY-with-StringIO) indicator so the doctest prints nothing visible
  yet still exercises the state machine. ``add_output_line`` got an
  ``r"""`` prefix because the example payloads contain ``\n``
  literals (D301).
- Test count goes from 1018 to 1024 (+6 doctests).
why: ``--panel-lines`` uses ``-1`` as the unbounded sentinel and ``0``
as the hide sentinel; any other negative was meaningless but
``_resolve_panel_lines`` silently coerced -2 / -7 / -42 to "unbounded"
via the permissive ``elif output_lines < 0:`` branch in
``SyncStatusIndicator.__init__``. A user typing ``--panel-lines -3``
expecting "3 lines" would get an unbounded panel with no warning.
typer/click validate at parse time precisely so user typos surface
where the user can fix them.

what:
- ``cli/sync.py``: add ``_panel_lines_arg(value: str) -> int``, an
  argparse ``type=`` callable that accepts -1, 0, or any positive
  integer; raises ``argparse.ArgumentTypeError`` otherwise. Includes
  a doctest covering happy-path and both rejection cases (non-int
  string and out-of-range).
- ``--panel-lines`` argparse entry uses ``type=_panel_lines_arg`` in
  place of plain ``type=int``.
- The ``VCSPULL_PROGRESS_LINES`` env-var path keeps its existing
  log-warning + fallback behaviour: env vars are trusted less
  strictly than CLI args (typos there are usually copy/paste, not
  intent), and a hard error on a stray export would lock users out
  of running ``vcspull sync`` until they unset it.
why: Zero or negative ``--timeout`` is meaningless: the watchdog would
either return immediately (deadline already passed) or never (logically
invalid). The earlier permissive ``_resolve_repo_timeout`` silently fell
back to the default on these inputs, masking typos like ``--timeout
-10`` (intended ``10``) where the user assumed the flag took effect.
typer/click validate at parse time precisely so user typos surface
where the user can fix them.

what:
- ``cli/sync.py``: add ``_positive_int_arg(value: str) -> int``, an
  argparse ``type=`` callable that requires a positive integer; raises
  ``argparse.ArgumentTypeError`` on zero, negative, or non-numeric.
  Includes a doctest covering the happy path and three rejection
  branches.
- ``--timeout`` argparse entry uses ``type=_positive_int_arg`` in
  place of plain ``type=int``.
- ``_resolve_repo_timeout`` keeps its programmatic-caller fallback
  for ``cli_timeout <= 0``; doctest now notes the CLI surface is
  guarded by ``_positive_int_arg`` at parse time.
- ``VCSPULL_SYNC_TIMEOUT_SECONDS`` env path keeps log-warning +
  fallback (env vars are trusted less strictly than CLI args; a hard
  error on a stale export would lock users out).
why: ``setup_file_logger`` raises both the ``vcspull`` and ``libvcs``
*logger* levels to DEBUG so its FileHandler captures the full trace
for post-mortem. That bump would ordinarily open the same DEBUG
records to every other handler attached to those loggers (notably
the terminal ``StreamHandler``), but ``setup_logger`` pins
per-handler levels on every StreamHandler, which absorbs the bump
without leaking. The whole "libvcs DEBUG floodgate" regression that
``89649a86`` fixed was this invariant breaking. The code that
relies on it had no inline note, so a future "simplification" that
drops the per-handler pinning would silently re-open the floodgate
and the only signal would be the regression test going red.

what:
- ``log.py``: add an inline comment at the ``logger.setLevel(level)``
  bump in ``setup_file_logger`` explaining the precondition (the
  per-handler levels in ``setup_logger``) and naming the regression
  test (``test_setup_file_logger_does_not_open_stream_floodgate``)
  that catches it.
why: The marker subclass exists so the outer ``sync()`` ``except
KeyboardInterrupt`` catch can distinguish "inner layer already printed
a coloured partial summary, don't repeat it" from "Ctrl-C hit before
the formatter was even built; print the plain-stderr fallback". The
contract works because the only ``except KeyboardInterrupt`` between
the inner ``_sync_impl`` raise and process exit is :func:`sync`'s,
which subclass-checks. A future broader catch wrapping ``cli()`` would
need to either re-raise the subclass unchanged or perform the same
isinstance check -- otherwise the marker is swallowed, the inner
partial-summary prints, AND the outer fallback prints, restoring the
double-print artefact the marker was introduced to suppress. The raise
site and the catch site each had no inline note.

what:
- ``cli/sync.py``: extend ``_SyncInterruptedAfterSummary``'s docstring
  with an explicit "Invariant" section spelling out the "only catcher
  is sync()" precondition.
- Add a comment near the inner-layer ``raise _SyncInterruptedAfterSummary
  from None`` in ``_sync_impl`` pointing back at the docstring so the
  raise site is also self-documenting.
tony added 3 commits April 25, 2026 16:37
why: Reporter saw libvcs's ``|git| (rye) Failed to determine current
branch`` warning leak in alongside vcspull's own ``✗ Failed syncing
rye: Command failed with code 128: git symbolic-ref HEAD --short``
line -- two messages saying the same thing, breaking the
``✓ Synced X / ✗ Failed X / - Timed out X`` rhythm of the rest of the
output. libvcs's WARNING/ERROR records emit through the StreamHandler
at default verbosity (WARNING), and the indicator's stream-diverter
only redirected the bytes -- it didn't filter them out -- so the
libvcs prologue still printed before the vcspull summary.

what:
- ``cli/sync.py``: ``_install_indicator_log_diverter`` now also
  raises the libvcs StreamHandler level to ``logging.CRITICAL + 1``
  for the duration of the sync, but only when the level is at
  WARNING (default verbosity). 51 sits one above ``CRITICAL`` so
  no standard log record reaches the handler. The
  ``FileHandler`` is left untouched and still captures everything
  at DEBUG, so a post-mortem in the debug log still shows the
  libvcs entry. Users at ``-v`` (libvcs INFO) or ``-vv`` (libvcs
  DEBUG) opted into seeing libvcs activity; the diverter leaves
  their level alone.
- The restore closure now also walks the saved levels and resets
  them, mirroring the stream restore.
- ``tests/test_log.py``: two regression tests next to
  ``test_setup_file_logger_does_not_open_stream_floodgate`` (same
  shape: handler-level invariant under condition X). Asserts the
  bump from WARNING to ``CRITICAL + 1`` and the restore at default
  verbosity, plus the no-op when the user is at ``-v``.
- vcspull's StreamHandler is intentionally untouched: its WARNING
  records (e.g. ``Could not open debug log at ...``) all fire
  *before* the indicator is installed, but if any future call site
  emits a vcspull warning during the loop the user should still
  see it.
why: ``default_debug_log_path`` was dropping debug logs as bare files
in the system tempdir (``/tmp/vcspull-debug-<ts>-<pid>.log``). The
``vcspull-`` filename prefix gets lost in a busy ``/tmp/`` next to
``vscode-IPC-...``, ``systemd-private-...``, ``kotlin-idea-...``, etc.,
and there is no single dir to ``rm -rf`` to clean up old vcspull logs.
Reporter wanted the path grouped under a self-describing directory:
``/tmp/vcspull/`` for production runs, ``/tmp/vcspull-test/`` for
tests, so a glance at ``/tmp/`` says what's whose.

what:
- ``log.py``: ``default_debug_log_path`` now returns
  ``<tempdir>/vcspull/debug-<ts>-<pid>.log`` (the ``vcspull-`` prefix
  on the file is redundant once the directory carries the label).
- Auto-detect pytest via ``PYTEST_CURRENT_TEST`` and route to
  ``<tempdir>/vcspull-test/`` instead. This is a safety net for tests
  that incidentally call ``default_debug_log_path`` without first
  redirecting ``TMPDIR`` -- they'll never pollute the production
  ``vcspull/`` dir even when the test author forgets to monkeypatch.
  ``PYTEST_CURRENT_TEST`` is set by pytest for the entire test
  session (collection + execution + doctest), so the doctest assertion
  also picks the test dir.
- ``cli/sync.py``: ``--log-file`` help text mentions both production
  and test default paths.
- ``tests/cli/test_sync_log_file.py``: update the two tests asserting
  on the file name prefix to match the new ``debug-<ts>-<pid>.log``
  pattern. ``test_default_debug_log_path_lives_in_tempdir`` also
  asserts the parent dir name is ``vcspull-test`` (since that's what
  pytest sees), pinning the auto-detect.
- The doctest on ``default_debug_log_path`` accepts either
  ``vcspull`` or ``vcspull-test`` so the example reads correctly
  whether it runs under pytest or in a Python repl.
- Behaviour is wholly additive: no callers change. Existing debug
  logs from previous vcspull runs at ``/tmp/vcspull-debug-...log``
  remain on disk; users can clean them by hand once.
why: Two user-visible behaviours landed without CHANGES updates: the
libvcs WARNING/ERROR silencing during human-mode sync (so the
``|git| (rye) Failed to determine ...`` line no longer competes with
``✗ Failed syncing rye: ...``), and the debug-log directory regrouping
under ``$TMPDIR/vcspull/`` (and ``$TMPDIR/vcspull-test/`` under
pytest). Releasing without these would leave the changelog describing
behaviour that no longer matches reality.

what:
- ``CHANGES``: extend the existing "libvcs is quiet by default"
  unreleased section with a paragraph spelling out the sync-time
  StreamHandler silencing -- name the rye repro because that is the
  symptom users will recognise -- and stress that the debug log file
  still captures the entry for post-mortem.
- ``CHANGES``: rewrite the debug-log section to advertise the new
  ``$TMPDIR/vcspull/debug-<ts>-<pid>.log`` path (production) and the
  ``$TMPDIR/vcspull-test/...`` test-mode safety net, replacing the
  stale ``$TMPDIR/vcspull-debug-...`` description.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sync hangs indefinitely on credential prompts and slow fetches

1 participant