
WIP: UI smoke tests for axis, touchy, gmoccapy, qtdragon#3999

Draft
grandixximo wants to merge 9 commits into LinuxCNC:master from grandixximo:ui-tests

@grandixximo
Contributor

Draft, opening for CI feedback. Refs #3756.

Summary

Phase 1 of the GUI test work tracked in #3756. Each test launches a GUI under xvfb-run against an existing configs/sim/<gui>/*.ini, drives Estop reset / machine on / home all via NML, asserts the interpreter reaches IDLE, then shuts down cleanly. Verifies the GUI starts and accepts basic commands without crashing.

Coverage

  • axis
  • touchy
  • gmoccapy
  • qtdragon (qtdragon_xyz/qtdragon_metric.ini)

Mechanics

  • tests/ui-smoke/_lib/launch.sh: xvfb-run wrapper. Uses setsid so the linuxcnc process group can be signalled cleanly; on shutdown it tries axis-remote --quit, then SIGTERM with a grace period, then SIGKILL. Skips with exit 77 if xvfb-run is unavailable (matches tests/tooledit and tests/pyvcp).
  • tests/ui-smoke/_lib/drive.py: NML driver. Tolerant of sim configs that come up already in STATE_ON via auto-estop-release HAL wiring. Falls back to per-joint serial homing if no HOME_SEQUENCE is configured.
  • tests/ui-smoke/_lib/checkresult.sh: passes when UI_SMOKE_OK is printed and the captured logs contain no crash markers.
  • Reuses existing sim configs, no test-only INI files.
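
The checkresult pass/fail rule can be sketched in Python as follows. This is not the PR's shell script: the crash-marker strings are assumptions (the PR does not list them), and the file names in the usage note are taken from the .gitignore list below.

```python
import pathlib

# Assumed crash markers; the PR does not list the strings checkresult.sh looks for.
CRASH_MARKERS = (
    "Traceback (most recent call last)",
    "Segmentation fault",
    "core dumped",
)


def smoke_passed(driver_output, log_texts):
    """True if the driver printed UI_SMOKE_OK and no log contains a crash marker."""
    if "UI_SMOKE_OK" not in driver_output:
        return False
    return not any(m in text for text in log_texts for m in CRASH_MARKERS)


def _read(path):
    """Return a file's text, or "" if it does not exist (e.g. nothing was captured)."""
    p = pathlib.Path(path)
    return p.read_text(errors="replace") if p.exists() else ""


def check_files(driver_out, log_paths):
    """Apply smoke_passed to files on disk."""
    return smoke_passed(_read(driver_out), [_read(p) for p in log_paths])
```

Usage would look like `check_files("ui-smoke.out", ["ui-smoke.err", "linuxcnc.err"])`. Treating a missing log as empty keeps the check from failing just because a process wrote nothing to stderr.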

Cleanup discipline

  • .gitignore covers all runtime artifacts (linuxcnc.{out,err,pid}, ui-smoke.{out,err}, result, stderr)
  • 4 consecutive runs locally: 4/4 pass, 0 shmem errors, working tree clean (no untracked files added beyond the committed test scripts). Aligns with the clean-tree gate Bertho asked for and that @hdiethelm is wiring up in CI improvements: General improvements #3984.
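
A clean-tree gate like the one above can be approximated by checking `git status --porcelain` after the tests have run. A minimal sketch, assuming the check runs inside the repository; the helper names are invented for illustration. By default git does not report untracked files that match .gitignore, which is why the .gitignore entries listed above matter.

```python
import subprocess


def untracked_paths(porcelain):
    """Untracked paths from `git status --porcelain` (v1) output.

    Paths with unusual characters are C-quoted by git; this sketch does not unquote them.
    """
    return [line[3:] for line in porcelain.splitlines() if line.startswith("?? ")]


def tree_is_clean(repo="."):
    """True if git reports nothing modified, staged, or untracked (ignored files excluded)."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return out == ""
```

Printing `untracked_paths(...)` on failure tells the developer which artifact is missing from .gitignore.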

Deps

xvfb is already declared in debian/control with the <!nocheck> profile so apt-get build-dep installs it on the existing CI without a workflow change. Coordinated with @hdiethelm in #3984: this PR adds no system deps; if his lands first, no rebase needed here.

Out of scope (deferred)

  • Phase 2: load a small G-code file via linuxcnc.command.program_open + auto(RUN), verify final position via linuxcnc.stat.position. Per-GUI cross-checks via xdotool or AT-SPI where useful.
  • Phase 3: screenshot or short video on failure, uploaded as CI artifact.

Test plan

Phase 1 of LinuxCNC#3756: launch each GUI under xvfb-run against an existing
sim config, drive Estop reset / machine on / home all via NML, assert
the interpreter reaches IDLE, then shut down cleanly. Verifies the GUI
starts and accepts basic commands without crashing.

Skips gracefully (exit 77) when xvfb-run is not installed, matching
the precedent set by tests/tooledit and tests/pyvcp.

Shared helpers under _lib/:
  drive.py        common NML driver, prints UI_SMOKE_OK on success
  launch.sh       xvfb-run wrapper with setsid + signal escalation for
                  clean linuxcnc shutdown (preserves shared memory
                  cleanup via scripts/linuxcnc trap)
  checkresult.sh  shared pass/fail check delegated to by per-test
                  checkresult shims

Each per-GUI directory exposes test.sh + checkresult and reuses the
existing configs/sim/<gui>/*.ini so no test-only sim configs are
introduced.

Functional tests (load G-code, verify final position) and screenshot/
video on failure are deferred to follow-up phases.

xvfb is already declared in debian/control (<!nocheck>) so apt-get
build-dep installs it on CI; no new system deps required for this
phase.

Refs LinuxCNC#3756

CI failed with "Permission denied" when exec'ing _lib/launch.sh: the
local repo has core.filemode=false, so chmod +x was not recorded in the
git index. Use git update-index --chmod=+x to mark all test scripts
as executable.
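
One way to catch a missing executable bit before CI does is to read the modes git actually recorded. `git ls-files --stage` prints one `<mode> <object> <stage>` + TAB + `<path>` line per entry, and executable files are stored as mode 100755. A minimal sketch; the helper names are invented for illustration.

```python
import subprocess


def non_executable(stage_output):
    """Paths whose recorded index mode is not 100755.

    Each line of `git ls-files --stage` is "<mode> <object> <stage><TAB><path>".
    """
    bad = []
    for line in stage_output.splitlines():
        meta, _, path = line.partition("\t")
        if meta.split()[0] != "100755":
            bad.append(path)
    return bad


def check_index(paths, repo="."):
    """Ask git for the given script paths and return those not marked executable."""
    out = subprocess.run(
        ["git", "ls-files", "--stage", "--", *paths],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return non_executable(out)
```

Only script paths should be passed (for example `tests/ui-smoke/_lib/launch.sh` and `tests/ui-smoke/_lib/checkresult.sh`); files such as the README are legitimately 100644.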

Two CI-driven fixes:

1. Per-GUI Python module preflight in launch.sh. test.sh now passes a
   comma-separated list of modules the GUI needs at import time; if
   any fail to import the test exits 77 (skipped) rather than wedging
   linuxcnc waiting for a GUI that will never come up.

   - axis: OpenGL.GL
   - touchy, gmoccapy: gi
   - qtdragon: PyQt5.QtCore, qtvcp

   Master CI does not currently install these runtime deps (Bertho's
   LinuxCNC#3391 work added them only to the 2.9 branch), so without preflight
   every smoke test failed with a wedged linuxcnc startup or an
   uninformative timeout. This way the tests skip cleanly until the
   deps land in master CI.

2. Wait up to 30s for the linuxcnc SIGTERM trap (scripts/linuxcnc
   Cleanup) to finish before sending SIGKILL. The earlier, tighter window
   cut Cleanup off mid-run and left shared memory attached, which
   caused subsequent tests in the same job to fail with SHMERR.
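
The setsid plus SIGTERM, grace, SIGKILL pattern can be sketched in Python. launch.sh does this in bash with setsid and kill on a negative PID; the code below is an equivalent sketch, not the PR's implementation. `start_new_session=True` makes the child call setsid(), so its PID is also its process-group ID and one signal reaches every process in the group.

```python
import os
import signal
import subprocess


def launch(cmd):
    """Start cmd as the leader of a new session and process group."""
    return subprocess.Popen(cmd, start_new_session=True)


def stop(proc, grace=30.0):
    """SIGTERM the whole group, wait up to `grace` seconds, then SIGKILL.

    Returns "exited" (already gone), "TERM" (exited within the grace period)
    or "KILL" (had to be killed).
    """
    if proc.poll() is not None:
        return "exited"
    pgid = proc.pid  # session leader, so pgid == pid
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        proc.wait()
        return "exited"
    try:
        # This is the window in which a trap handler (e.g. Cleanup) can finish.
        proc.wait(timeout=grace)
        return "TERM"
    except subprocess.TimeoutExpired:
        pass
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass
    proc.wait()
    return "KILL"
```

With `grace=30` this mirrors the window described in item 2; a later commit raises it to 60s. Waiting on the group leader is what gives a SIGTERM trap time to run before escalation.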

Refs LinuxCNC#3756

The previous launch.sh had `echo "WARN: ..."` inside a `bash -c "..."`
string; the inner double quotes closed the outer string, so the
shutdown block was truncated. Symptom on CI: "linuxcnc: -c: line 34:
syntax error: unexpected end of file" before any logs were captured.

Switch to single quotes for the warning message. Also add cairo to
gmoccapy's import preflight: gladevcp.makepins (loaded by gmoccapy)
imports cairo via the led module, which trips on minimal CI without
python3-cairo.
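
The quoting failure can be reproduced, and caught before launch, by splitting the command line the way a POSIX shell would and then asking bash to parse (not run) the resulting inner script with `bash -n`. Note that `bash -n launch.sh` alone would not catch it, because the outer file is still valid syntax. A sketch with invented helper names; `shlex.split` models only quoting and word splitting, not variable expansion.

```python
import shlex
import subprocess


def inner_script(command_line):
    """Return the script string the outer shell would pass to `bash -c`."""
    argv = shlex.split(command_line)  # POSIX quote removal and word splitting
    return argv[argv.index("-c") + 1]


def bash_syntax_ok(script):
    """Parse `script` with `bash -n` (no execution); True if it parses cleanly."""
    result = subprocess.run(
        ["bash", "-n", "-c", script], capture_output=True, text=True
    )
    return result.returncode == 0
```

For `bash -c "if true; then echo "WARN: x"; fi"` the inner script becomes `if true; then echo WARN:` and bash rejects it with an end-of-file syntax error, the same class of failure seen on CI; with single quotes around the message the whole block survives.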

scripts/runtests does not honor exit 77 from a test.sh; its skip
mechanism is a per-directory `skip` executable that returns non-zero
when the test should be skipped. Add a shared _lib/skip-if-missing.sh
and per-GUI skip scripts that check for xvfb-run plus the python
modules each GUI needs. The launch.sh preflight stays as a fallback.

Modules required:
  axis      OpenGL.GL
  touchy    gi, cairo
  gmoccapy  gi, cairo
  qtdragon  PyQt5.QtCore, qtvcp
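
The per-directory skip predicate can be sketched in Python using the module table above. The names are invented for illustration, and imports run in-process for brevity; running each import in a separate `python3 -c` process would isolate import side effects. Per the commit message, a non-zero exit from `skip` means "skip this test".

```python
import importlib
import shutil

# Per-GUI import-time requirements, copied from the table above.
REQUIRED_MODULES = {
    "axis": ["OpenGL.GL"],
    "touchy": ["gi", "cairo"],
    "gmoccapy": ["gi", "cairo"],
    "qtdragon": ["PyQt5.QtCore", "qtvcp"],
}


def missing_modules(names):
    """Return the subset of module names that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:  # ImportError, or a module that raises while initializing
            missing.append(name)
    return missing


def skip_status(gui):
    """Exit status for a `skip` script: 0 runs the test, 1 skips it."""
    if shutil.which("xvfb-run") is None:
        return 1
    return 1 if missing_modules(REQUIRED_MODULES[gui]) else 0
```

A per-GUI `skip` executable would then end with something like `sys.exit(skip_status("axis"))`.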

Forward port of the GUI dependency work from 2.9 (LinuxCNC#3391). The runtime
deps were already in linuxcnc-uspace's Depends, but apt-get build-dep
on CI does not install runtime deps, which left the new ui-smoke tests
unable to launch any GUI and forced them to skip.

Adds python3-opengl, python3-pyqt5, python3-pyqt5.qsci, python3-cairo,
python3-gi, python3-gi-cairo, gir1.2-gtk-3.0 under the !nocheck profile,
matching the existing pattern for xvfb and x11-xserver-utils.

Edited debian/control.top.in (debian/control is gitignored and
regenerated by debian/configure).

Refs LinuxCNC#3391, LinuxCNC#3756

A CI run after the first batch of deps revealed that gmoccapy needs the
GtkSource-4 typelib, and qtdragon needs additional PyQt5 modules
(qtsvg/qtopengl/qtwebengine), python3-qtpy, and the dbus mainloop
binding. Add these to Build-Depends with the !nocheck profile so they
are installed by apt-get build-dep.

Also extend skip-if-missing.sh to verify gi typelibs (entries of the
form gi:Namespace:version), not just python imports. This catches
the GtkSource case where gi imports fine but the typelib is absent,
which gladevcp tripped on at gi.require_version time.

touchy and gmoccapy skip predicates now require Gtk-3.0 (and
GtkSource-4 for gmoccapy).

Refs LinuxCNC#3756
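
Supporting `gi:Namespace:version` entries mostly means routing them to `gi.require_version`, which raises ValueError when the requested typelib version is not installed, even though `import gi` itself succeeds. A sketch with invented helper names; a malformed spec such as `gi:GtkSource` (no version) raises ValueError from the parser.

```python
import importlib


def parse_requirement(spec):
    """Classify a spec: 'gi:Namespace:version' is a typelib, anything else a module."""
    if spec.startswith("gi:"):
        _, namespace, version = spec.split(":", 2)
        return ("typelib", namespace, version)
    return ("module", spec, None)


def requirement_met(spec):
    """True if the module imports, or gi can select the requested typelib version."""
    kind, name, version = parse_requirement(spec)
    try:
        if kind == "module":
            importlib.import_module(name)
        else:
            gi = importlib.import_module("gi")
            gi.require_version(name, version)  # ValueError if the typelib is absent
    except (ImportError, ValueError):
        return False
    return True


def unmet(specs):
    """Return the specs that are not satisfied, in order."""
    return [s for s in specs if not requirement_met(s)]
```

A gmoccapy predicate would then check something like `unmet(["gi", "cairo", "gi:Gtk:3.0", "gi:GtkSource:4"])` and skip when the list is non-empty.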

The previous driver did too much for a smoke layer (Estop reset,
machine on, home all, wait for IDLE) and tripped over startup-sequence
assumptions specific to each GUI. Reduce it to: connect to NML,
wait for task ready, sleep 3s for GUI construction, recheck that task
is alive, print UI_SMOKE_OK. This is the literal answer to Bertho's
"does it start" question. Functional behaviour belongs in
tests/ui-functional/ (Phase 2).

Also harden shutdown: extend the SIGTERM grace from 30s to 60s, and
add a halrun -U + explicit ipcrm fallback if Cleanup still has not
finished. Removes /tmp/linuxcnc.lock too. Without this the next
ui-smoke test inherited stale shared memory and wedged at startup.

Bump LINUXCNC_TIMEOUT to 180s (8s startup + 30s driver + 60s grace +
slack) and reduce DRIVER_TIMEOUT to 30s now that the driver work is
small.

Refs LinuxCNC#3756
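
The timeout arithmetic above (8 + 30 + 60 = 98s of phases inside a 180s outer limit, leaving 82s of slack) can be encoded as a guard that fails fast if one allowance is raised without the others. A hypothetical sketch; the PR keeps these numbers in shell variables, and the dictionary below is only an illustration.

```python
# Phase allowances from the commit message above, in seconds.
PHASES = {"startup sleep": 8, "driver": 30, "SIGTERM grace": 60}
LINUXCNC_TIMEOUT = 180


def budget_slack(outer, phases):
    """Seconds left once every phase uses its full allowance; raise if none remain."""
    slack = outer - sum(phases.values())
    if slack <= 0:
        raise ValueError(f"timeout of {outer}s does not cover phases {phases}")
    return slack
```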

The CI run after the previous fix made progress (0 shmem errors, axis and
gmoccapy passing), but qtdragon hit "bind error: 98 -- Address already
in use" on NML port 5005, meaning gmoccapy's linuxcncsvr was still
alive when qtdragon tried to start. touchy then failed in turn.

Add a pre-launch cleanup to launch.sh that pkills the known long-lived
processes (linuxcncsvr, milltask, halui, hal_bridge, axis, gmoccapy,
touchy, qtvcp, rtapi_app), removes /tmp/linuxcnc.lock, runs halrun -U,
and ipcrms any leftover linuxcnc shared memory keys before each test.

Refs LinuxCNC#3756
Comment on lines +73 to +76
# Give task time to come up before driver attaches. The GUI also
# needs time to register and home up to the point where it accepts
# commands; 8s is conservative for headless sim runs.
sleep 8
Contributor

Isn't there a way to detect what you are waiting for? Timed starts may fail when CI is busy and real-clock timeouts do not match the machine's activity.

This may also be a problem in other places where a real-clock timeout is used.
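
The reviewer's suggestion amounts to polling for the condition with a generous deadline instead of sleeping a fixed interval, so a busy CI machine costs time rather than causing a spurious failure. A generic sketch; in this suite the probe might construct `linuxcnc.stat()` and call `poll()`, which is expected to raise `linuxcnc.error` while task is not reachable, but that wiring is an assumption.

```python
import time


def wait_until(probe, timeout, interval=0.25, max_interval=2.0):
    """Call probe() until it returns a truthy value or `timeout` seconds pass.

    Exceptions from probe() count as "not ready yet", so a probe can simply
    try to connect. Returns the truthy value or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = probe()
            if result:
                return result
        except Exception:
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"not ready after {timeout}s")
        time.sleep(min(interval, remaining))
        interval = min(interval * 2, max_interval)  # back off between attempts
```

The deadline remains a wall-clock value, but it becomes a ceiling rather than the expected wait: on a fast machine the test continues as soon as the probe succeeds.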

kill -KILL -\$LINUXCNC_PGID 2>/dev/null || true
sleep 2
halrun -U 2>/dev/null || true
for key in 0x48414c32 0x48484c34 0x00000064; do
Contributor

Duplicate key list. Make an array once and index that. Then you can also add stuff when needed. (see also key list above)

Comment on lines +106 to +107
shmid=\$(ipcs -m | awk -v k=\$key 'tolower(\$1)==k {print \$2}')
[ -n \"\$shmid\" ] && ipcrm -m \$shmid 2>/dev/null || true
Contributor

Duplication? --> function?
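
Both review points (one key list, one helper) can be sketched in Python; in bash the same shape would be an array such as `KEYS=(0x48414c32 0x48484c34 0x00000064)` iterated with `"${KEYS[@]}"` inside a single function. The keys below are copied from the excerpt; the helper names are invented.

```python
import subprocess

# Single list of shared-memory keys, copied from the launch.sh excerpt.
LINUXCNC_SHM_KEYS = ("0x48414c32", "0x48484c34", "0x00000064")


def shmids_for_keys(ipcs_output, keys=LINUXCNC_SHM_KEYS):
    """Return shmids from `ipcs -m` output whose key is in `keys` (case-insensitive)."""
    wanted = {k.lower() for k in keys}
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].lower() in wanted:
            ids.append(fields[1])
    return ids


def remove_leftover_shm(keys=LINUXCNC_SHM_KEYS):
    """Remove matching segments; failures are skipped rather than raised."""
    try:
        out = subprocess.run(["ipcs", "-m"], capture_output=True, text=True).stdout
        removed = []
        for shmid in shmids_for_keys(out, keys):
            if subprocess.run(["ipcrm", "-m", shmid], capture_output=True).returncode == 0:
                removed.append(shmid)
        return removed
    except OSError:  # ipcs/ipcrm not installed
        return []
```

Both the pre-launch cleanup and the post-SIGKILL fallback could then call the same helper, so adding a key means editing one line.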

Comment thread tests/ui-smoke/README
Comment on lines +20 to +21
If xvfb-run is not available on the host, tests skip gracefully (matches
the precedent set by tests/tooledit and tests/pyvcp).
Contributor

I think we need to discuss this. We'd want to run this on CI, and then we need xvfb-run. However, if it skips gracefully, CI does not fail and we don't know whether the code is sane. A "damned if you do, damned if you don't" situation?

Contributor

I would just fail the test. In CI, we can make sure that xvfb-run is there and locally on a PC, it won't hurt anybody.
If there is ever an issue in CI that cannot easily be corrected, just add "continue-on-error: true" until it is fixed.
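
One way to reconcile the two positions is to let the environment decide: a missing dependency fails the test on CI and skips it locally. A sketch only; GitHub Actions does set `CI=true` in the job environment, but the policy and the function name are hypothetical and not part of this PR.

```python
import os


def on_missing_dependency(missing, environ=None):
    """Return "run", "fail" or "skip" given a list of missing dependencies."""
    env = os.environ if environ is None else environ
    if not missing:
        return "run"
    # GitHub Actions sets CI=true; a missing dep there means the CI image is broken.
    if env.get("CI") == "true":
        return "fail"
    return "skip"
```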

@hdiethelm
Contributor

Phase 3: screenshot or short video on failure, uploaded as CI artifact.

If you manage to create consistent screenshots and want to go to pedantic mode:

  • Store reference known good screenshots (TBD where, I often use submodules for test data storage so the main repo is not overfilled and it is still tracked)
  • Take screenshots at certain points where everything is static, like before / after homing / at the end
  • Compare to the reference and highlight any differences, fail if there are differences -> Artifact
  • The dev can download the artifacts, check them manually and if the change was on purpose replace the known good ones, so the CI passes again

Probably overcomplicated, and I don't know how deterministic LinuxCNC is, but this way bugs like #3979 could easily be avoided. When testing manually, these kinds of bugs are often overlooked.
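
The compare step of that workflow can be sketched as the bounding box of differing pixels, which is also a natural region to highlight in an artifact. Images here are plain lists of rows; obtaining them (for example from an Xvfb screenshot loaded with Pillow) is left out, and Pillow users could call `ImageChops.difference(a, b).getbbox()` instead. Exact matching is assumed; fonts or anti-aliasing that vary between runs would need a tolerance.

```python
def diff_bbox(ref, new):
    """Bounding box (left, top, right, bottom) of differing pixels, or None.

    right/bottom are exclusive. ref and new are equal-sized images given as
    lists of rows of pixel values (e.g. RGB tuples).
    """
    if len(ref) != len(new) or any(len(a) != len(b) for a, b in zip(ref, new)):
        raise ValueError("image sizes differ")
    left = top = right = bottom = None
    for y, (row_a, row_b) in enumerate(zip(ref, new)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            if pa != pb:
                left = x if left is None else min(left, x)
                right = x + 1 if right is None else max(right, x + 1)
                top = y if top is None else top
                bottom = y + 1
    return None if left is None else (left, top, right, bottom)
```

A `None` result would mean the screenshot matches the stored reference; anything else fails the job and identifies the region to crop into the uploaded artifact.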

3 participants