WIP: UI smoke tests for axis, touchy, gmoccapy, qtdragon #3999
grandixximo wants to merge 9 commits into LinuxCNC:master
Phase 1 of LinuxCNC#3756: launch each GUI under xvfb-run against an existing sim config, drive Estop reset / machine on / home all via NML, assert the interpreter reaches IDLE, then shut down cleanly. Verifies the GUI starts and accepts basic commands without crashing.

Skips gracefully (exit 77) when xvfb-run is not installed, matching the precedent set by tests/tooledit and tests/pyvcp.

Shared helpers under _lib/:
- drive.py: common NML driver, prints UI_SMOKE_OK on success
- launch.sh: xvfb-run wrapper with setsid + signal escalation for clean linuxcnc shutdown (preserves shared memory cleanup via scripts/linuxcnc trap)
- checkresult.sh: shared pass/fail check delegated to by per-test checkresult shims

Each per-GUI directory exposes test.sh + checkresult and reuses the existing configs/sim/<gui>/*.ini so no test-only sim configs are introduced.

Functional tests (load G-code, verify final position) and screenshot/video on failure are deferred to follow-up phases.

xvfb is already declared in debian/control (<!nocheck>) so apt-get build-dep installs it on CI; no new system deps required for this phase.

Refs LinuxCNC#3756
CI failed with "Permission denied" exec'ing _lib/launch.sh because the local repo has core.filemode=false so chmod +x was not recorded in the git index. Use git update-index --chmod=+x to mark all test scripts as executable.
Two CI-driven fixes:

1. Per-GUI Python module preflight in launch.sh. test.sh now passes a comma-separated list of modules the GUI needs at import time; if any fail to import, the test exits 77 (skipped) rather than wedging linuxcnc waiting for a GUI that will never come up.
   - axis: OpenGL.GL
   - touchy, gmoccapy: gi
   - qtdragon: PyQt5.QtCore, qtvcp

   Master CI does not currently install these runtime deps (Bertho's LinuxCNC#3391 work added them only to the 2.9 branch), so without preflight every smoke test failed with a wedged linuxcnc startup or an uninformative timeout. This way the tests skip cleanly until the deps land in master CI.

2. Wait up to 30s for the linuxcnc SIGTERM trap (scripts/linuxcnc Cleanup) to finish before SIGKILL. The earlier, tighter window meant Cleanup got cut off mid-run and left shared memory attached, which caused subsequent tests in the same job to fail with SHMERR.

Refs LinuxCNC#3756
The previous launch.sh had `echo "WARN: ..."` inside a `bash -c "..."` heredoc; the inner double quotes closed the outer string and the shutdown block was truncated. Symptom on CI: "linuxcnc: -c: line 34: syntax error: unexpected end of file" before any logs were captured. Switch to single quotes for the warning message.

Also add cairo to gmoccapy's import preflight: gladevcp.makepins (loaded by gmoccapy) imports cairo via the led module, which trips on minimal CI without python3-cairo.
scripts/runtests does not honor exit 77 from a test.sh; its skip mechanism is a per-directory `skip` executable that returns non-zero when the test should be skipped. Add a shared _lib/skip-if-missing.sh and per-GUI skip scripts that check for xvfb-run plus the python modules each GUI needs. The launch.sh preflight stays as a fallback.

Modules required:
- axis: OpenGL.GL
- touchy: gi, cairo
- gmoccapy: gi, cairo
- qtdragon: PyQt5.QtCore, qtvcp
Forward port of the GUI dependency work from 2.9 (LinuxCNC#3391). The runtime deps were already in linuxcnc-uspace's Depends, but apt-get build-dep on CI does not install runtime deps, which left the new ui-smoke tests unable to launch any GUI and forced them to skip.

Adds python3-opengl, python3-pyqt5, python3-pyqt5.qsci, python3-cairo, python3-gi, python3-gi-cairo, gir1.2-gtk-3.0 under the !nocheck profile, matching the existing pattern for xvfb and x11-xserver-utils.

Edited debian/control.top.in (debian/control is gitignored and regenerated by debian/configure).

Refs LinuxCNC#3391, LinuxCNC#3756
The CI run after the first dep batch revealed that gmoccapy needs the GtkSource-4 typelib, and qtdragon needs additional PyQt5 modules (qtsvg/qtopengl/qtwebengine), python3-qtpy, and the dbus mainloop binding. Add these to Build-Depends with the !nocheck profile so they install on apt-get build-dep.

Also extend skip-if-missing.sh to verify gi typelibs (entries of the form gi:Namespace:version), not just python imports. This catches the GtkSource case where gi imports fine but the typelib is absent, which gladevcp tripped on at gi.require_version time. The touchy and gmoccapy skip predicates now require Gtk-3.0 (and GtkSource-4 for gmoccapy).

Refs LinuxCNC#3756
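The dependency check described in this commit can be sketched in Python as below. This is only an illustration: the real skip-if-missing.sh is a shell script whose contents are not shown here, and the function names `parse_dep`, `is_available`, and `missing_deps` are hypothetical. It assumes plain entries are importable module names and that `gi:Namespace:version` entries are verified with `gi.require_version`, as the commit message states.

```python
import importlib


def parse_dep(entry):
    """Split one entry into (kind, name, version).

    'gi:GtkSource:4' is treated as a typelib check; anything else is a
    plain Python module name (hypothetical convention for this sketch).
    """
    entry = entry.strip()
    parts = entry.split(":")
    if len(parts) == 3 and parts[0] == "gi":
        return ("typelib", parts[1], parts[2])
    return ("module", entry, None)


def is_available(entry):
    """Return True when the module imports or the typelib resolves."""
    kind, name, version = parse_dep(entry)
    try:
        if kind == "module":
            importlib.import_module(name)
        else:
            import gi  # may be absent on minimal CI images
            # Raises ValueError when the typelib is not installed.
            gi.require_version(name, version)
    except (ImportError, ValueError):
        return False
    return True


def missing_deps(spec):
    """Return the entries of a comma-separated spec that are unavailable."""
    entries = [e.strip() for e in spec.split(",") if e.strip()]
    return [e for e in entries if not is_available(e)]
```

A skip predicate built on this would report the test as skipped whenever `missing_deps(...)` returns a non-empty list, for example for a spec such as `"gi, cairo, gi:Gtk:3.0, gi:GtkSource:4"`.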
The previous driver did too much for a smoke layer (Estop reset, machine on, home all, wait for IDLE) and tripped on each GUI's specific startup sequence assumptions. Reduce to: connect to NML, wait for task ready, sleep 3s for GUI construction, recheck task alive, print UI_SMOKE_OK. This is the literal answer to Bertho's "does it start" question. Functional behaviour belongs in tests/ui-functional/ (Phase 2).

Also harden shutdown: extend the SIGTERM grace from 30s to 60s, and add a halrun -U + explicit ipcrm fallback if Cleanup still has not finished. This also removes /tmp/linuxcnc.lock. Without this, the next ui-smoke test inherited stale shared memory and wedged at startup.

Bump LINUXCNC_TIMEOUT to 180s (8s startup + 30s driver + 60s grace + slack) and reduce DRIVER_TIMEOUT to 30s now that the driver work is small.

Refs LinuxCNC#3756
The CI run after the previous fix made progress (0 shmem errors, axis and gmoccapy passing), but qtdragon hit "bind error: 98 -- Address already in use" on NML port 5005, meaning gmoccapy's linuxcncsvr was still alive when qtdragon tried to start. touchy then cascaded.

Add a pre-launch cleanup to launch.sh that pkills the known long-lived processes (linuxcncsvr, milltask, halui, hal_bridge, axis, gmoccapy, touchy, qtvcp, rtapi_app), removes /tmp/linuxcnc.lock, runs halrun -U, and ipcrms any leftover linuxcnc shared memory keys before each test.

Refs LinuxCNC#3756
    # Give task time to come up before driver attaches. The GUI also
    # needs time to register and home up to the point where it accepts
    # commands; 8s is conservative for headless sim runs.
    sleep 8
Isn't there a way to detect what you are waiting for? Timed starts may fail when CI is busy and real-clock timeouts do not match the machine's activity.
This may also be a problem in other places where a real-clock timeout is in play.
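One way to address this is to poll for the condition and treat the timeout only as an upper bound. The helper below is a minimal sketch of that pattern, not code from the PR; `wait_until` and its parameters are hypothetical names. In drive.py the predicate might poll the task status, but what "ready" means for each GUI is an assumption left to the caller.

```python
import time


def wait_until(predicate, timeout, interval=0.2):
    """Poll predicate() until it returns true or `timeout` seconds pass.

    Returns True as soon as the condition holds, so a fast machine does
    not wait longer than needed, and False once the deadline has passed.
    time.monotonic() is used so wall-clock adjustments do not matter.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With this shape the fixed `sleep 8` becomes a generous ceiling (for example `wait_until(task_is_ready, timeout=60)`, where `task_is_ready` is a caller-supplied check) rather than a delay that is both too long on idle machines and too short on busy ones.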
    kill -KILL -\$LINUXCNC_PGID 2>/dev/null || true
    sleep 2
    halrun -U 2>/dev/null || true
    for key in 0x48414c32 0x48484c34 0x00000064; do
Duplicate key list. Make an array once and index that. Then you can also add stuff when needed. (see also key list above)
    shmid=\$(ipcs -m | awk -v k=\$key 'tolower(\$1)==k {print \$2}')
    [ -n \"\$shmid\" ] && ipcrm -m \$shmid 2>/dev/null || true
Duplication? --> function?
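The two suggestions (define the key list once, and move the lookup/removal into one function) could take the following shape. This is a hedged Python sketch of the structure only; launch.sh itself is shell, and `SHM_KEYS`, `shmids_for_keys`, and `remove_stale_shm` are hypothetical names. The keys are copied verbatim from the quoted loop, and the parsing assumes the same `ipcs -m` column layout the awk expression relies on (key in column 1, shmid in column 2).

```python
import subprocess

# Single source of truth for the keys; both cleanup paths would use it.
SHM_KEYS = ("0x48414c32", "0x48484c34", "0x00000064")


def shmids_for_keys(ipcs_output, keys=SHM_KEYS):
    """Return the shmid column for every `ipcs -m` row whose key matches."""
    wanted = {k.lower() for k in keys}
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Column 1 is the key, column 2 the shmid (same as the awk filter).
        if len(fields) >= 2 and fields[0].lower() in wanted:
            ids.append(fields[1])
    return ids


def remove_stale_shm(keys=SHM_KEYS):
    """Remove leftover segments; failures are ignored, like `|| true`."""
    result = subprocess.run(["ipcs", "-m"], capture_output=True,
                            text=True, check=False)
    for shmid in shmids_for_keys(result.stdout, keys):
        subprocess.run(["ipcrm", "-m", shmid], check=False,
                       stderr=subprocess.DEVNULL)
```

Adding a key later then means touching only `SHM_KEYS`, and the pre-launch and post-shutdown cleanup can both call `remove_stale_shm()`.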
    If xvfb-run is not available on the host, tests skip gracefully (matches
    the precedent set by tests/tooledit and tests/pyvcp).
I think we need to discuss this. We'd want to run this on CI, and for that we need xvfb-run. However, if the test skips gracefully, then CI does not fail and we don't know whether the code is sane. Something like a "damned if you do and damned if you don't" situation?
I would just fail the test. In CI, we can make sure that xvfb-run is there, and locally on a PC it won't hurt anybody.
If at any time there is an issue in CI that cannot be easily corrected, just add "continue-on-error: true" until it is fixed.
If you manage to create consistent screenshots and want to go to pedantic mode:
Probably overcomplicated, and I don't know how deterministic LinuxCNC is, but this way bugs like #3979 can be easily avoided. When testing manually, these kinds of bugs are often just overlooked.
Draft, opening for CI feedback. Refs #3756.
Summary
Phase 1 of the GUI test work tracked in #3756. Each test launches a GUI under xvfb-run against an existing configs/sim/<gui>/*.ini, drives Estop reset / machine on / home all via NML, asserts the interpreter reaches IDLE, then shuts down cleanly. Verifies the GUI starts and accepts basic commands without crashing.
Coverage
Mechanics
- tests/ui-smoke/_lib/launch.sh: xvfb-run wrapper, setsid so the linuxcnc process group can be signalled cleanly, falls back to axis-remote --quit then SIGTERM with grace then SIGKILL. Skips with exit 77 if xvfb-run is unavailable (matches tests/tooledit and tests/pyvcp).
- tests/ui-smoke/_lib/drive.py: NML driver. Tolerant of sim configs that come up already in STATE_ON via auto-estop-release HAL wiring. Falls back to per-joint serial homing if no HOME_SEQUENCE is configured.
- tests/ui-smoke/_lib/checkresult.sh: pass when UI_SMOKE_OK printed and no crash markers in captured logs.
Cleanup discipline
- .gitignore covers all runtime artifacts (linuxcnc.{out,err,pid}, ui-smoke.{out,err}, result, stderr)
Deps
xvfb is already declared in debian/control with the <!nocheck> profile so apt-get build-dep installs it on the existing CI without a workflow change. Coordinated with @hdiethelm in #3984: this PR adds no system deps; if his lands first, no rebase needed here.
Out of scope (deferred)
linuxcnc.command.program_open + auto(RUN), verify final position via linuxcnc.stat.position. Per-GUI cross-checks via xdotool or AT-SPI where useful.
Test plan
scripts/runtests tests/ui-smoke, no shmem leaks