sound tests: Test hot-unload

christos commented

2026-04-15 17:35:30 +00:00

Member

Sponsored by: The FreeBSD Foundation
MFC after: 1 week

Sponsored by: The FreeBSD Foundation MFC after: 1 week

christos added 1 commit

2026-04-15 17:35:32 +00:00

sound tests: Test hot-unload

Checklist / commit (pull_request_target) Has been cancelled

Details

Cross-build Kernel / amd64 ubuntu-22.04 (clang-15) (pull_request) Has been cancelled

Details

Cross-build Kernel / aarch64 ubuntu-22.04 (clang-15) (pull_request) Has been cancelled

Details

Cross-build Kernel / amd64 ubuntu-24.04 (clang-18) (pull_request) Has been cancelled

Details

Cross-build Kernel / aarch64 ubuntu-24.04 (clang-18) (pull_request) Has been cancelled

Details

Cross-build Kernel / amd64 macos-latest (clang-18) (pull_request) Has been cancelled

Details

Cross-build Kernel / aarch64 macos-latest (clang-18) (pull_request) Has been cancelled

Details

Style Checker / Style Checker (pull_request) Has been cancelled

Details

6e1ef7e9ac

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week

christos requested reviews from emaste, kib, markj

2026-04-15 17:36:21 +00:00

christos commented

2026-04-15 17:39:29 +00:00

Author

Member

This is problematic when running with -v parallelism because it might unload snd_dummy when the other tests are using it. For instance:

# kyua -v parallelism=8 test
pcm_read_write:pcm_write  ->  passed  [0.017s]
pcm_read_write:pcm_read  ->  passed  [0.022s]
pcm_read_write:pcm_format_bits  ->  passed  [0.025s]
polling:poll_poll  ->  passed  [0.124s]
polling:poll_kqueue  ->  passed  [0.456s]
sndstat:sndstat_nv  ->  passed  [0.503s]
sndstat:sndstat_udev  ->  passed  [0.519s]


Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address	= 0x8
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80bf8d6c
stack pointer	        = 0x28:0xfffffe00c3da6930
frame pointer	        = 0x28:0xfffffe00c3da6ae0
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 6246 (polling)
rdi: fffff80004509cc0 rsi: 0000000000000009 rdx: ffffffff83142323
rcx: fffff801cdfa4780  r8: 00000000000004d6  r9: deadc0dedeadc0de
rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe00c3da6ae0
r10: ffffffff83142323 r11: 0000000000010000 r12: 00000000000004d6
r13: 0000000000000009 r14: ffffffff83142323 r15: fffff80004509cc0
trap number		= 12
panic: page fault
cpuid = 4
time = 1776281449
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00c3da6660
vpanic() at vpanic+0x13f/frame 0xfffffe00c3da6790
panic() at panic+0x43/frame 0xfffffe00c3da67f0
trap_pfault() at trap_pfault+0x422/frame 0xfffffe00c3da6860
calltrap() at calltrap+0x8/frame 0xfffffe00c3da6860
--- trap 0xc, rip = 0xffffffff80bf8d6c, rsp = 0xfffffe00c3da6930, rbp = 0xfffffe00c3da6ae0 ---
witness_checkorder() at witness_checkorder+0x7c/frame 0xfffffe00c3da6ae0
__mtx_lock_flags() at __mtx_lock_flags+0x91/frame 0xfffffe00c3da6b30
dsp_ioctl() at dsp_ioctl+0x1845/frame 0xfffffe00c3da6bb0
devfs_ioctl() at devfs_ioctl+0xd2/frame 0xfffffe00c3da6c00
VOP_IOCTL_APV() at VOP_IOCTL_APV+0x54/frame 0xfffffe00c3da6c30
vn_ioctl() at vn_ioctl+0xc2/frame 0xfffffe00c3da6ca0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe00c3da6cc0
kern_ioctl() at kern_ioctl+0x267/frame 0xfffffe00c3da6d30
sys_ioctl() at sys_ioctl+0x143/frame 0xfffffe00c3da6e00
amd64_syscall() at amd64_syscall+0x16c/frame 0xfffffe00c3da6f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00c3da6f30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x214937d7d0ea, rsp = 0x214935dffa68, rbp = 0x214935dffaa0 ---
KDB: enter: panic
[ thread pid 6246 tid 100858 ]
Stopped at      kdb_enter+0x33: movq    $0,0x161c472(%rip)
db>

That being said, I'm wondering what the solution is. We already have checks inside sound(4)'s devfs methods to see if the device has been detached (see DSP_REGISTERED()), but because these happen at the beginning of the function, it is not guaranteed that the device won't be detached at some point when a devfs method releases the lock for a while. I'm trying to figure out an elegant solution for this, because adding a DSP_REGISTERED() check after every single lock release seems way too tedious. Any thoughts?

This is problematic when running with `-v parallelism` because it might unload `snd_dummy` when the other tests are using it. For instance: ``` # kyua -v parallelism=8 test pcm_read_write:pcm_write -> passed [0.017s] pcm_read_write:pcm_read -> passed [0.022s] pcm_read_write:pcm_format_bits -> passed [0.025s] polling:poll_poll -> passed [0.124s] polling:poll_kqueue -> passed [0.456s] sndstat:sndstat_nv -> passed [0.503s] sndstat:sndstat_udev -> passed [0.519s] Fatal trap 12: page fault while in kernel mode cpuid = 4; apic id = 04 fault virtual address = 0x8 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80bf8d6c stack pointer = 0x28:0xfffffe00c3da6930 frame pointer = 0x28:0xfffffe00c3da6ae0 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 6246 (polling) rdi: fffff80004509cc0 rsi: 0000000000000009 rdx: ffffffff83142323 rcx: fffff801cdfa4780 r8: 00000000000004d6 r9: deadc0dedeadc0de rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe00c3da6ae0 r10: ffffffff83142323 r11: 0000000000010000 r12: 00000000000004d6 r13: 0000000000000009 r14: ffffffff83142323 r15: fffff80004509cc0 trap number = 12 panic: page fault cpuid = 4 time = 1776281449 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00c3da6660 vpanic() at vpanic+0x13f/frame 0xfffffe00c3da6790 panic() at panic+0x43/frame 0xfffffe00c3da67f0 trap_pfault() at trap_pfault+0x422/frame 0xfffffe00c3da6860 calltrap() at calltrap+0x8/frame 0xfffffe00c3da6860 --- trap 0xc, rip = 0xffffffff80bf8d6c, rsp = 0xfffffe00c3da6930, rbp = 0xfffffe00c3da6ae0 --- witness_checkorder() at witness_checkorder+0x7c/frame 0xfffffe00c3da6ae0 __mtx_lock_flags() at __mtx_lock_flags+0x91/frame 0xfffffe00c3da6b30 dsp_ioctl() at dsp_ioctl+0x1845/frame 0xfffffe00c3da6bb0 devfs_ioctl() at devfs_ioctl+0xd2/frame 0xfffffe00c3da6c00 VOP_IOCTL_APV() at VOP_IOCTL_APV+0x54/frame 0xfffffe00c3da6c30 vn_ioctl() at vn_ioctl+0xc2/frame 0xfffffe00c3da6ca0 devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe00c3da6cc0 kern_ioctl() at kern_ioctl+0x267/frame 0xfffffe00c3da6d30 sys_ioctl() at sys_ioctl+0x143/frame 0xfffffe00c3da6e00 amd64_syscall() at amd64_syscall+0x16c/frame 0xfffffe00c3da6f30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00c3da6f30 --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x214937d7d0ea, rsp = 0x214935dffa68, rbp = 0x214935dffaa0 --- KDB: enter: panic [ thread pid 6246 tid 100858 ] Stopped at kdb_enter+0x33: movq $0,0x161c472(%rip) db> ``` That being said, I'm wondering what the solution is. We already have checks inside `sound(4)`'s devfs methods to see if the device has been detached (see `DSP_REGISTERED()`), but because these happen at the beginning of the function, it is not guaranteed that the device won't be detached at some point when a devfs method releases the lock for a while. I'm trying to figure out an elegant solution for this, because adding a `DSP_REGISTERED()` check after every single lock release seems way too tedious. Any thoughts?

kib commented

2026-04-16 00:09:14 +00:00

Member

If you properly use destroy_dev() and devfs cdevpriv, the priv data should be intact until all threads leave the cdevsw methods.

christos commented

2026-04-17 08:03:19 +00:00

Author

Member

@kib wrote in #13 (comment):

If you properly use destroy_dev() and devfs cdevpriv, the priv data should be intact until all threads leave the cdevsw methods.

Can you elaborate on what properly means?

@kib wrote in https://ron-dev.freebsd.org/FreeBSD/src/pulls/13#issuecomment-90: > If you properly use destroy_dev() and devfs cdevpriv, the priv data should be intact until all threads leave the cdevsw methods. Can you elaborate on what properly means?

kib commented

2026-04-17 23:27:19 +00:00

Member

At very least, if outside of the cdevsw methods, where it is handled automatically, the code must not touch device data unless dev_refthread() or like returned success.

In any case, the case in the comment should be debugged, it might be that this is just a race in driver code and not the KPI mis-use.

At very least, if outside of the cdevsw methods, where it is handled automatically, the code must not touch device data unless dev_refthread() or like returned success. In any case, the case in the comment should be debugged, it might be that this is just a race in driver code and not the KPI mis-use.

Checklist / commit (pull_request_target) Has been cancelled

Details

Cross-build Kernel / amd64 ubuntu-22.04 (clang-15) (pull_request) Has been cancelled

Details

Cross-build Kernel / aarch64 ubuntu-22.04 (clang-15) (pull_request) Has been cancelled

Details

Cross-build Kernel / amd64 ubuntu-24.04 (clang-18) (pull_request) Has been cancelled

Details

Cross-build Kernel / aarch64 ubuntu-24.04 (clang-18) (pull_request) Has been cancelled

Details

Cross-build Kernel / amd64 macos-latest (clang-18) (pull_request) Has been cancelled

Details

Cross-build Kernel / aarch64 macos-latest (clang-18) (pull_request) Has been cancelled

Details

Style Checker / Style Checker (pull_request) Has been cancelled

Details

This pull request can be merged automatically.

This branch is out-of-date with the base branch

You are not authorized to merge this pull request.

View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.

git fetch -u origin +refs/pull/13/head:christos/hotunload

git switch christos/hotunload

Rows
Columns

sound tests: Test hot-unload #13

Checkout