Change various kernel APIs to work with pointers instead of vm_offset_t #2

Open
jhb wants to merge 20 commits from reviews/jhb/pmap_pointer into main

In CHERI kernels, pointers are not the same size as addresses, and vm_offset_t represents a virtual address (ptraddr_t). To handle places that use vm_offset_t to hold pointers that can be dereferenced by the CPU, CheriBSD adds a new vm_pointer_t type (uintptr_t). However, using vm_pointer_t can be a bit fragile, as C compilers will silently accept conversions between uintptr_t and ptraddr_t. The resulting errors can only be found at runtime. However, if pointers are stored as a pointer type such as void *, then mismatches between addresses and pointers can be caught at compile time including when compiling on non-CHERI architectures. As such, this series converts various kernel pmap/virtual memory kernel APIs that currently use vm_offset_t to use pointer types (either void * or char *) instead. Normally void * is used, but char * is preferred if the API or structure member is commonly used in pointer arithmetic. In practice, these changes seem to be cleaner as most kernel consumers for many of these APIs had to cast between pointer types and vm_offset_t anyway and now those casts can be removed.
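The compile-time argument above can be illustrated with a small sketch. It is not code from this series: `sketch_ptraddr_t` and both helper functions are hypothetical stand-ins. On a non-CHERI host the address type and uintptr_t have the same width, which is why the integer-typed form accepts either without complaint.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ptraddr_t: an address, not a usable pointer. */
typedef size_t sketch_ptraddr_t;

/*
 * Integer-typed API (hypothetical): any integer converts silently,
 * so an address passed where a pointer was meant still compiles.
 */
static uintptr_t
sketch_advance_int(uintptr_t p, size_t off)
{
	return (p + off);
}

/*
 * Pointer-typed API (hypothetical): char * allows arithmetic, and
 * passing a bare address requires an explicit, visible cast.
 */
static char *
sketch_advance_ptr(char *p, size_t off)
{
	return (p + off);
}
```

With these definitions, calling `sketch_advance_ptr()` with a `sketch_ptraddr_t` argument is diagnosed by the compiler on any architecture, while the same mistake with `sketch_advance_int()` is accepted.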

jhb added 68 commits 2026-03-07 16:55:02 +00:00
release the refcount of link-local prefix information to ensure
it gets freed when the address is deleted.

Reviewed By: zlei, ivy
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D55593
The function pmap_is_valid_memattr(pmap, mode) checks whether the
given variable mode is between the two constant values
VM_MEMATTR_DEVICE and VM_MEMATTR_WRITE_THROUGH.
After the code for this function was written, the value of
VM_MEMATTR_DEVICE changed from 0 to 4. Since VM_MEMATTR_WRITE_THROUGH
is still 3, the condition is always false.
This patch changes the condition to check whether mode is equal to any
of the VM_MEMATTR* constants.

Reviewed by:		andrew, tuexen
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D55534
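A minimal sketch of why the range check described above can never succeed, and what an explicit check might look like. The `SK_MEMATTR_*` names and the set of accepted values are assumptions for illustration; only the values 4 (device) and 3 (write-through) come from the commit message.

```c
#include <stdbool.h>

/* Hypothetical constants; only DEVICE = 4 and WRITE_THROUGH = 3 are from the text. */
enum sketch_memattr {
	SK_MEMATTR_UNCACHEABLE = 1,
	SK_MEMATTR_WRITE_BACK = 2,
	SK_MEMATTR_WRITE_THROUGH = 3,
	SK_MEMATTR_DEVICE = 4
};

/* Old style: lower bound 4, upper bound 3, so no value satisfies both. */
static bool
sketch_range_check(int mode)
{
	return (mode >= SK_MEMATTR_DEVICE && mode <= SK_MEMATTR_WRITE_THROUGH);
}

/* New style: compare against each known constant explicitly. */
static bool
sketch_explicit_check(int mode)
{
	switch (mode) {
	case SK_MEMATTR_UNCACHEABLE:
	case SK_MEMATTR_WRITE_BACK:
	case SK_MEMATTR_WRITE_THROUGH:
	case SK_MEMATTR_DEVICE:
		return (true);
	default:
		return (false);
	}
}
```

The explicit form no longer depends on the numeric ordering of the constants, so a later renumbering cannot silently break it.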
virtio_pci uses two loader tunables that should be more visible.
This patch adds these loader tunables to sysctl and describes them
in the virtio(4) man page.

Reviewed by:		imp (earlier version), tuexen
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D55533
We retired most obsolete 10 and 10/100 Ethernet NIC drivers in 2019 --
see commits following ebcf740a32 ("FCP-101: remove obsolete 10 and
10/100 Ethernet drivers.").

le(4) was retained with the note "Emulated by QEMU, alternatives
don't yet work for mips64."  MIPS has since been removed from the tree
and emulators and virtual machines offer many other, more suitable
devices.

Reviewed by:	brooks
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D55516
These were obtained from IBM specs and actual tapes/drives.

Standard LTO-10 cartridges hold 30TB raw, 75TB with 2.5:1 compression.
Premium LTO-10 cartridges hold 40TB raw, 100TB with 2.5:1 compression.
LTO-10 tape drives are not backward compatible with previous generation
LTO tapes. (This is a change from older generation drives.)

Since the Premium tape is a new thing for LTO, we'll call this density
code LTO-10P vs. the standard LTO-10.  The barcode identifier for LTO-10
tapes is "LA"; the barcode identifier for LTO-10P tapes is "PA".

LTO-10 cartridges contain 1035m of tape, while LTO-10 Premium
cartridges contain 1337m of tape and have slightly higher density.
(Obtained from MAM data on actual tape cartridges and the density
report, obtained via 'mt getdensity'.)  LTO-10 cartridges use a
polyethylene naphthalate (PEN) film substrate. LTO-10 Premium
cartridges use an Aramid (aromatic polyamide) substrate that is thinner
and stronger, allowing a longer tape to fit in the same cartridge form
factor.

usr.bin/mt/mt.1:
	Add density codes and specs for LTO-10 and LTO-10P.

lib/libmt/mtlib.c:
	Add density codes for LTO-10 and LTO-10P.

Sponsored by:	Spectra Logic
MFC after:	3 days
zstdcat is equivalent to zstd -dcf, and matches our intention.

Suggested by:	delphij (in D55101)
Sponsored by:	The FreeBSD Foundation
Move them to the usr.bin section.

Fixes: de5663609e ("This is the new crunch utility for making...")
Changes: https://github.com/eggert/tz/blob/2026a/NEWS

MFC after:      3 days
Per Wikipedia, ACPI WMI support is available on all x86* platforms
and ARM platforms. Add the source to `files.arm64` so code that relies
on its headers (thunderbolt(4) for instance), can be built on ARM64.

MFC after:	1 month
Reviewed By:	andrew
Differential Revision: https://reviews.freebsd.org/D55535
This change moves the thunderbolt module and other USB modules under a
MK_USB != no conditional to ensure that users not desiring USB support
can easily build systems without USB-specific drivers using this knob.

MFC after:	1 week
Reviewed By:	imp
Differential Revision: https://reviews.freebsd.org/D55576
Before this change, `make test-includes` (run as part of buildworld)
would place test files in the current directory, which would clutter up
git clones. Run `make obj` beforehand to ensure that the files are put
in `${.OBJDIR}` instead of `${.CURDIR}`. This helps cut down on the
noise significantly when running commands like `git status`.

MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D55499
If we have virtual_oss running, this devd notification will make sure to
automatically transfer sound to the new default unit, while also making
sure that we switch to it only for the supported directions (recording
and/or playback).

For more information, please refer to 2ffaca551e ("snd_hda: Implement
automatic redirection between associations").

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D55530
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D55531
We now have devd rules in snd.conf that achieve this in a much cleaner
way.

This reverts commit 9aac27599a.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D55532
Prior to commit 6973701a00 ("1. Make the BSD version of cpio the
default [1]") GNU cpio was installed unconditionally.  The BSD_CPIO
option was added when we introduced the BSD licensed, libarchive-based
cpio, to support installation of GNU cpio, libarchive cpio, or both.

GNU cpio was removed long ago and there is no longer a need for this
option.  We can just install BSD cpio unconditionally.

Reviewed by:	des
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D55467
While the majority of virtio platforms will be fully coherent, some may
require cache maintenance or other specific device memory handling (eg for
secure partitioning). Using bus_dma allows for these usecases.

The virtio buffers are marked as coherent; this should ensure that sync
calls are no-ops in the common cases.

Reviewed by:	andrew
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D54959
While the majority of virtio platforms will be fully coherent, some may
require cache maintenance or other specific device memory handling (eg for
secure partitioning). Using bus_dma allows for these usecases.

The virtio buffers are marked as coherent; this should ensure that sync
calls are no-ops in the common cases.

Reviewed by:	andrew, br
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D54960
restart is a boolean. While I'm here, convert to a bool.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D55518
Move ndasetgeom up in the file. We'll need it here for future
commits. Also, preserve the UNMAPPED_BIO flag since we can't observe
enough data from this routine to set it directly.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D55519
SCSI and ATA drives rescan the drive on opens to catch changes to the
disk. We do it here too so we catch if a drive has been FORMATed or
SANITIZEd with different parameters. We don't use xpt_rescan() since we
don't want to interfere with boot or keep all busses locked (this rescan
won't change the bus, so we don't need the CAM topo lock).

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D55520
When the sector size changes, we assume it's new media. When the
mediasize changes, we'll just resize the disk (we get called for both
events). When neither has changed, don't call either.

Some NVMe drives (but not all) post an async event on page 4 when the
sector size changes via a FORMAT command. We'll notice the new media
right away, rather than the next device open. As a practical effect,
this just means that certain geom operations will see it sooner. Since
most drive interaction goes through open, that will catch those drives
that do not post this event well enough.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D55521
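The decision described in this commit message can be sketched as a small pure function. The enum, the function name, and the parameter types are hypothetical; the real driver calls into the disk layer rather than returning a value.

```c
#include <stdint.h>

/* Hypothetical result codes for the three cases in the commit message. */
enum sketch_action {
	SKETCH_NONE,		/* nothing changed: call neither */
	SKETCH_NEW_MEDIA,	/* sector size changed: treat as new media */
	SKETCH_RESIZE		/* only the media size changed: resize */
};

static enum sketch_action
sketch_geom_action(uint32_t old_ss, uint32_t new_ss,
    uint64_t old_ms, uint64_t new_ms)
{
	/* A sector size change takes priority over a media size change. */
	if (new_ss != old_ss)
		return (SKETCH_NEW_MEDIA);
	if (new_ms != old_ms)
		return (SKETCH_RESIZE);
	return (SKETCH_NONE);
}
```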
Fix the error message in nvme_sim_ns_removed that was cut and pasted
from nvme_sim_ns_changed to reflect its new home.  No functional change.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D55522
The error recovery is nicer if we can wait for the tiny memory we need
to send the messages when the physpath changes. Since we've moved the
async handler into a sleepable context, we can wait for the allocation
to complete since async events are rare enough and it's not an
indefinite wait.

Also add a comment about the scope of AC_ADVINFO_CHANGED for nvme
drives. We could use it for broadcasting IDENTIFY changes in nvme
drives. However, the underlying mechanisms in NVMe don't really allow
for that (they are more fine-grained). So for namespace changes, for
example, we'll send AC_GETDEV_CHANGED instead of an AC_ADVINFO_CHANGED.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D55523
The BUILD_BUG_ON_ZERO() macro returns an (int)0 if it does not fail
at build time. LinuxKPI sort() has it as a guard for an unsupported
argument but ignores the return value.

This leads to gcc complaining:

/usr/src/sys/compat/linuxkpi/common/include/linux/build_bug.h:60:33: error: statement with no effect [-Werror=unused-value]
   60 | #define BUILD_BUG_ON_ZERO(x)    ((int)sizeof(struct { int:-((x) != 0); }))
      |                                 ^
/usr/src/sys/compat/linuxkpi/common/include/linux/sort.h:37:9: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
   37 |         BUILD_BUG_ON_ZERO(swap);                        \
      |         ^~~~~~~~~~~~~~~~~
/usr/src/sys/contrib/dev/rtw89/core.c:2575:9: note: in expansion of macro 'sort'
 2575 |         sort(drift, RTW89_BCN_TRACK_STAT_NR, sizeof(*drift), cmp_u16, NULL);

Change to BUILD_BUG_ON() for the statement version.

Reported by:	CI
Co-authored-by:	bz
Approved by:	emaste (mentor)
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D55634
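A minimal sketch of the difference between the expression form and the statement form. The `SK_BUILD_BUG_ON_ZERO` body is copied from the compiler output above; `SK_BUILD_BUG_ON` built on `_Static_assert` is an assumption for illustration and not necessarily how LinuxKPI defines it. The expression form relies on a GCC/Clang extension (a struct with no named members).

```c
/* Expression form: evaluates to (int)0 when the condition is false. */
#define	SK_BUILD_BUG_ON_ZERO(x)	((int)sizeof(struct { int:-((x) != 0); }))

/* Statement form (assumed definition): produces no value at all. */
#define	SK_BUILD_BUG_ON(x)	_Static_assert(!(x), "build-time check failed")

static int
sketch_expr_use(int v)
{
	/*
	 * The 0 result has to be consumed here; used as a bare statement
	 * it triggers -Wunused-value, as in the log above.
	 */
	return (v + SK_BUILD_BUG_ON_ZERO(sizeof(int) > 8));
}

static int
sketch_stmt_use(int v)
{
	/* No value is produced, so nothing is left unused. */
	SK_BUILD_BUG_ON(sizeof(int) > 8);
	return (v);
}
```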
Reviewed by: bapt
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D55462
INT_MAX is already larger than a reasonable hostname might be, but
size_t makes some of this easier to reason about as we do arithmetic
with it.  This would maybe not be worth it if we had to bump the
soversion because of it, but libutil does symbol versioning now so we
can provide a compat shim.

While we're here, fix some inconsistencies in argument names in the
manpage.

Reviewed by:	des
Obtained from:	https://github.com/apple-oss-distributions/libutil
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D54622
memchr(3) will happily believe we've passed in a valid object, but
hostsize could easily exceed the bounds of fullhost.  Clamp it down to
the string size to be safe and avoid UB.  This plugs a potential
overread noted in the compat shim that was just added.

Reviewed by:	des
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D54623
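The clamping idea can be sketched as follows. This is not the libutil code: `sketch_host_len` is a hypothetical helper that only shows limiting the search length to the string's real size before calling memchr(3).

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the length of the leading label of fullhost (up to the first
 * '.'), never looking past the end of the string even if the caller's
 * hostsize is larger than the string itself.
 */
static size_t
sketch_host_len(const char *fullhost, size_t hostsize)
{
	const char *dot;
	size_t len;

	len = strlen(fullhost);
	if (hostsize > len)
		hostsize = len;		/* clamp to the real object size */
	dot = memchr(fullhost, '.', hostsize);
	return (dot != NULL ? (size_t)(dot - fullhost) : hostsize);
}
```

Without the clamp, a caller-supplied size larger than the string would let memchr(3) read past the terminating NUL whenever no '.' is found.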
In acpi_spmc_get_constraints_spec(), the revision of the device
constraint detail package was mistakenly read from
constraint_obj->Package.Elements[0], which is the device name
(a string), instead of from the detail sub-package's first element.

Move the initialisation of 'detail' before the revision check and
read the revision from detail->Package.Elements[0] as the comment
already states.

Approved by:	obiwac
Differential Revision:	https://reviews.freebsd.org/D55639
Sponsored by:	Netflix
MFC after:	1 week
Add explicit note that me(4) works as a point-to-point pseudo device.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
No functional changes.  Just moved the function within the file.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Over time struct ieee80211_prep_tx_info has grown further fields.
One which is becoming mandatory is the subtype (of the mgmt frame).
iwlwifi(mld) has a WARN for it if it does not match, so we now have
to set this for proper operation.  In addition we are trying to improve
the situation of setting/unsetting (prepare_tx/complete_tx) in various
states and cleanup the use of other fields but link_id which we now
leave as a marker for the future everywhere.
The general problem we are facing is that our hook surface in this case
is the state machine but likely would have to be tx/rx mgmt frames but
we would also have to drive the TX queues from there which is tricky.
The long-term answer is to change net80211.

Further the hardware flag DEAUTH_NEED_MGD_TX_PREP is dead and was
removed again in favour of letting drivers deal with it.  iwlwifi(mvm)
likely being the only driver which ever used this.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Various macros (dma_map_sg_attrs, dma_unmap_sg_attrs,
dma_map_single_attrs, and dma_unmap_single_attrs) currently suppress
passing on the attrs argument.  Their implementations (even though at
times they still marked the argument __unused; we remove that) have
long since gained support for handling the argument.
With ofed fixed (5edf24aac1), pass the argument through so that
other drivers using these functions may hopefully work just a bit
better as well.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Reviewed by:	kib
Differential Revision: https://reviews.freebsd.org/D55391
Reviewed by:    ziaee, mhorne
Approved by:    lwhsu (mentor)
MFC after:      2 weeks
Sponsored by:   The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D54466
Reported by:	bz
MFC after:	1 week
Sponsored by:	Netflix, Inc.
There is no need to call execl(), which will allocate an array and copy
our arguments into it, when we can use a static array and call execve()
directly.

MFC after:	1 week
Sponsored by:	Klara, Inc.
Reviewed by:	kevans
Differential Revision:	https://reviews.freebsd.org/D55648
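A minimal sketch of the pattern, assuming a POSIX system with /bin/sh. The function name and the command being run are hypothetical and unrelated to the program this commit touches; the fork and wait are only there so the example can report a result.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <stddef.h>
#include <unistd.h>

extern char **environ;

static int
sketch_run_exit7(void)
{
	/* Static argument vector: nothing to build or copy before exec. */
	static char *const argv[] = { "sh", "-c", "exit 7", NULL };
	int status;
	pid_t pid;

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		execve("/bin/sh", argv, environ);
		_exit(127);	/* only reached if execve() failed */
	}
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```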
amdsmu_suspend() and amdsmu_resume() for sending hints to the AMD SMU
power management firmware (PMFW) that we are entering and exiting
s2idle. We also dump sleep metrics once we tell it we're exiting sleep,
so the relevant metrics are updated.

Register these as acpi_post_dev_suspend and acpi_post_dev_resume
eventhandlers.

Reviewed by:	olce
Approved by:	olce
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D48721
Reviewed by:	olce
Approved by:	olce
Fixes:	4c4392e791 ("Add doxygen doc comments for most of newbus and the BUS interface.")
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D48721
% pfctl -F ethernet
Ethernet rules cleared

% pfctl -s ethernet
pfctl: Unknown show modifier 'ethernet'

pfctl accepts 'ethernet' (or any prefix of it) in the -F flag but
accepts only 'ether' (or any prefix of it) in the -s flag, which seems
inconsistent.  This change brings the two to parity while remaining
backwards compatible.

Reviewed by:	kp
MFC after:	2 weeks
Signed-off-by: Seth Hoffert <seth.hoffert@gmail.com>
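The "any prefix of it" matching mentioned above can be sketched like this. `sketch_is_prefix` is a hypothetical helper, not pfctl's own lookup routine.

```c
#include <stdbool.h>
#include <string.h>

/* True if arg is a non-empty prefix of (or equal to) the full keyword. */
static bool
sketch_is_prefix(const char *arg, const char *full)
{
	size_t n;

	n = strlen(arg);
	return (n > 0 && strncmp(arg, full, n) == 0);
}
```

Matching against "ethernet" in both code paths keeps "ether" working, since "ether" is itself a prefix of "ethernet".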
The PMUVer field of ID_AA64DFR0 contains an unsigned version of the
Performance Monitors Extension, but it is currently treated as signed.
Change it to unsigned.

Reviewed by:	andrew
Sponsored by:	Arm Ltd
Signed-off-by:	Kajetan Puchalski <kajetan.puchalski@arm.com>
Pull Request:	https://github.com/freebsd/freebsd-src/pull/2062
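A small sketch of why the signedness of a 4-bit ID field matters. The helper names are hypothetical, and the shift of 8 is my reading of the PMUVer field position; it should be checked against the architecture reference.

```c
#include <stdint.h>

#define	SK_PMUVER_SHIFT	8	/* assumed field position, bits [11:8] */

/* Unsigned interpretation: the field is a plain value in 0..15. */
static unsigned int
sketch_field_unsigned(uint64_t reg)
{
	return ((unsigned int)((reg >> SK_PMUVER_SHIFT) & 0xf));
}

/* Signed interpretation: values 8..15 are sign-extended to -8..-1. */
static int
sketch_field_signed(uint64_t reg)
{
	int v;

	v = (int)((reg >> SK_PMUVER_SHIFT) & 0xf);
	return (v >= 8 ? v - 16 : v);
}
```

For a field value of 9, the unsigned reading gives 9 while the signed reading gives -7, so a "version at least N" comparison would wrongly fail under the signed interpretation.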
The only new register is read-only. As the kernel just passes the
registers to the guest directly no further change should be needed.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D51764
Add check_iov_len() to check whether an iovec array covers a certain
length without the need to call count_iov() on the whole array first.

Garbage-collect the 'seek' argument to buf_to_iov(), used only by
virtio-scsi control request handling. The apparent benefit of using it
to copy only the final status byte instead of the whole TMF or AN
request (25 and 21 bytes, respectively) is dubious at best, given that
the extra code to handle this in buf_to_iov() allocates and frees a new
iovec array and uses seek_iov(), which traverses the whole array and
copies iovecs around.

Replace seek_iov() and truncate_iov(), used only by virtio-scsi, with
a single function split_iov() which combines the functionality of both
in a more efficient way:
While seek_iov() always copies all iovecs past the seek offset into a
new iovec array, split_iov() works in place and doesn't copy iovecs
unless actually necessary. By using split_iov(), we can avoid almost
all copying of iovecs in I/O handling code paths of virtio-scsi.

Reviewed by:	markj
Differential Revision: https://reviews.freebsd.org/D53468
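A minimal sketch of the early-exit length check described in the first paragraph of this commit message. The name, signature, and return type are assumptions; the real check_iov_len() in bhyve may differ.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the iovec array covers at least len bytes.  The walk
 * stops as soon as enough bytes have been seen, so the whole array is
 * not summed first.
 */
static bool
sketch_check_iov_len(const struct iovec *iov, int niov, size_t len)
{
	size_t remaining;
	int i;

	remaining = len;
	if (remaining == 0)
		return (true);
	for (i = 0; i < niov; i++) {
		if (iov[i].iov_len >= remaining)
			return (true);
		remaining -= iov[i].iov_len;
	}
	return (false);
}
```

Counting down from `len` rather than summing up also avoids overflowing a running total on very large iovec arrays.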
By preallocating all I/O requests on all queues, we can take most
allocations out of the hot I/O code paths and simplify the code
significantly. While here, make sure we check all allocations for
success and make sure to handle failures gracefully.

Additionally, check for I/O request validity as early as possible,
and return illegal requests immediately.

Reviewed by:	markj
Differential Revision: https://reviews.freebsd.org/D53469
Instead of blindly trusting the guest OS driver that it sends us well-
formed LUN addresses, check the LUN address for validity and fail the
request if it is invalid. While here, constify the members of the virtio
requests which aren't device-writable anyway.

Reviewed by:	markj
Differential Revision: https://reviews.freebsd.org/D53470
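A hedged sketch of what a LUN address validity check might look like. It assumes the 8-byte target/LUN format from the virtio specification (first byte 1, second byte the target, bytes three and four a single-level LUN, four trailing zero bytes). The function name is hypothetical and the checks actually added by this commit may be stricter or different.

```c
#include <stdbool.h>
#include <stdint.h>

/* Validate an 8-byte virtio-scsi LUN address (assumed format, see above). */
static bool
sketch_lun_valid(const uint8_t lun[8])
{
	int i;

	if (lun[0] != 1)
		return (false);
	/* The last four bytes must be zero in this format. */
	for (i = 4; i < 8; i++) {
		if (lun[i] != 0)
			return (false);
	}
	return (true);
}
```

This sketch deliberately rejects everything else, including the REPORT LUNS well-known logical unit address, which a fuller implementation would need to handle separately.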
No functional change, but this is friendlier for CHERI.
This removes the need for several casts to pointer in callers.
Add explicit uintptr_t casts to the arguments to these macros so that
they work both with virtual addresses (e.g. vm_offset_t) and pointers.
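The effect of an explicit uintptr_t cast inside such a macro can be sketched as follows. `SK_DMAP_BASE` and `SK_DMAP_TO_PHYS` are hypothetical names with a made-up base address; they only illustrate the technique, not any architecture's real direct map.

```c
#include <stdint.h>

/* Hypothetical direct-map base, chosen only for the example. */
#define	SK_DMAP_BASE	((uintptr_t)0x40000000u)

/*
 * The cast on the argument lets callers pass either an integer virtual
 * address or a pointer without casting at the call site.
 */
#define	SK_DMAP_TO_PHYS(va)	((uint64_t)((uintptr_t)(va) - SK_DMAP_BASE))
```

Both `SK_DMAP_TO_PHYS(addr)` with an integer `addr` and `SK_DMAP_TO_PHYS(ptr)` with a `void *ptr` compile and yield the same physical address.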

Drop no-longer-needed casts in various invocations of DMAP_TO_PHYS.
Consistently use vm_paddr_t for the type returned from
moea64_bootstrap_alloc and avoid temporarily smuggling it via a
pointer.  Instead, be explicit in the places that assume a 1:1
mapping.
amd64/aarch64 pmap: Switch type of pmap_preinit_mapping.va to void *
Some checks failed
Cross-build Kernel / amd64 ubuntu-22.04 (clang-15) (pull_request) Has been cancelled
Cross-build Kernel / aarch64 ubuntu-22.04 (clang-15) (pull_request) Has been cancelled
Cross-build Kernel / amd64 ubuntu-24.04 (clang-18) (pull_request) Has been cancelled
Cross-build Kernel / aarch64 ubuntu-24.04 (clang-18) (pull_request) Has been cancelled
Cross-build Kernel / amd64 macos-latest (clang-18) (pull_request) Has been cancelled
Cross-build Kernel / aarch64 macos-latest (clang-18) (pull_request) Has been cancelled
Style Checker / Style Checker (pull_request) Has been cancelled
1c094efb69
Merge branch 'main' into reviews/jhb/pmap_pointer
a820cf1af5
This pull request can be merged automatically.
This branch is out-of-date with the base branch

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin reviews/jhb/pmap_pointer:reviews/jhb/pmap_pointer
git switch reviews/jhb/pmap_pointer
Reference
FreeBSD/FreeBSD-src!2