| Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
|
|
|
|
Wire up the user interrupt receiver and sender related syscalls for
x86_64.
For rest of the architectures the syscalls are not implemented.
<TODO: Reserve the syscall numbers for other architectures>
Signed-off-by: Sohil Mehta <[email protected]>
|
|
Add a new system call to allow applications to block in the kernel and
wait for user interrupts.
<The current implementation doesn't support waking up from other
blocking system calls like sleep(), read(), epoll(), etc.
uintr_wait() is a placeholder syscall while we decide on that
behaviour.>
When the application makes this syscall the notification vector is
switched to a new kernel vector. Any new SENDUIPI will invoke the kernel
interrupt which is then used to wake up the process.
Currently, the task wait list is global one. To make the implementation
scalable there is a need to move to a distributed per-cpu wait list.
Signed-off-by: Sohil Mehta <[email protected]>
|
|
Add a registration syscall for a task to register itself as a user
interrupt sender using the uintr_fd generated by the receiver. A task
can register multiple uintr_fds. Each unique successful connection
creates a new entry in the User Interrupt Target Table (UITT).
Each entry in the UITT table is referred by the UITT index (uipi_index).
The uipi_index returned during the registration syscall lets a sender
generate a user IPI using the 'SENDUIPI <uipi_index>' instruction.
Also, add a sender unregister syscall to unregister a particular task
from the uintr_fd. Calling close on the uintr_fd will disconnect all
threads in a sender process from that FD.
Currently, the UITT size is arbitrarily chosen as 256 entries
corresponding to a 4KB page. Based on feedback and usage data this can
either be increased/decreased or made dynamic later.
Architecturally, the UITT table can be unique for each thread or shared
across threads of the same thread group. The current implementation
keeps the UITT as unique for the each thread. This makes the kernel
implementation relatively simple and only threads that use uintr get
setup with the related structures. However, this means that the
uipi_index for each thread would be inconsistent wrt to other threads.
(Executing 'SENDUIPI 2' on threads of the same process could generate
different user interrupts.)
Alternatively, the benefit of sharing the UITT table is that all threads
would see the same view of the UITT table. Also the kernel UITT memory
allocation would be more efficient if multiple threads connect to the
same uintr_fd. However, this would mean the kernel needs to keep the
UITT table size MISC_MSR[] in sync across these threads. Also the
UPID/UITT teardown flows might need additional consideration.
Signed-off-by: Sohil Mehta <[email protected]>
|
|
Each receiving task has its own interrupt vector space of 64 vectors.
For each vector registered by a task create a uintr_fd. Only tasks that
have previously registered a user interrupt handler can register a
vector.
The sender for the user interrupt could be another userspace
application, kernel or an external source (like a device). Any sender
that wants to generate a user interrupt needs access to receiver's
vector number and UPID. uintr_fd abstracts that information and allows
a sender with access to uintr_fd to connect and generate a user
interrupt. Upon interrupt delivery, the interrupt handler would be
invoked with the associated vector number pushed onto the stack.
Using an FD abstraction automatically provides a secure mechanism to
connect with a receiver. It also makes the tracking and management of
the interrupt vector resource easier for userspace.
uintr_fd can be useful in some of the usages where eventfd is used for
userspace event notifications. Though uintr_fd is nowhere close to a
drop-in replacement, the semantics are meant to be somewhat similar to
an eventfd or the write end of a pipe.
Access to uintr_fd can be achieved in the following ways:
- direct access if the task is part of the same thread group (process)
- inherited by a child process.
- explicitly shared using any of the FD sharing mechanisms.
If the sender is another userspace task, it can use the uintr_fd to send
user IPIs to the receiver. This works in conjunction with the SENDUIPI
instruction. The details related to this are covered later.
The exact APIs for the sender being a kernel or another external source
are still being worked upon. The general idea is that the receiver would
pass the uintr_fd to the kernel by extending some existing API (like
io_uring).
The vector associated with uintr_fd can be unregistered by closing all
references to the uintr_fd.
Signed-off-by: Sohil Mehta <[email protected]>
|
|
The user interrupt MSRs and the user interrupt state is task specific.
During task fork and exit clear the task state, clear the MSRs and
dereference the shared resources.
Some of the memory resources like the UPID are referenced in the file
descriptor and could be in use while the uintr_fd is still valid.
Instead of freeing up the UPID just dereference it. Eventually when
every user releases the reference the memory resource will be freed up.
Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Sohil Mehta <[email protected]>
|
|
User interrupt state is saved and restored using xstate supervisor
feature support. This includes the MSR state and the User Interrupt Flag
(UIF) value.
During context switch update the UPID for a uintr task to reflect the
current state of the task; namely whether the task should receive
interrupt notifications and which cpu the task is currently running on.
XSAVES clears the notification vector (UINV) in the MISC MSR to prevent
interrupts from being recognized in the UIRR MSR while the task is being
context switched. The UINV is restored back when the kernel does an
XRSTORS.
However, this conflicts with the kernel's lazy restore optimization
which skips an XRSTORS if the kernel is scheduling the same user task
back and the underlying MSR state hasn't been modified. Special handling
is needed for a uintr task in the context switch path to keep using this
optimization.
Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Sohil Mehta <[email protected]>
|
|
Any application that wants to receive a user interrupt needs to register
an interrupt handler with the kernel. Add a registration syscall that
sets up the interrupt handler and the related kernel structures for
the task that makes this syscall.
Only one interrupt handler per task can be registered with the
kernel/hardware. Each task has its private interrupt vector space of 64
vectors. The vector registration and the related FD management is
covered later.
Also add an unregister syscall to let a task unregister the interrupt
handler.
The UPID for each receiver task needs to be updated whenever a task gets
context switched or it moves from one cpu to another. This will also be
covered later. The system calls haven't been wired up yet so no real
harm is done if we don't update the UPID right now.
<Code typically in the x86/kernel directory doesn't deal with file
descriptor management. I have kept uintr_fd.c separate to make it easier
to move it somewhere else if needed.>
Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Sohil Mehta <[email protected]>
|
|
A user interrupt notification vector is used on the receiver's cpu to
identify an interrupt as a user interrupt (and not a kernel interrupt).
Hardware uses the same notification vector to generate an IPI from a
sender's cpu core when the SENDUIPI instruction is executed.
Typically, the kernel shouldn't receive an interrupt with this vector.
However, it is possible that the kernel might receive this vector.
Scenario that can cause the spurious interrupt:
Step cpu 0 (receiver task) cpu 1 (sender task)
---- --------------------- -------------------
1 task is running
2 executes SENDUIPI
3 IPI sent
4 context switched out
5 IPI delivered
(kernel interrupt detected)
A kernel interrupt can be detected, if a receiver task gets scheduled
out after the SENDUIPI-based IPI was sent but before the IPI was
delivered.
The kernel doesn't need to do anything in this case other than receiving
the interrupt and clearing the local APIC. The user interrupt is always
stored in the receiver's UPID before the IPI is generated. When the
receiver gets scheduled back the interrupt would be delivered based on
its UPID.
Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Sohil Mehta <[email protected]>
|
|
Enable xstate supervisor support for User Interrupts by default.
The user interrupt state for a task consists of the MSR state and the
User Interrupt Flag (UIF) value. XSAVES and XRSTORS handle saving and
restoring both of these states.
<The supervisor XSTATE code might be reworked based on issues reported
in the past. The Uintr context switching code would also need rework and
additional testing in that regard.>
Signed-off-by: Sohil Mehta <[email protected]>
|
|
User Interrupts support including user IPIs is enumerated through cpuid.
The 'uintr' flag in /proc/cpuinfo can be used to identify it. The
recommended mechanism for user applications to detect support is calling
the uintr related syscalls.
Use CONFIG_X86_USER_INTERRUPTS to compile with User Interrupts support.
The feature can be disabled at boot time using the 'nouintr' kernel
parameter.
SENDUIPI is a special ring-3 instruction that makes a supervisor mode
memory access to the UPID and UITT memory. Currently, KPTI needs to be
off for User IPIs to work. Processors that support user interrupts are
not affected by Meltdown so the auto mode of KPTI will default to off.
Users who want to force enable KPTI will need to wait for a later
version of this patch series that is compatible with KPTI. We need to
allocate the UPID and UITT structures from a special memory region that
has supervisor access but it is mapped into userspace. The plan is to
implement a mechanism similar to LDT.
Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Sohil Mehta <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug updates from Thomas Gleixner:
"Updates for the SMP and CPU hotplug:
- Remove DEFINE_SMP_CALL_CACHE_FUNCTION() which is a left over of the
original hotplug code and now causing trouble with the ARM64 cache
topology setup due to the pointless SMP function call.
It's not longer required as the hotplug callbacks are guaranteed to
be invoked on the upcoming CPU.
- Remove the deprecated and now unused CPU hotplug functions
- Rewrite the CPU hotplug API documentation"
* tag 'smp-urgent-2021-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation: core-api/cpuhotplug: Rewrite the API section
cpu/hotplug: Remove deprecated CPU-hotplug functions.
thermal: Replace deprecated CPU-hotplug functions.
drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- A pair of defconfig additions, for NVMe and the EFI filesystem
localization options.
- A larger address space for stack randomization.
- A cleanup to our install rules.
- A DTS update for the Microchip Icicle board, to fix the serial
console.
- Support for build-time table sorting, which allows us to have
__ex_table read-only.
* tag 'riscv-for-linus-5.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Move EXCEPTION_TABLE to RO_DATA segment
riscv: Enable BUILDTIME_TABLE_SORT
riscv: dts: microchip: mpfs-icicle: Fix serial console
riscv: move the (z)install rules to arch/riscv/Makefile
riscv: Improve stack randomisation on RV64
riscv: defconfig: enable NLS_CODEPAGE_437, NLS_ISO8859_1
riscv: defconfig: enable BLK_DEV_NVME
|
|
_ex_table section is read-only, so move it to RO_DATA.
Signed-off-by: Jisheng Zhang <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Enable BUILDTIME_TABLE_SORT to sort the exception table at build time
rather than during boot.
Signed-off-by: Jisheng Zhang <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Currently, nothing is output on the serial console, unless
"console=ttyS0,115200n8" or "earlycon" are appended to the kernel
command line. Enable automatic console selection using
chosen/stdout-path by adding a proper alias, and configure the expected
serial rate.
While at it, add aliases for the other three serial ports, which are
provided on the same micro-USB connector as the first one.
Fixes: 0fa6107eca4186ad ("RISC-V: Initial DTS for Microchip ICICLE board")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Bin Meng <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Currently, the (z)install targets in arch/riscv/Makefile descend into
arch/riscv/boot/Makefile to invoke the shell script, but there is no
good reason to do so.
arch/riscv/Makefile can run the shell script directly.
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
This enlarges the bits availiable for stack randomisation on RV64 from
the default of 8MiB to 1GiB, to match arm64 and x86.
Also, update the documentation to reflect our support for stack
randomisation.
Signed-off-by: Kefeng Wang <[email protected]>
[Palmer: commit text]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The EFI system partition uses the FAT file system. Many distributions add
an entry in /etc/fstab for the ESP. We must ensure that mounting does not
fail.
The default code page for FAT is 437 (cf. CONFIG_FAT_DEFAULT_CODEPAGE).
The default IO character set is "iso8859-1" (cf. CONFIG_NLS_ISO8859_1).
So let's enable NLS_CODEPAGE_437 and NLS_ISO8859_1 in defconfig.
Signed-off-by: Heinrich Schuchardt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
NVMe is a non-volatile storage media attached via PCIe.
As NVMe has much higher throughput than other block devices like
SATA it is a must have for RISC-V. Enable CONFIG_BLK_DEV_NVME.
The HiFive Unmatched is a board providing M.2 slots for NVMe drives.
Enable CONFIG_PCIE_FU740.
Signed-off-by: Heinrich Schuchardt <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Ensure that all usage sites of get/put_online_cpus() except for the
struggler in drivers/thermal are gone. So the last user and the deprecated
inlines can be removed.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Limit the linear region to 51-bit when KVM is running in nVHE mode.
Otherwise, depending on the placement of the ID map, kernel-VA to
hyp-VA translations may produce addresses that either conflict with
other HYP mappings or generate addresses outside of the 52-bit
addressable range.
- Instruct kmemleak not to scan the memory reserved for kdump as this
range is removed from the kernel linear map and therefore not
accessible.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kdump: Skip kmemleak scan reserved memory for kdump
arm64: mm: limit linear region to 51 bits for KVM in nVHE mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
- Build warning fixes in Makefile and Dino PCI driver
- Fix when sched_clock is marked unstable
- Drop strnlen_user() in favour of generic version
- Prevent kernel to write outside userspace signal stack
- Remove CONFIG_SET_FS including KERNEL_DS and USER_DS from parisc and
switch to __get/put_kernel_nofault()
* tag 'for-5.15/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Implement __get/put_kernel_nofault()
parisc: Mark sched_clock unstable only if clocks are not syncronized
parisc: Move pci_dev_is_behind_card_dino to where it is used
parisc: Reduce sigreturn trampoline to 3 instructions
parisc: Check user signal stack trampoline is inside TASK_SIZE
parisc: Drop useless debug info and comments from signal.c
parisc: Drop strnlen_user() in favour of generic version
parisc: Add missing FORCE prerequisite in Makefile
|
|
Trying to boot with kdump + kmemleak, command will result in a crash:
"echo scan > /sys/kernel/debug/kmemleak"
crashkernel reserved: 0x0000000007c00000 - 0x0000000027c00000 (512 MB)
Kernel command line: BOOT_IMAGE=(hd1,gpt2)/vmlinuz-5.14.0-rc5-next-20210809+ root=/dev/mapper/ao-root ro rd.lvm.lv=ao/root rd.lvm.lv=ao/swap crashkernel=512M
Unable to handle kernel paging request at virtual address ffff000007c00000
Mem abort info:
ESR = 0x96000007
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x07: level 3 translation fault
Data abort info:
ISV = 0, ISS = 0x00000007
CM = 0, WnR = 0
swapper pgtable: 64k pages, 48-bit VAs, pgdp=00002024f0d80000
[ffff000007c00000] pgd=1800205ffffd0003, p4d=1800205ffffd0003, pud=1800205ffffd0003, pmd=1800205ffffc0003, pte=0068000007c00f06
Internal error: Oops: 96000007 [#1] SMP
pstate: 804000c9 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : scan_block+0x98/0x230
lr : scan_block+0x94/0x230
sp : ffff80008d6cfb70
x29: ffff80008d6cfb70 x28: 0000000000000000 x27: 0000000000000000
x26: 00000000000000c0 x25: 0000000000000001 x24: 0000000000000000
x23: ffffa88a6b18b398 x22: ffff000007c00ff9 x21: ffffa88a6ac7fc40
x20: ffffa88a6af6a830 x19: ffff000007c00000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff
x14: ffffffff00000000 x13: ffffffffffffffff x12: 0000000000000020
x11: 0000000000000000 x10: 0000000001080000 x9 : ffffa88a6951c77c
x8 : ffffa88a6a893988 x7 : ffff203ff6cfb3c0 x6 : ffffa88a6a52b3c0
x5 : ffff203ff6cfb3c0 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000001 x1 : ffff20226cb56a40 x0 : 0000000000000000
Call trace:
scan_block+0x98/0x230
scan_gray_list+0x120/0x270
kmemleak_scan+0x3a0/0x648
kmemleak_write+0x3ac/0x4c8
full_proxy_write+0x6c/0xa0
vfs_write+0xc8/0x2b8
ksys_write+0x70/0xf8
__arm64_sys_write+0x24/0x30
invoke_syscall+0x4c/0x110
el0_svc_common+0x9c/0x190
do_el0_svc+0x30/0x98
el0_svc+0x28/0xd8
el0t_64_sync_handler+0x90/0xb8
el0t_64_sync+0x180/0x184
The reserved memory for kdump will be looked up by kmemleak, this area
will be set invalid when kdump service is bring up. That will result in
crash when kmemleak scan this area.
Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
Signed-off-by: Chen Wandun <[email protected]>
Reviewed-by: Kefeng Wang <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
|
|
Remove CONFIG_SET_FS from parisc, so we need to add
__get_kernel_nofault() and __put_kernel_nofault(), define
HAVE_GET_KERNEL_NOFAULT and remove set_fs(), get_fs(), load_sr2(),
thread_info->addr_limit, KERNEL_DS and USER_DS.
The nice side-effect of this patch is that we now can directly access
userspace via sr3 without the need to use a temporary sr2 which is
either copied from sr3 or set to zero (for kernel space).
Signed-off-by: Helge Deller <[email protected]>
Suggested-by: Arnd Bergmann <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Support for VMAP_STACK
- Support for splice_write in hostfs
- Fixes for virt-pci
- Fixes for virtio_uml
- Various fixes
* tag 'for-linus-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: fix stub location calculation
um: virt-pci: fix uapi documentation
um: enable VMAP_STACK
um: virt-pci: don't do DMA from stack
hostfs: support splice_write
um: virtio_uml: fix memory leak on init failures
um: virtio_uml: include linux/virtio-uml.h
lib/logic_iomem: fix sparse warnings
um: make PCI emulation driver init/exit static
|
|
Pull ARM development updates from Russell King:
- Rename "mod_init" and "mod_exit" so that initcall debug output is
actually useful (Randy Dunlap)
- Update maintainers entries for linux-arm-kernel to indicate it is
moderated for non-subscribers (Randy Dunlap)
- Move install rules to arch/arm/Makefile (Masahiro Yamada)
- Drop unnecessary ARCH_NR_GPIOS definition (Linus Walleij)
- Don't warn about atags_to_fdt() stack size (David Heidelberg)
- Speed up unaligned copy_{from,to}_kernel_nofault (Arnd Bergmann)
- Get rid of set_fs() usage (Arnd Bergmann)
- Remove checks for GCC prior to v4.6 (Geert Uytterhoeven)
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9118/1: div64: Remove always-true __div64_const32_is_OK() duplicate
ARM: 9117/1: asm-generic: div64: Remove always-true __div64_const32_is_OK()
ARM: 9116/1: unified: Remove check for gcc < 4
ARM: 9110/1: oabi-compat: fix oabi epoll sparse warning
ARM: 9113/1: uaccess: remove set_fs() implementation
ARM: 9112/1: uaccess: add __{get,put}_kernel_nofault
ARM: 9111/1: oabi-compat: rework fcntl64() emulation
ARM: 9114/1: oabi-compat: rework sys_semtimedop emulation
ARM: 9108/1: oabi-compat: rework epoll_wait/epoll_pwait emulation
ARM: 9107/1: syscall: always store thread_info->abi_syscall
ARM: 9109/1: oabi-compat: add epoll_pwait handler
ARM: 9106/1: traps: use get_kernel_nofault instead of set_fs()
ARM: 9115/1: mm/maccess: fix unaligned copy_{from,to}_kernel_nofault
ARM: 9105/1: atags_to_fdt: don't warn about stack size
ARM: 9103/1: Drop ARCH_NR_GPIOS definition
ARM: 9102/1: move theinstall rules to arch/arm/Makefile
ARM: 9100/1: MAINTAINERS: mark all linux-arm-kernel@infradead list as moderated
ARM: 9099/1: crypto: rename 'mod_init' & 'mod_exit' functions to be module-specific
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
"Except for the xpram device driver removal it is all about fixes and
cleanups.
- Fix topology update on cpu hotplug, so notifiers see expected
masks. This bug was uncovered with SCHED_CORE support.
- Fix stack unwinding so that the correct number of entries are
omitted like expected by common code. This fixes KCSAN selftests.
- Add kmemleak annotation to stack_alloc to avoid false positive
kmemleak warnings.
- Avoid layering violation in common I/O code and don't unregister
subchannel from child-drivers.
- Remove xpram device driver for which no real use case exists since
the kernel is 64 bit only. Also all hypervisors got required
support removed in the meantime, which means the xpram device
driver is dead code.
- Fix -ENODEV handling of clp_get_state in our PCI code.
- Enable KFENCE in debug defconfig.
- Cleanup hugetlbfs s390 specific Kconfig dependency.
- Quite a lot of trivial fixes to get rid of "W=1" warnings, and and
other simple cleanups"
* tag 's390-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
hugetlbfs: s390 is always 64bit
s390/ftrace: remove incorrect __va usage
s390/zcrypt: remove incorrect kernel doc indicators
scsi: zfcp: fix kernel doc comments
s390/sclp: add __nonstring annotation
s390/hmcdrv_ftp: fix kernel doc comment
s390: remove xpram device driver
s390/pci: read clp_list_pci_req only once
s390/pci: fix clp_get_state() handling of -ENODEV
s390/cio: fix kernel doc comment
s390/ctrlchar: fix kernel doc comment
s390/con3270: use proper type for tasklet function
s390/cpum_cf: move array from header to C file
s390/mm: fix kernel doc comments
s390/topology: fix topology information when calling cpu hotplug notifiers
s390/unwind: use current_frame_address() to unwind current task
s390/configs: enable CONFIG_KFENCE in debug_defconfig
s390/entry: make oklabel within CHKSTG macro local
s390: add kmemleak annotation in stack_alloc()
s390/cio: dont unregister subchannel from child-drivers
|
|
KVM in nVHE mode divides up its VA space into two equal halves, and
picks the half that does not conflict with the HYP ID map to map its
linear region. This worked fine when the kernel's linear map itself was
guaranteed to cover precisely as many bits of VA space, but this was
changed by commit f4693c2716b35d08 ("arm64: mm: extend linear region for
52-bit VA configurations").
The result is that, depending on the placement of the ID map, kernel-VA
to hyp-VA translations may produce addresses that either conflict with
other HYP mappings (including the ID map itself) or generate addresses
outside of the 52-bit addressable range, neither of which is likely to
lead to anything useful.
Given that 52-bit capable cores are guaranteed to implement VHE, this
only affects configurations such as pKVM where we opt into non-VHE mode
even if the hardware is VHE capable. So just for these configurations,
let's limit the kernel linear map to 51 bits and work around the
problem.
Fixes: f4693c2716b3 ("arm64: mm: extend linear region for 52-bit VA configurations")
Signed-off-by: Ard Biesheuvel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
|
|
We check at runtime if the cr16 clocks are stable across CPUs. Only mark
the sched_clock unstable by calling clear_sched_clock_stable() if we
know that we run on a system which isn't syncronized across CPUs.
Signed-off-by: Helge Deller <[email protected]>
|
|
We can move the INSN_LDI_R20 instruction into the branch delay slot.
Signed-off-by: Helge Deller <[email protected]>
|
|
Add some additional checks to ensure the signal stack is inside
userspace bounds.
Signed-off-by: Helge Deller <[email protected]>
|
|
Signed-off-by: Helge Deller <[email protected]>
|
|
As suggested by Arnd Bergmann, drop the parisc version of
strnlen_user() and switch to the generic version.
Suggested-by: Arnd Bergmann <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
Signed-off-by: Helge Deller <[email protected]>
|
|
Merge yet more updates and hotfixes from Andrew Morton:
"Post-linux-next material, based upon latest upstream to catch the
now-merged dependencies:
- 10 patches.
Subsystems affected by this patch series: mm (vmstat and migration)
and compat.
And bunch of hotfixes, mostly cc:stable:
- 8 patches.
Subsystems affected by this patch series: mm (hmm, hugetlb, vmscan,
pagealloc, pagemap, kmemleak, mempolicy, and memblock)"
* emailed patches from Andrew Morton <[email protected]>:
arch: remove compat_alloc_user_space
compat: remove some compat entry points
mm: simplify compat numa syscalls
mm: simplify compat_sys_move_pages
kexec: avoid compat_alloc_user_space
kexec: move locking into do_kexec_load
mm: migrate: change to use bool type for 'page_was_mapped'
mm: migrate: fix the incorrect function name in comments
mm: migrate: introduce a local variable to get the number of pages
mm/vmstat: protect per cpu variables with preempt disable on RT
* emailed hotfixes from Andrew Morton <[email protected]>:
nds32/setup: remove unused memblock_region variable in setup_memory()
mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task
mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp
mmap_lock: change trace and locking order
mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype
mm,vmscan: fix divide by zero in get_scan_count
mm/hugetlb: initialize hugetlb_usage in mm_init
mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled
|
|
kernel test robot reports unused variable warning:
arch/nds32/kernel/setup.c:247:26: warning: Unused variable: region
[unusedVariable]
struct memblock_region *region;
^
Remove the unused variable.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These are mostly ARM cpufreq driver updates, including one new
MediaTek driver that has just passed all of the reviews, with the
addition of a revert of a recent intel_pstate commit, some core
cpufreq changes and a DT-related update of the operating performance
points (OPP) support code.
Specifics:
- Add new cpufreq driver for the MediaTek MT6779 platform called
mediatek-hw along with corresponding DT bindings (Hector.Yuan).
- Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara
Gopinath).
- Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu
policy flag (Taniya Das).
- Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn
Andersson).
- Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV
flag (Viresh Kumar).
- Add new cpufreq driver callback to allow drivers to register with
the Energy Model in a consistent way and make several drivers use
it (Viresh Kumar).
- Change the remaining users of the .ready() cpufreq driver callback
to move the code from it elsewhere and drop it from the cpufreq
core (Viresh Kumar).
- Revert recent intel_pstate change adding HWP guaranteed performance
change notification support to it that led to problems, because the
notification in question is triggered prematurely on some systems
(Rafael Wysocki).
- Convert the OPP DT bindings to DT schema and clean them up while at
it (Rob Herring)"
* tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"
cpufreq: mediatek-hw: Add support for CPUFREQ HW
cpufreq: Add of_perf_domain_get_sharing_cpumask
dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
cpufreq: Remove ready() callback
cpufreq: sh: Remove sh_cpufreq_cpu_ready()
cpufreq: acpi: Remove acpi_cpufreq_cpu_ready()
cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag
cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support
cpufreq: scmi: Use .register_em() to register with energy model
cpufreq: vexpress: Use .register_em() to register with energy model
cpufreq: scpi: Use .register_em() to register with energy model
dt-bindings: opp: Convert to DT schema
dt-bindings: Clean-up OPP binding node names in examples
ARM: dts: omap: Drop references to opp.txt
cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model
cpufreq: omap: Use .register_em() to register with energy model
cpufreq: mediatek: Use .register_em() to register with energy model
cpufreq: imx6q: Use .register_em() to register with energy model
...
|
|
Pull microblaze update from Michal Simek:
- Kbuild clean up
* tag 'microblaze-v5.15' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: move core-y in arch/microblaze/Makefile to arch/microblaze/Kbuild
|
|
All users of compat_alloc_user_space() and copy_in_user() have been
removed from the kernel, only a few functions in sparc remain that can be
changed to calling arch_copy_in_user() instead.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
These are all handled correctly when calling the native system call entry
point, so remove the special cases.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Merge more updates from Andrew Morton:
"147 patches, based on 7d2a07b769330c34b4deabeed939325c77a7ec2f.
Subsystems affected by this patch series: mm (memory-hotplug, rmap,
ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
selftests, ipc, and scripts"
* emailed patches from Andrew Morton <[email protected]>: (94 commits)
scripts: check_extable: fix typo in user error message
mm/workingset: correct kernel-doc notations
ipc: replace costly bailout check in sysvipc_find_ipc()
selftests/memfd: remove unused variable
Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
configs: remove the obsolete CONFIG_INPUT_POLLDEV
prctl: allow to setup brk for et_dyn executables
pid: cleanup the stale comment mentioning pidmap_init().
kernel/fork.c: unexport get_{mm,task}_exe_file
coredump: fix memleak in dump_vma_snapshot()
fs/coredump.c: log if a core dump is aborted due to changed file permissions
nilfs2: use refcount_dec_and_lock() to fix potential UAF
nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
nilfs2: fix NULL pointer in nilfs_##name##_attr_release
nilfs2: fix memory leak in nilfs_sysfs_create_device_group
trap: cleanup trap_init()
init: move usermodehelper_enable() to populate_rootfs()
...
|
|
This CONFIG option was removed in commit 278b13ce3a89 ("Input: remove
input_polled_dev implementation") so there's no point to keep it in
defconfigs any longer.
Get rid of the leftover for all arches.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zenghui Yu <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are some empty trap_init() definitions in different ARCHs, Introduce
a new weak trap_init() function to clean them up.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: Russell King (Oracle) <[email protected]> [arm32]
Acked-by: Vineet Gupta [arc]
Acked-by: Michael Ellerman <[email protected]> [powerpc]
Cc: Yoshinori Sato <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Anton Ivanov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Threre is a spelling mistake in the Kconfig text. Fix it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix all kernel-doc warnings in arch/alpha/kernel/pci-sysfs.c:
arch/alpha/kernel/pci-sysfs.c:67: warning: No description found for return value of 'pci_mmap_resource'
arch/alpha/kernel/pci-sysfs.c:115: warning: Function parameter or member 'pdev' not described in 'pci_remove_resource_files'
arch/alpha/kernel/pci-sysfs.c:115: warning: Excess function parameter 'dev' description in 'pci_remove_resource_files'
arch/alpha/kernel/pci-sysfs.c:230: warning: Function parameter or member 'pdev' not described in 'pci_create_resource_files'
arch/alpha/kernel/pci-sysfs.c:230: warning: Excess function parameter 'dev' description in 'pci_create_resource_files'
arch/alpha/kernel/pci-sysfs.c:232: warning: No description found for return value of 'pci_create_resource_files'
arch/alpha/kernel/pci-sysfs.c:305: warning: Function parameter or member 'bus' not described in 'pci_adjust_legacy_attr'
arch/alpha/kernel/pci-sysfs.c:305: warning: Excess function parameter 'b' description in 'pci_adjust_legacy_attr'
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|