path: root/arch/x86/kernel
Age  Commit message  Author
2023-07-26  uintr_event_write: comment out debug prints  [HEAD, epoll_waitlist]  Your Name
2023-07-24  uintr_event_write: clear SN and ON on write (in principle, only ON needs handling)  Your Name
- Re-enable event_wait, modeled on uintr_wait_list
2023-07-24  Comment out uintr_event_rst  Your Name
2023-07-24  Add some debug prints  [epoll_test]  Your Name
2023-07-21  Fix out-of-bounds access  Your Name
2023-07-21  uintr_event  Your Name
2023-07-19  eventfd barely passes  [TEMP3]  Your Name
2023-07-18  add system call uintr_event(int fd)  [temp2]  Your Name
2023-07-17  add UINTR_EVENT_VECTOR  [temp1]  Your Name
2023-07-14  sys_uintr_wait: add timeout support; 2575us is suggested under QEMU  [uintr_wait_fix]  Your Name
2022-07-11  turn off debug, modify makefile for test, ready for iouring  [tsing-kernel]  项小羽
2022-05-28  modify xstate.c, modify some examples  项小羽
2022-04-18  can now catch all instructions, and MSRs work  xxy
2021-09-12  x86/uintr: Introduce uintr_wait() syscall  Sohil Mehta
Add a new system call to allow applications to block in the kernel and wait for user interrupts. <The current implementation doesn't support waking up from other blocking system calls like sleep(), read(), epoll(), etc. uintr_wait() is a placeholder syscall while we decide on that behaviour.> When the application makes this syscall the notification vector is switched to a new kernel vector. Any new SENDUIPI will invoke the kernel interrupt which is then used to wake up the process. Currently, the task wait list is a global one. To make the implementation scalable there is a need to move to a distributed per-cpu wait list. Signed-off-by: Sohil Mehta <[email protected]>
2021-09-12  x86/uintr: Introduce user IPI sender syscalls  Sohil Mehta
Add a registration syscall for a task to register itself as a user interrupt sender using the uintr_fd generated by the receiver. A task can register multiple uintr_fds. Each unique successful connection creates a new entry in the User Interrupt Target Table (UITT). Each entry in the UITT table is referred to by the UITT index (uipi_index). The uipi_index returned during the registration syscall lets a sender generate a user IPI using the 'SENDUIPI <uipi_index>' instruction. Also, add a sender unregister syscall to unregister a particular task from the uintr_fd. Calling close on the uintr_fd will disconnect all threads in a sender process from that FD. Currently, the UITT size is arbitrarily chosen as 256 entries corresponding to a 4KB page. Based on feedback and usage data this can either be increased/decreased or made dynamic later. Architecturally, the UITT table can be unique for each thread or shared across threads of the same thread group. The current implementation keeps the UITT unique for each thread. This makes the kernel implementation relatively simple and only threads that use uintr get set up with the related structures. However, this means that the uipi_index for each thread would be inconsistent with respect to other threads. (Executing 'SENDUIPI 2' on threads of the same process could generate different user interrupts.) Alternatively, the benefit of sharing the UITT table is that all threads would see the same view of the UITT table. Also the kernel UITT memory allocation would be more efficient if multiple threads connect to the same uintr_fd. However, this would mean the kernel needs to keep the UITT table size MISC_MSR[] in sync across these threads. Also the UPID/UITT teardown flows might need additional consideration. Signed-off-by: Sohil Mehta <[email protected]>
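The sizing mentioned above (256 entries in one 4KB page) implies 16 bytes per UITT entry. The sketch below only illustrates that arithmetic and the index-to-offset mapping. The struct name, its field names, and the helper uitt_entry_offset() are hypothetical and are not taken from the patch; only the 256-entry and 4KB figures come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define UITT_PAGE_SIZE   4096u /* one 4KB page, per the commit message */
#define UITT_MAX_ENTRIES 256u  /* table size chosen in the commit */

/*
 * Hypothetical entry layout (field names are assumptions). The only
 * property that follows from the commit text is the size: 4096 / 256 = 16.
 */
struct uitt_entry_sketch {
    uint8_t  valid;
    uint8_t  user_vec;
    uint8_t  reserved[6];
    uint64_t target_upid_addr;
};

/* Compile-time check that 256 such entries fill exactly one page. */
static_assert(sizeof(struct uitt_entry_sketch) * UITT_MAX_ENTRIES ==
              UITT_PAGE_SIZE, "256 entries must fill one 4KB page");

/*
 * Byte offset of the entry that a given uipi_index selects,
 * or -1 if the index is outside the table.
 */
long uitt_entry_offset(unsigned int uipi_index)
{
    if (uipi_index >= UITT_MAX_ENTRIES)
        return -1;
    return (long)(uipi_index * sizeof(struct uitt_entry_sketch));
}
```

Under these assumptions, index 2 (the 'SENDUIPI 2' case above) selects the entry at byte offset 32 of whichever table the executing thread owns, which is why the same index can mean different targets on different threads when each thread has its own UITT.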
2021-09-12  x86/uintr: Introduce vector registration and uintr_fd syscall  Sohil Mehta
Each receiving task has its own interrupt vector space of 64 vectors. For each vector registered by a task create a uintr_fd. Only tasks that have previously registered a user interrupt handler can register a vector. The sender for the user interrupt could be another userspace application, kernel or an external source (like a device). Any sender that wants to generate a user interrupt needs access to receiver's vector number and UPID. uintr_fd abstracts that information and allows a sender with access to uintr_fd to connect and generate a user interrupt. Upon interrupt delivery, the interrupt handler would be invoked with the associated vector number pushed onto the stack. Using an FD abstraction automatically provides a secure mechanism to connect with a receiver. It also makes the tracking and management of the interrupt vector resource easier for userspace. uintr_fd can be useful in some of the usages where eventfd is used for userspace event notifications. Though uintr_fd is nowhere close to a drop-in replacement, the semantics are meant to be somewhat similar to an eventfd or the write end of a pipe. Access to uintr_fd can be achieved in the following ways: - direct access if the task is part of the same thread group (process) - inherited by a child process. - explicitly shared using any of the FD sharing mechanisms. If the sender is another userspace task, it can use the uintr_fd to send user IPIs to the receiver. This works in conjunction with the SENDUIPI instruction. The details related to this are covered later. The exact APIs for the sender being a kernel or another external source are still being worked upon. The general idea is that the receiver would pass the uintr_fd to the kernel by extending some existing API (like io_uring). The vector associated with uintr_fd can be unregistered by closing all references to the uintr_fd. Signed-off-by: Sohil Mehta <[email protected]>
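A per-task space of 64 vectors fits naturally in one 64-bit bitmap. The following is a minimal sketch of such bookkeeping, not the kernel's implementation: the type vector_space_sketch and both helpers are hypothetical names, and the policy of rejecting a vector that is already registered is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define UINTR_MAX_VECTORS 64u /* per-task vector space, per the commit */

/* Hypothetical per-task bookkeeping: bit N set means vector N is in use. */
struct vector_space_sketch {
    uint64_t in_use;
};

/*
 * Claim a vector. Returns 0 on success, -1 if the vector is out of
 * range or already registered (assumed policy).
 */
int vector_register(struct vector_space_sketch *vs, unsigned int vector)
{
    uint64_t bit;

    if (vector >= UINTR_MAX_VECTORS)
        return -1;
    bit = UINT64_C(1) << vector;
    if (vs->in_use & bit)
        return -1;
    vs->in_use |= bit;
    return 0;
}

/* Release a vector, as would happen once the last uintr_fd reference closes. */
void vector_unregister(struct vector_space_sketch *vs, unsigned int vector)
{
    if (vector < UINTR_MAX_VECTORS)
        vs->in_use &= ~(UINT64_C(1) << vector);
}
```

Tying the release to the file descriptor's lifetime mirrors the commit's point that the FD abstraction makes tracking the vector resource easier for userspace.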
2021-09-12  x86/process/64: Clean up uintr task fork and exit paths  Sohil Mehta
The user interrupt MSRs and the user interrupt state are task specific. During task fork and exit clear the task state, clear the MSRs and dereference the shared resources. Some of the memory resources like the UPID are referenced in the file descriptor and could be in use while the uintr_fd is still valid. Instead of freeing up the UPID just dereference it. Eventually when every user releases the reference the memory resource will be freed up. Signed-off-by: Jacob Pan <[email protected]> Signed-off-by: Sohil Mehta <[email protected]>
2021-09-12  x86/process/64: Add uintr task context switch support  Sohil Mehta
User interrupt state is saved and restored using xstate supervisor feature support. This includes the MSR state and the User Interrupt Flag (UIF) value. During context switch update the UPID for a uintr task to reflect the current state of the task; namely whether the task should receive interrupt notifications and which cpu the task is currently running on. XSAVES clears the notification vector (UINV) in the MISC MSR to prevent interrupts from being recognized in the UIRR MSR while the task is being context switched. The UINV is restored back when the kernel does an XRSTORS. However, this conflicts with the kernel's lazy restore optimization which skips an XRSTORS if the kernel is scheduling the same user task back and the underlying MSR state hasn't been modified. Special handling is needed for a uintr task in the context switch path to keep using this optimization. Signed-off-by: Jacob Pan <[email protected]> Signed-off-by: Sohil Mehta <[email protected]>
2021-09-12  x86/uintr: Introduce uintr receiver syscalls  Sohil Mehta
Any application that wants to receive a user interrupt needs to register an interrupt handler with the kernel. Add a registration syscall that sets up the interrupt handler and the related kernel structures for the task that makes this syscall. Only one interrupt handler per task can be registered with the kernel/hardware. Each task has its private interrupt vector space of 64 vectors. The vector registration and the related FD management is covered later. Also add an unregister syscall to let a task unregister the interrupt handler. The UPID for each receiver task needs to be updated whenever a task gets context switched or it moves from one cpu to another. This will also be covered later. The system calls haven't been wired up yet so no real harm is done if we don't update the UPID right now. <Code typically in the x86/kernel directory doesn't deal with file descriptor management. I have kept uintr_fd.c separate to make it easier to move it somewhere else if needed.> Signed-off-by: Jacob Pan <[email protected]> Signed-off-by: Sohil Mehta <[email protected]>
2021-09-12  x86/irq: Reserve a user IPI notification vector  Sohil Mehta
A user interrupt notification vector is used on the receiver's cpu to identify an interrupt as a user interrupt (and not a kernel interrupt). Hardware uses the same notification vector to generate an IPI from a sender's cpu core when the SENDUIPI instruction is executed. Typically, the kernel shouldn't receive an interrupt with this vector. However, it is possible that the kernel might receive this vector.

Scenario that can cause the spurious interrupt:

Step  cpu 0 (receiver task)        cpu 1 (sender task)
----  ---------------------        -------------------
1     task is running
2                                  executes SENDUIPI
3                                  IPI sent
4     context switched out
5     IPI delivered
      (kernel interrupt detected)

A kernel interrupt can be detected, if a receiver task gets scheduled out after the SENDUIPI-based IPI was sent but before the IPI was delivered. The kernel doesn't need to do anything in this case other than receiving the interrupt and clearing the local APIC. The user interrupt is always stored in the receiver's UPID before the IPI is generated. When the receiver gets scheduled back the interrupt would be delivered based on its UPID. Signed-off-by: Jacob Pan <[email protected]> Signed-off-by: Sohil Mehta <[email protected]>
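The five-step scenario can be replayed with a small userspace model. This is only a sketch of the ordering argument (the vector is posted in the UPID before the IPI, so nothing is lost when the kernel absorbs the IPI). It does not model real hardware or kernel code, and every type and function name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the UPID: just a bitmap of posted vectors. */
struct upid_sketch {
    uint64_t posted;
};

struct receiver_sketch {
    struct upid_sketch upid;
    bool running;             /* is the task on a cpu right now? */
    uint64_t delivered;       /* vectors the user handler has seen */
    unsigned int kernel_irqs; /* notification IPIs the kernel absorbed */
};

/* Hand everything posted in the UPID to the user handler. */
static void deliver_posted(struct receiver_sketch *r)
{
    r->delivered |= r->upid.posted;
    r->upid.posted = 0;
}

/* Steps 2-3: the vector is recorded in the UPID before the IPI is sent. */
void sender_post(struct receiver_sketch *r, unsigned int vector)
{
    assert(vector < 64);
    r->upid.posted |= UINT64_C(1) << vector;
}

/* Step 5: the notification IPI arrives on the receiver's cpu. */
void ipi_arrives(struct receiver_sketch *r)
{
    if (r->running)
        deliver_posted(r);
    else
        r->kernel_irqs++; /* kernel only acknowledges the local APIC */
}

/* Step 4: the receiver task is context switched out. */
void schedule_out(struct receiver_sketch *r)
{
    r->running = false;
}

/* Later: the receiver runs again and picks up what the UPID holds. */
void schedule_in(struct receiver_sketch *r)
{
    r->running = true;
    deliver_posted(r);
}
```

Replaying the table in this model (post, schedule out, IPI arrives, schedule in) leaves one absorbed kernel interrupt and the vector delivered exactly once after the task is scheduled back.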
2021-09-12  x86/fpu/xstate: Enumerate User Interrupts supervisor state  Sohil Mehta
Enable xstate supervisor support for User Interrupts by default. The user interrupt state for a task consists of the MSR state and the User Interrupt Flag (UIF) value. XSAVES and XRSTORS handle saving and restoring both of these states. <The supervisor XSTATE code might be reworked based on issues reported in the past. The Uintr context switching code would also need rework and additional testing in that regard.> Signed-off-by: Sohil Mehta <[email protected]>
2021-09-12  x86/cpu: Enumerate User Interrupts support  Sohil Mehta
User Interrupts support including user IPIs is enumerated through cpuid. The 'uintr' flag in /proc/cpuinfo can be used to identify it. The recommended mechanism for user applications to detect support is calling the uintr related syscalls. Use CONFIG_X86_USER_INTERRUPTS to compile with User Interrupts support. The feature can be disabled at boot time using the 'nouintr' kernel parameter. SENDUIPI is a special ring-3 instruction that makes a supervisor mode memory access to the UPID and UITT memory. Currently, KPTI needs to be off for User IPIs to work. Processors that support user interrupts are not affected by Meltdown so the auto mode of KPTI will default to off. Users who want to force enable KPTI will need to wait for a later version of this patch series that is compatible with KPTI. We need to allocate the UPID and UITT structures from a special memory region that has supervisor access but it is mapped into userspace. The plan is to implement a mechanism similar to LDT. Signed-off-by: Jacob Pan <[email protected]> Signed-off-by: Sohil Mehta <[email protected]>
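For the /proc/cpuinfo route mentioned above, a userspace check has to match 'uintr' as a whole word in a "flags" line. The helper below is a minimal sketch of that matching and works on a single line that the caller has already read. The function name is hypothetical, and the commit itself recommends probing the uintr syscalls rather than parsing cpuinfo.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `flag` appears as a whole, whitespace-delimited word in a
 * "flags : ..." line such as the ones found in /proc/cpuinfo.
 */
bool cpuinfo_line_has_flag(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p;

    /* Only "flags" lines are of interest. */
    if (len == 0 || strncmp(line, "flags", 5) != 0)
        return false;

    p = strchr(line, ':');
    if (p == NULL)
        return false;
    p++; /* step past ':' so that p[-1] below is always valid */

    while ((p = strstr(p, flag)) != NULL) {
        char prev = p[-1];
        char next = p[len];
        bool starts_word = (prev == ' ' || prev == '\t' || prev == ':');
        bool ends_word = (next == '\0' || next == ' ' ||
                          next == '\t' || next == '\n');

        if (starts_word && ends_word)
            return true;
        p += len; /* substring of a longer flag name: keep scanning */
    }
    return false;
}
```

A caller would typically read /proc/cpuinfo line by line with fgets() and pass each line together with "uintr". Whole-word matching matters because a plain substring search would also match longer flag names that merely contain the text.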
2021-09-11  Merge branch 'linus' into smp/urgent  Thomas Gleixner
Ensure that all usage sites of get/put_online_cpus() except for the straggler in drivers/thermal are gone. So the last user and the deprecated inlines can be removed.
2021-09-07  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  Linus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - Page ownership tracking between host EL1 and EL2 - Rely on userspace page tables to create large stage-2 mappings - Fix incompatibility between pKVM and kmemleak - Fix the PMU reset state, and improve the performance of the virtual PMU - Move over to the generic KVM entry code - Address PSCI reset issues w.r.t. save/restore - Preliminary rework for the upcoming pKVM fixed feature - A bunch of MM cleanups - a vGIC fix for timer spurious interrupts - Various cleanups s390: - enable interpretation of specification exceptions - fix a vcpu_idx vs vcpu_id mixup x86: - fast (lockless) page fault support for the new MMU - new MMU now the default - increased maximum allowed VCPU count - allow inhibit IRQs on KVM_RUN while debugging guests - let Hyper-V-enabled guests run with virtualized LAPIC as long as they do not enable the Hyper-V "AutoEOI" feature - fixes and optimizations for the toggling of AMD AVIC (virtualized LAPIC) - tuning for the case when two-dimensional paging (EPT/NPT) is disabled - bugfixes and cleanups, especially with respect to vCPU reset and choosing a paging mode based on CR0/CR4/EFER - support for 5-level page table on AMD processors Generic: - MMU notifier invalidation callbacks do not take mmu_lock unless necessary - improved caching of LRU kvm_memory_slot - support for histogram statistics - add statistics for halt polling and remote TLB flush requests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits) KVM: Drop unused kvm_dirty_gfn_invalid() KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted KVM: MMU: mark role_regs and role accessors as maybe unused KVM: MIPS: Remove a "set but not used" variable x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait KVM: stats: Add VM stat for remote tlb flush requests KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count() KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page KVM: 
x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()" KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710 kvm: x86: Increase MAX_VCPUS to 1024 kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host KVM: s390: index kvm->arch.idle_mask by vcpu_idx KVM: s390: Enable specification exception interpretation KVM: arm64: Trim guest debug exception handling KVM: SVM: Add 5-level page table support for SVM ...
2021-09-06  Merge tag 'kvmarm-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  Paolo Bonzini
KVM/arm64 updates for 5.15 - Page ownership tracking between host EL1 and EL2 - Rely on userspace page tables to create large stage-2 mappings - Fix incompatibility between pKVM and kmemleak - Fix the PMU reset state, and improve the performance of the virtual PMU - Move over to the generic KVM entry code - Address PSCI reset issues w.r.t. save/restore - Preliminary rework for the upcoming pKVM fixed feature - A bunch of MM cleanups - a vGIC fix for timer spurious interrupts - Various cleanups
2021-09-06  x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait  Lai Jiangshan
Commit f4e61f0c9add3 ("x86/kvm: Fix broken irq restoration in kvm_wait") replaced "local_irq_restore() when IRQ enabled" with "local_irq_enable() when IRQ enabled" to suppress a warning. There is no debugging warning for calling local_irq_enable() when IRQs are enabled, unlike for calling local_irq_restore() in the same situation. But calling local_irq_enable() when IRQs are already enabled is no less broken than calling local_irq_restore(), and we'd better avoid it. Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Lai Jiangshan <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-03  Merge branch 'akpm' (patches from Andrew)  Linus Torvalds
Merge misc updates from Andrew Morton: "173 patches. Subsystems affected by this series: ia64, ocfs2, block, and mm (debug, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock, oom-kill, migration, ksm, percpu, vmstat, and madvise)" * emailed patches from Andrew Morton <[email protected]>: (173 commits) mm/madvise: add MADV_WILLNEED to process_madvise() mm/vmstat: remove unneeded return value mm/vmstat: simplify the array size calculation mm/vmstat: correct some wrong comments mm/percpu,c: remove obsolete comments of pcpu_chunk_populated() selftests: vm: add COW time test for KSM pages selftests: vm: add KSM merging time test mm: KSM: fix data type selftests: vm: add KSM merging across nodes test selftests: vm: add KSM zero page merging test selftests: vm: add KSM unmerge test selftests: vm: add KSM merge test mm/migrate: correct kernel-doc notation mm: wire up syscall process_mrelease mm: introduce process_mrelease system call memblock: make memblock_find_in_range method private mm/mempolicy.c: use in_task() in mempolicy_slab_node() mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies mm/mempolicy: advertise new MPOL_PREFERRED_MANY mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY ...
2021-09-03  memblock: make memblock_find_in_range method private  Mike Rapoport
There are a lot of uses of memblock_find_in_range() along with memblock_reserve() from the times memblock allocation APIs did not exist. memblock_find_in_range() is the very core of memblock allocations, so any future changes to its internal behaviour would mandate updates of all the users outside memblock. Replace the calls to memblock_find_in_range() with equivalent calls to memblock_phys_alloc() and memblock_phys_alloc_range() and make memblock_find_in_range() a private method of memblock. This simplifies the callers, ensures that (unlikely) errors in memblock_reserve() are handled and improves maintainability of memblock_find_in_range(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> [arm64] Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> [ACPI] Acked-by: Russell King (Oracle) <[email protected]> Acked-by: Nick Kossifidis <[email protected]> [riscv] Tested-by: Guenter Roeck <[email protected]> Acked-by: Rob Herring <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03  memcg: enable accounting for ldt_struct objects  Vasily Averin
Each task can request its own LDT and force the kernel to allocate up to 64Kb memory per-mm. There are legitimate workloads with hundreds of processes and there can be hundreds of workloads running on large machines. The unaccounted memory can cause isolation issues between the workloads particularly on highly utilized machines. It makes sense to account for these objects to restrict the host's memory consumption from inside the memcg-limited container. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Vasily Averin <[email protected]> Acked-by: Borislav Petkov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "J. Bruce Fields" <[email protected]> Cc: Jeff Layton <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Yutian Yang <[email protected]> Cc: Zefan Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-02  Merge tag 'dma-mapping-5.15' of git://git.infradead.org/users/hch/dma-mapping  Linus Torvalds
Pull dma-mapping updates from Christoph Hellwig: - fix debugfs initialization order (Anthony Iliopoulos) - use memory_intersects() directly (Kefeng Wang) - allow to return specific errors from ->map_sg (Logan Gunthorpe, Martin Oliveira) - turn the dma_map_sg return value into an unsigned int (me) - provide a common global coherent pool implementation (me) * tag 'dma-mapping-5.15' of git://git.infradead.org/users/hch/dma-mapping: (31 commits) hexagon: use the generic global coherent pool dma-mapping: make the global coherent pool conditional dma-mapping: add a dma_init_global_coherent helper dma-mapping: simplify dma_init_coherent_memory dma-mapping: allow using the global coherent pool for !ARM ARM/nommu: use the generic dma-direct code for non-coherent devices dma-direct: add support for dma_coherent_default_memory dma-mapping: return an unsigned int from dma_map_sg{,_attrs} dma-mapping: disallow .map_sg operations from returning zero on error dma-mapping: return error code from dma_dummy_map_sg() x86/amd_gart: don't set failed sg dma_address to DMA_MAPPING_ERROR x86/amd_gart: return error code from gart_map_sg() xen: swiotlb: return error code from xen_swiotlb_map_sg() parisc: return error code from .map_sg() ops sparc/iommu: don't set failed sg dma_address to DMA_MAPPING_ERROR sparc/iommu: return error codes from .map_sg() ops s390/pci: don't set failed sg dma_address to DMA_MAPPING_ERROR s390/pci: return error code from s390_dma_map_sg() powerpc/iommu: don't set failed sg dma_address to DMA_MAPPING_ERROR powerpc/iommu: return error code from .map_sg() ops ...
2021-09-01  Merge tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux  Linus Torvalds
Pull printk updates from Petr Mladek: - Optionally, provide an index of possible printk messages via <debugfs>/printk/index/. It can be used when monitoring important kernel messages on a farm of various hosts. The monitor has to be updated when some messages have changed or are no longer available in a newly deployed kernel. - Add printk.console_no_auto_verbose boot parameter. It allows generating a crash dump even with slow consoles in a reasonable time frame. - Remove printk_safe buffers. The messages are always stored directly to the main logbuffer, even in NMI or recursive context. This also allows serializing syslog operations by a mutex instead of a spin lock. - Misc clean up and build fixes. * tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk/index: Fix -Wunused-function warning lib/nmi_backtrace: Serialize even messages about idle CPUs printk: Add printk.console_no_auto_verbose boot parameter printk: Remove console_silent() lib/test_scanf: Handle n_bits == 0 in random tests printk: syslog: close window between wait and read printk: convert @syslog_lock to mutex printk: remove NMI tracking printk: remove safe buffers printk: track/limit recursion lib/nmi_backtrace: explicitly serialize banner and regs printk: Move the printk() kerneldoc comment to its new home printk/index: Fix warning about missing prototypes MIPS/asm/printk: Fix build failure caused by printk printk: index: Add indexing support to dev_printk printk: Userspace format indexing support printk: Rework parse_prefix into printk_parse_prefix printk: Straighten out log_flags into printk_info_flags string_helpers: Escape double quotes in escape_special printk/console: Check consistent sequence number when handling race in console_unlock()
2021-09-01  Merge tag 'hyperv-next-signed-20210831' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux  Linus Torvalds
Pull hyperv updates from Wei Liu: - make Hyper-V code arch-agnostic (Michael Kelley) - fix sched_clock behaviour on Hyper-V (Ani Sinha) - fix a fault when Linux runs as the root partition on MSHV (Praveen Kumar) - fix VSS driver (Vitaly Kuznetsov) - cleanup (Sonia Sharma) * tag 'hyperv-next-signed-20210831' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hv_utils: Set the maximum packet size for VSS driver to the length of the receive buffer Drivers: hv: Enable Hyper-V code to be built on ARM64 arm64: efi: Export screen_info arm64: hyperv: Initialize hypervisor on boot arm64: hyperv: Add panic handler arm64: hyperv: Add Hyper-V hypercall and register access utilities x86/hyperv: fix root partition faults when writing to VP assist page MSR hv: hyperv.h: Remove unused inline functions drivers: hv: Decouple Hyper-V clock/timer code from VMbus drivers x86/hyperv: add comment describing TSC_INVARIANT_CONTROL MSR setting bit 0 Drivers: hv: Move Hyper-V misc functionality to arch-neutral code Drivers: hv: Add arch independent default functions for some Hyper-V handlers Drivers: hv: Make portions of Hyper-V init code be arch neutral x86/hyperv: fix for unwanted manipulation of sched_clock when TSC marked unstable asm-generic/hyperv: Add missing #include of nmi.h
2021-09-01  Merge branch 'siginfo-si_trapno-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace  Linus Torvalds
Pull siginfo si_trapno updates from Eric Biederman: "The full set of si_trapno changes was not appropriate as a fix for the newly added SIGTRAP TRAP_PERF, and so I postponed the rest of the related cleanups. This is the rest of the cleanups for si_trapno that reduces it from being a really weird arch special case that is expected to be always present (but isn't) on the architectures that support it to being yet another field in the _sigfault union of struct siginfo. The changes have been reviewed and marinated in linux-next. With the removal of this awkward special case new code (like SIGTRAP TRAP_PERF) that works across architectures should be easier to write and maintain" * 'siginfo-si_trapno-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Rename SIL_PERF_EVENT SIL_FAULT_PERF_EVENT for consistency signal: Verify the alignment and size of siginfo_t signal: Remove the generic __ARCH_SI_TRAPNO support signal/alpha: si_trapno is only used with SIGFPE and SIGTRAP TRAP_UNK signal/sparc: si_trapno is only used with SIGILL ILL_ILLTRP arm64: Add compile-time asserts for siginfo_t offsets arm: Add compile-time asserts for siginfo_t offsets sparc64: Add compile-time asserts for siginfo_t offsets
2021-09-01  Merge tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm  Linus Torvalds
Pull drm updates from Dave Airlie: "Highlights: - i915 has seen a lot of refactoring and uAPI cleanups due to a change in the upstream direction going forward This has all been audited with known userspace, but there may be some pitfalls that were missed. - i915 now uses common TTM to enable discrete memory on DG1/2 GPUs - i915 enables Jasper and Elkhart Lake by default and has preliminary XeHP/DG2 support - amdgpu adds support for Cyan Skillfish - lots of implicit fencing rules documented and fixed up in drivers - msm now uses the core scheduler - the irq midlayer has been removed for non-legacy drivers - the sysfb code now works on more than x86. Otherwise the usual smattering of stuff everywhere, panels, bridges, refactorings. Detailed summary: core: - extract i915 eDP backlight into core - DP aux bus support - drm_device.irq_enabled removed - port drivers to native irq interfaces - export gem shadow plane handling for vgem - print proper driver name in framebuffer registration - driver fixes for implicit fencing rules - ARM fixed rate compression modifier added - updated fb damage handling - rmfb ioctl logging/docs - drop drm_gem_object_put_locked - define DRM_FORMAT_MAX_PLANES - add gem fb vmap/vunmap helpers - add lockdep_assert(once) helpers - mark drm irq midlayer as legacy - use offset adjusted bo mapping conversion vgaarb: - cleanups fbdev: - extend efifb handling to all arches - div by 0 fixes for multiple drivers udmabuf: - add hugepage mapping support dma-buf: - non-dynamic exporter fixups - document implicit fencing rules amdgpu: - Initial Cyan Skillfish support - switch virtual DCE over to vkms based atomic - VCN/JPEG power down fixes - NAVI PCIE link handling fixes - AMD HDMI freesync fixes - Yellow Carp + Beige Goby fixes - Clockgating/S0ix/SMU/EEPROM fixes - embed hw fence in job - rework dma-resv handling - ensure eviction to system ram amdkfd: - uapi: SVM address range query added - sysfs leak fix - GPUVM TLB optimizations - vmfault/migration 
counters i915: - Enable JSL and EHL by default - preliminary XeHP/DG2 support - remove all CNL support (never shipped) - move to TTM for discrete memory support - allow mixed object mmap handling - GEM uAPI spring cleaning - add I915_MMAP_OBJECT_FIXED - reinstate ADL-P mmap ioctls - drop a bunch of unused by userspace features - disable and remove GPU relocations - revert some i915 misfeatures - major refactoring of GuC for Gen11+ - execbuffer object locking separate step - reject caching/set-domain on discrete - Enable pipe DMC loading on XE-LPD and ADL-P - add PSF GV point support - Refactor and fix DDI buffer translations - Clean up FBC CFB allocation code - Finish INTEL_GEN() and friends macro conversions nouveau: - add eDP backlight support - implicit fence fix msm: - a680/7c3 support - drm/scheduler conversion panfrost: - rework GPU reset virtio: - fix fencing for planes ast: - add detect support bochs: - move to tiny GPU driver vc4: - use hotplug irqs - HDMI codec support vmwgfx: - use internal vmware device headers ingenic: - demidlayering irq rcar-du: - shutdown fixes - convert to bridge connector helpers zynqmp-dsub: - misc fixes mgag200: - convert PLL handling to atomic mediatek: - MT8133 AAL support - gem mmap object support - MT8167 support etnaviv: - NXP Layerscape LS1028A SoC support - GEM mmap cleanups tegra: - new user API exynos: - missing unlock fix - build warning fix - use refcount_t" * tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm: (1318 commits) drm/amd/display: Move AllowDRAMSelfRefreshOrDRAMClockChangeInVblank to bounding box drm/amd/display: Remove duplicate dml init drm/amd/display: Update bounding box states (v2) drm/amd/display: Update number of DCN3 clock states drm/amdgpu: disable GFX CGCG in aldebaran drm/amdgpu: Clear RAS interrupt status on aldebaran drm/amdgpu: Add support for RAS XGMI err query drm/amdkfd: Account for SH/SE count when setting up cu masks. 
drm/amdgpu: rename amdgpu_bo_get_preferred_pin_domain drm/amdgpu: drop redundant cancel_delayed_work_sync call drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend drm/amdkfd: map SVM range with correct access permission drm/amdkfd: check access permisson to restore retry fault drm/amdgpu: Update RAS XGMI Error Query drm/amdgpu: Add driver infrastructure for MCA RAS drm/amd/display: Add Logging for HDMI color depth information drm/amd/amdgpu: consolidate PSP TA init shared buf functions drm/amd/amdgpu: add name field back to ras_common_if drm/amdgpu: Fix build with missing pm_suspend_target_state module export ...
2021-09-01  x86/setup: Explicitly include acpi.h  Nathan Chancellor
After commit 342f43af70db ("iscsi_ibft: fix crash due to KASLR physical memory remapping") x86_64_defconfig shows the following errors: arch/x86/kernel/setup.c: In function ‘setup_arch’: arch/x86/kernel/setup.c:916:13: error: implicit declaration of function ‘acpi_mps_check’ [-Werror=implicit-function-declaration] 916 | if (acpi_mps_check()) { | ^~~~~~~~~~~~~~ arch/x86/kernel/setup.c:1110:9: error: implicit declaration of function ‘acpi_table_upgrade’ [-Werror=implicit-function-declaration] 1110 | acpi_table_upgrade(); | ^~~~~~~~~~~~~~~~~~ [... more acpi noise ...] acpi.h was being implicitly included from iscsi_ibft.h in this configuration so the removal of that header means these functions have no definition or declaration. In most other configurations, <linux/acpi.h> continued to be included through at least <linux/tboot.h> if CONFIG_INTEL_TXT was enabled, and there were probably other implicit include paths too. Add acpi.h explicitly so there is no more error, and so that we don't continue to depend on these unreliable implicit include paths. Tested-by: Matthieu Baerts <[email protected]> Signed-off-by: Nathan Chancellor <[email protected]> Cc: Maurizio Lombardi <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-01  drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()  Thomas Gleixner
DEFINE_SMP_CALL_CACHE_FUNCTION() was useful before the CPU hotplug rework to ensure that the cache related functions are called on the upcoming CPU because the notifier itself could run on any online CPU.

The hotplug state machine guarantees that the callbacks are invoked on the upcoming CPU. So there is no need to have this SMP function call obfuscation. That indirection was missed when the hotplug notifiers were converted.

This also solves the problem of ARM64 init_cache_level() invoking ACPI functions which take a semaphore in that context. That's invalid as SMP function calls run with interrupts disabled. Running it just from the callback in context of the CPU hotplug thread solves this.

Fixes: 8571890e1513 ("arm64: Add support for ACPI based firmware tables")
Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/871r69ersb.ffs@tglx
2021-08-31  Merge branch 'stable/for-linus-5.15' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft

Pull ibft updates from Konrad Rzeszutek Wilk:
 "A fix for iBFT parsing code badly interfacing when KASLR is enabled"

* 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft:
  iscsi_ibft: fix warning in reserve_ibft_region()
  iscsi_ibft: fix crash due to KASLR physical memory remapping
2021-08-31  Merge tag 'hwmon-for-v5.15' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon updates from Guenter Roeck: "New drivers for: - Aquacomputer D5 Next - SB-RMI power module Added chip support to existing drivers: - Support for various Zen2 and Zen3 APUs and for Yellow Carp (SMU v13) added to k10temp driver - Support for Silicom n5010 PAC added to intel-m10-bmc driver - Support for BPD-RS600 added to pmbus/bpa-rs600 driver Other notable changes: - In k10temp, do not display Tdie on Zen CPUs if there is no difference between Tdie and Tctl - Converted adt7470 and dell-smm drivers to use devm_hwmon_device_register_with_info API - Support for temperature/pwm tables added to axi-fan-control driver - Enabled fan control for Dell Precision 7510 in dell-smm driver Various other minor improvements and fixes in several drivers" * tag 'hwmon-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (41 commits) hwmon: add driver for Aquacomputer D5 Next hwmon: (adt7470) Convert to devm_hwmon_device_register_with_info API hwmon: (adt7470) Convert to use regmap hwmon: (adt7470) Fix some style issues hwmon: (k10temp) Add support for yellow carp hwmon: (k10temp) Rework the temperature offset calculation hwmon: (k10temp) Don't show Tdie for all Zen/Zen2/Zen3 CPU/APU hwmon: (k10temp) Add additional missing Zen2 and Zen3 APUs hwmon: remove amd_energy driver in Makefile hwmon: (dell-smm) Rework SMM function debugging hwmon: (dell-smm) Mark i8k_get_fan_nominal_speed as __init hwmon: (dell-smm) Mark tables as __initconst hwmon: (pmbus/bpa-rs600) Add workaround for incorrect Pin max hwmon: (pmbus/bpa-rs600) Don't use rated limits as warn limits hwmon: (axi-fan-control) Support temperature vs pwm points hwmon: (axi-fan-control) Handle irqs in natural order hwmon: (axi-fan-control) Make sure the clock is enabled hwmon: (pmbus/ibm-cffps) Fix write bits for LED control hwmon: (w83781d) Match on device tree compatibles dt-bindings: hwmon: Add bindings for Winbond W83781D 
...
2021-08-30  Merge tag 'x86-misc-2021-08-30' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Thomas Gleixner:
 "A set of updates for the x86 reboot code:

   - Limit the Dell Optiplex 990 quirk to early BIOS versions to avoid the full 'power cycle' alike reboot which is required for the buggy BIOSes.

   - Update documentation for the reboot=pci command line option and document how DMI platform quirks can be overridden"

* tag 'x86-misc-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/reboot: Limit Dell Optiplex 990 quirk to early BIOS versions
  x86/reboot: Document how to override DMI platform quirks
  x86/reboot: Document the "reboot=pci" option
2021-08-30  Merge tag 'x86-irq-2021-08-30' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 PIRQ updates from Thomas Gleixner:
 "A set of updates to support port 0x22/0x23 based PCI configuration space which can be found on various ALi chipsets and is also available on older Intel systems which expose a PIRQ router.

  While the Intel support is more or less nostalgia, the ALi chips are still in use on popular embedded boards used for routers"

* tag 'x86-irq-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix typo s/ECLR/ELCR/ for the PIC register
  x86: Avoid magic number with ELCR register accesses
  x86/PCI: Add support for the Intel 82426EX PIRQ router
  x86/PCI: Add support for the Intel 82374EB/82374SB (ESC) PIRQ router
  x86/PCI: Add support for the ALi M1487 (IBC) PIRQ router
  x86: Add support for 0x22/0x23 port I/O configuration space
2021-08-30  Merge tag 'x86-cpu-2021-08-30' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cache flush updates from Thomas Gleixner:
 "A reworked version of the opt-in L1D flush mechanism.

  This is a stop gap for potential future speculation related hardware vulnerabilities and a mechanism for truly security paranoid applications.

  It allows a task to request that the L1D cache is flushed when the kernel switches to a different mm. This can be requested via prctl().

  Changes vs the previous versions:

   - Get rid of the software flush fallback

   - Make the handling consistent with other mitigations

   - Kill the task when it ends up on a SMT enabled core which defeats the purpose of L1D flushing obviously"

* tag 'x86-cpu-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation: Add L1D flushing Documentation
  x86, prctl: Hook L1D flushing in via prctl
  x86/mm: Prepare for opt-in based L1D flush in switch_mm()
  x86/process: Make room for TIF_SPEC_L1D_FLUSH
  sched: Add task_work callback for paranoid L1D flush
  x86/mm: Refactor cond_ibpb() to support other use cases
  x86/smp: Add a per-cpu view of SMT state
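The pull message above says a task can request the L1D flush via prctl(). A minimal user-space sketch of such a request follows, assuming the PR_SPEC_L1D_FLUSH control of PR_SET_SPECULATION_CTRL from the kernel's uapi prctl header. The helper name request_l1d_flush() and the fallback #define (for older installed headers) are illustrative and not taken from the series itself.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallback for older installed headers; assumed to match the uapi value. */
#ifndef PR_SPEC_L1D_FLUSH
#define PR_SPEC_L1D_FLUSH 2
#endif

/*
 * Hypothetical helper: ask the kernel to flush the L1D cache whenever it
 * switches away from this task's mm. Returns 0 on success, or a negative
 * errno value if the kernel or hardware refuses the request.
 */
int request_l1d_flush(void)
{
    if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH,
              PR_SPEC_ENABLE, 0, 0) == 0)
        return 0;
    return -errno;
}
```

A caller would normally treat a negative return as "not available here" and carry on, since the request can fail on kernels or CPUs without support.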
2021-08-30  Merge tag 'perf-core-2021-08-30' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf event updates from Ingo Molnar: - Add support for Intel Sapphire Rapids server CPU uncore events - Allow the AMD uncore driver to be built as a module - Misc cleanups and fixes * tag 'perf-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) perf/x86/amd/ibs: Add bitfield definitions in new <asm/amd-ibs.h> header perf/amd/uncore: Allow the driver to be built as a module x86/cpu: Add get_llc_id() helper function perf/amd/uncore: Clean up header use, use <linux/ include paths instead of <asm/ perf/amd/uncore: Simplify code, use free_percpu()'s built-in check for NULL perf/hw_breakpoint: Replace deprecated CPU-hotplug functions perf/x86/intel: Replace deprecated CPU-hotplug functions perf/x86: Remove unused assignment to pointer 'e' perf/x86/intel/uncore: Fix IIO cleanup mapping procedure for SNR/ICX perf/x86/intel/uncore: Support IMC free-running counters on Sapphire Rapids server perf/x86/intel/uncore: Support IIO free-running counters on Sapphire Rapids server perf/x86/intel/uncore: Factor out snr_uncore_mmio_map() perf/x86/intel/uncore: Add alias PMU name perf/x86/intel/uncore: Add Sapphire Rapids server MDF support perf/x86/intel/uncore: Add Sapphire Rapids server M3UPI support perf/x86/intel/uncore: Add Sapphire Rapids server UPI support perf/x86/intel/uncore: Add Sapphire Rapids server M2M support perf/x86/intel/uncore: Add Sapphire Rapids server IMC support perf/x86/intel/uncore: Add Sapphire Rapids server PCU support perf/x86/intel/uncore: Add Sapphire Rapids server M2PCIe support ...
2021-08-30  Merge tag 'x86_cleanups_for_v5.15' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Borislav Petkov:
 "The usual round of minor cleanups and fixes"

* tag 'x86_cleanups_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kaslr: Have process_mem_region() return a boolean
  x86/power: Fix kernel-doc warnings in cpu.c
  x86/mce/inject: Replace deprecated CPU-hotplug functions.
  x86/microcode: Replace deprecated CPU-hotplug functions.
  x86/mtrr: Replace deprecated CPU-hotplug functions.
  x86/mmiotrace: Replace deprecated CPU-hotplug functions.
2021-08-30  Merge tag 'x86_cache_for_v5.15' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "A first round of changes towards splitting the arch-specific bits from the filesystem bits of resctrl, the ultimate goal being to support ARM's equivalent technology MPAM, with the same fs interface (James Morse)" * tag 'x86_cache_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) x86/resctrl: Make resctrl_arch_get_config() return its value x86/resctrl: Merge the CDP resources x86/resctrl: Expand resctrl_arch_update_domains()'s msr_param range x86/resctrl: Remove rdt_cdp_peer_get() x86/resctrl: Merge the ctrl_val arrays x86/resctrl: Calculate the index from the configuration type x86/resctrl: Apply offset correction when config is staged x86/resctrl: Make ctrlval arrays the same size x86/resctrl: Pass configuration type to resctrl_arch_get_config() x86/resctrl: Add a helper to read a closid's configuration x86/resctrl: Rename update_domains() to resctrl_arch_update_domains() x86/resctrl: Allow different CODE/DATA configurations to be staged x86/resctrl: Group staged configuration into a separate struct x86/resctrl: Move the schemata names into struct resctrl_schema x86/resctrl: Add a helper to read/set the CDP configuration x86/resctrl: Swizzle rdt_resource and resctrl_schema in pseudo_lock_region x86/resctrl: Pass the schema to resctrl filesystem functions x86/resctrl: Add resctrl_arch_get_num_closid() x86/resctrl: Store the effective num_closid in the schema x86/resctrl: Walk the resctrl schema list instead of an arch list ...
2021-08-30  Merge tag 'ras_core_for_v5.15' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS update from Borislav Petkov:
 "A single RAS change for 5.15:

   - Do not start processing MCEs logged early because the decoding chain is not up yet - delay that processing until everything is ready"

* tag 'ras_core_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Defer processing of early errors
2021-08-27  hwmon: (k10temp) Add support for yellow carp  Mario Limonciello
Yellow carp matches the behavior of green sardine and other Zen3 products, but has different CCD offsets.

Signed-off-by: Mario Limonciello <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Guenter Roeck <[email protected]>
2021-08-26  x86/cpu: Add get_llc_id() helper function  Kim Phillips
Factor out a helper function rather than export cpu_llc_id, which is needed in order to be able to build the AMD uncore driver as a module.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-08-24  x86/mce: Defer processing of early errors  Borislav Petkov
When a fatal machine check results in a system reset, Linux does not clear the error(s) from machine check bank(s) - hardware preserves the machine check banks across a warm reset.

During initialization of the kernel after the reboot, Linux reads, logs, and clears all machine check banks.

But there is a problem. In:

  5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")

the call to mce_register_decode_chain() moved later in the boot sequence. This means that /dev/mcelog doesn't see those early error logs.

This was partially fixed by:

  cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")

which made sure that the logs were not lost completely by printing to the console. But parsing console logs is error prone. Users of /dev/mcelog should expect to find any early errors logged to standard places.

Add a new flag MCP_QUEUE_LOG to machine_check_poll() to be used in early machine check initialization to indicate that any errors found should just be queued to genpool. When mcheck_late_init() is called it will call mce_schedule_work() to actually log and flush any errors queued in the genpool.

 [ Based on an original patch, commit message by and completely productized by Tony Luck. ]

Fixes: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
Reported-by: Sumanth Kamatala <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
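The "queue early, flush late" pattern this commit message describes can be sketched in plain C. This is a simplified illustration only, not the kernel's implementation: DEMO_QUEUE_LOG stands in for MCP_QUEUE_LOG, a fixed array stands in for the genpool, and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QUEUE_LOG 0x1u  /* hypothetical stand-in for MCP_QUEUE_LOG */
#define DEMO_QUEUE_CAP 16    /* hypothetical capacity of the pending pool */

struct demo_log {
    int pending[DEMO_QUEUE_CAP]; /* errors found before consumers exist */
    size_t npending;
    size_t nlogged;              /* errors delivered to consumers */
};

/*
 * Poll step: with DEMO_QUEUE_LOG set, only queue the error for later.
 * Without the flag, consumers are assumed ready and the error is logged
 * immediately. Returns 0 on success, -1 if the pending pool is full.
 */
int demo_poll(struct demo_log *q, int err, unsigned int flags)
{
    assert(q != NULL);
    if (flags & DEMO_QUEUE_LOG) {
        if (q->npending == DEMO_QUEUE_CAP)
            return -1;
        q->pending[q->npending++] = err;
        return 0;
    }
    q->nlogged++;
    return 0;
}

/* Late-init step: deliver everything queued so far; returns the count. */
size_t demo_late_flush(struct demo_log *q)
{
    size_t flushed;

    assert(q != NULL);
    flushed = q->npending;
    q->nlogged += flushed;
    q->npending = 0;
    return flushed;
}
```

The point of the split is that nothing queued during early boot is lost: it only becomes visible to consumers once the late step runs.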
2021-08-22  x86/resctrl: Fix a maybe-uninitialized build warning treated as error  Babu Moger
The recent commit 064855a69003 ("x86/resctrl: Fix default monitoring groups reporting") caused a RHEL build failure with an uninitialized variable warning treated as an error because it removed the default case snippet.

The RHEL Makefile uses '-Werror=maybe-uninitialized' to force possibly uninitialized variable warnings to be treated as errors. This is also reported by smatch via the 0day robot.

The error from the RHEL build is:

  arch/x86/kernel/cpu/resctrl/monitor.c: In function ‘__mon_event_count’:
  arch/x86/kernel/cpu/resctrl/monitor.c:261:12: error: ‘m’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
    m->chunks += chunks;
              ^~

The upstream Makefile does not build using '-Werror=maybe-uninitialized'. So, the problem is not seen there. Fix the problem by putting back the default case snippet.

 [ bp: note that there's nothing wrong with the code and other compilers do not trigger this warning - this is being done just so the RHEL compiler is happy. ]

Fixes: 064855a69003 ("x86/resctrl: Fix default monitoring groups reporting")
Reported-by: Terry Bowman <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/162949631908.23903.17090272726012848523.stgit@bmoger-ubuntu
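The shape of the problem in this commit message can be reproduced in a few lines of stand-alone C. The sketch below is not the resctrl code: the enum, struct and function names are hypothetical. It only shows how a default case gives the pointer a defined outcome on every path through the switch, which is what keeps a '-Werror=maybe-uninitialized' build quiet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event ids and counters, named for illustration only. */
enum demo_event { DEMO_EVT_TOTAL = 1, DEMO_EVT_LOCAL = 2 };

struct demo_counters {
    unsigned long total;
    unsigned long local;
};

/* Adds chunks to the counter selected by id; returns -1 for unknown ids. */
int demo_add_chunks(struct demo_counters *c, enum demo_event id,
                    unsigned long chunks)
{
    unsigned long *m;

    assert(c != NULL);
    switch (id) {
    case DEMO_EVT_TOTAL:
        m = &c->total;
        break;
    case DEMO_EVT_LOCAL:
        m = &c->local;
        break;
    default:
        /*
         * Without this case, some compilers assume 'm' may reach the
         * dereference below unset and emit maybe-uninitialized.
         */
        return -1;
    }
    *m += chunks;
    return 0;
}
```

Whether the warning fires without the default case depends on the compiler version and optimization level, which matches the note in the message that other compilers do not trigger it.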
2021-08-15  Merge tag 'irq-urgent-2021-08-15' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A set of fixes for PCI/MSI and x86 interrupt startup:

   - Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked entries stay around e.g. when a crashkernel is booted.

   - Enforce masking of a MSI-X table entry when updating it, which is mandatory according to the specification

   - Ensure that writes to MSI[-X] tables are flushed.

   - Prevent invalid bits being set in the MSI mask register

   - Properly serialize modifications to the mask cache and the mask register for multi-MSI.

   - Cure the violation of the affinity setting rules on X86 during interrupt startup which can cause lost and stale interrupts. Move the initial affinity setting ahead of actually enabling the interrupt.

   - Ensure that MSI interrupts are completely torn down before freeing them in the error handling case.

   - Prevent an array out of bounds access in the irq timings code"

* tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  driver core: Add missing kernel doc for device::msi_lock
  genirq/msi: Ensure deactivation on teardown
  genirq/timings: Prevent potential array overflow in __irq_timings_store()
  x86/msi: Force affinity setup before startup
  x86/ioapic: Force affinity setup before startup
  genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP
  PCI/MSI: Protect msi_desc::masked for multi-MSI
  PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown()
  PCI/MSI: Correct misleading comments
  PCI/MSI: Do not set invalid bits in MSI mask
  PCI/MSI: Enforce MSI[X] entry updates to be visible
  PCI/MSI: Enforce that MSI-X table entry is masked for update
  PCI/MSI: Mask all unused MSI-X entries
  PCI/MSI: Enable and mask MSI-X early