| author | zhangyang <[email protected]> | 2023-11-10 06:00:24 +0000 |
|---|---|---|
| committer | zhangyang <[email protected]> | 2023-11-10 06:00:24 +0000 |
| commit | d0a3df1e25969f90f5395bdfac0032242e281ac6 (patch) | |
| tree | faa45fb8a27dc1a9c3819be290c5ed297fd7b8de | |
| parent | 29517e160cac927a74bbd9880baeb06ce3d04a69 (diff) | |
update readme
| -rw-r--r-- | README.md | 129 |
| -rw-r--r-- | README_zh.md | 113 |
2 files changed, 182 insertions, 60 deletions
@@ -1,64 +1,101 @@
 ## watch_monitor

-Monitor the given address and length, and print the system stack information when exceeding the set condition.
+Monitor numeric variables (given by address and length) and print system stack information when a set condition is exceeded.
+
+Limits on simultaneous monitoring
+- Watches with the same timer interval are placed in one group, which corresponds to one timer.
+- A group holds up to 32 variables; beyond that, a new timer is allocated.
+- The global maximum number of timers is 128.
+- These limits are defined as macros at the top of `watch_module.h`.
+
+Currently, monitoring is limited to a single application; simultaneous calls from multiple applications are not supported.
+- Multiple applications can still work normally if only one of them calls `cancel_all_watch();`.

 ## Usage

-See example in helloworld.c
-- Add #include "watch.h"
-- Pass in the address and length to monitor, set the threshold, compare way, timer interval (ns) etc.
-- start_watch(watch_arg); to start monitoring
-- Call cancel_watch(); when need to cancel monitoring
+Example: helloworld.c
+- Add `#include "watch.h"`
+- For each variable to monitor, set its name, address, and length, plus the threshold, comparison method, timer interval (ns), etc.
+- Call `start_watch(watch_arg);` to start monitoring
+- Call `cancel_all_watch();` when you need to cancel monitoring

-When exceeding the set condition, print system stack information, check with dmesg, see example below:
-- After printing stack, the timer will restart after 1s, and start next round of monitoring after 1s.
+When a set condition is exceeded, system stack information is printed; view it with `dmesg`, as in the following example:
+- Within one timer, if multiple variables exceed their thresholds, the stack information is output only once.
+- After the stack is printed, the timer restarts after 1s, and the next round of monitoring begins then.

 ```log
-[ 1144.582249] -------------------------------------
-[ 1144.582250] -------------watch monitor-----------
-[ 1144.582251] Timestamp (us): 1699411664908117
-[ 1144.582253] Recent Load: 0.32, 0.26, 0.26
-[ 1144.582255] Running task
-[ 1144.582342] Uninterrupted task
-[ 1144.582343] uninterrupted task: name rcu_gp, pid 3
-[ 1144.582352] rescuer_thread+0x290/0x390
-[ 1144.582360] kthread+0xd7/0x100
-[ 1144.582363] ret_from_fork+0x1f/0x30
-[ 1144.582367] uninterrupted task: name rcu_par_gp, pid 4
-[ 1144.582369] rescuer_thread+0x290/0x390
-[ 1144.582371] kthread+0xd7/0x100
-[ 1144.582373] ret_from_fork+0x1f/0x30
+[86245.364861] -------------------------------------
+[86245.364864] -------------watch monitor-----------
+[86245.364865] Threshold reached:
+ name: temp0, threshold: 150
+[86245.364866] Timestamp (ns): 1699589000606300743
+[86245.364867] Recent Load: 116.65, 126.83, 151.17
+[86245.365669] task: name lcore-worker-4, pid 803327
+[86245.365672] task: name lcore-worker-5, pid 803328
+[86245.365673] task: name lcore-worker-6, pid 803329
+[86245.365674] task: name lcore-worker-7, pid 803330
+[86245.365676] task: name lcore-worker-8, pid 803331
+[86245.365677] task: name lcore-worker-9, pid 803332
+[86245.365679] task: name lcore-worker-10, pid 803333
+[86245.365681] task: name lcore-worker-11, pid 803334
+[86245.365682] task: name lcore-worker-68, pid 803335
+[86245.365683] task: name lcore-worker-69, pid 803336
+[86245.365684] task: name lcore-worker-70, pid 803337
+[86245.365685] task: name lcore-worker-71, pid 803338
+[86245.365686] task: name lcore-worker-72, pid 803339
+[86245.365687] task: name lcore-worker-73, pid 803340
+[86245.365688] task: name lcore-worker-74, pid 803341
+[86245.365689] task: name lcore-worker-75, pid 803342
+[86245.365694] task: name pkt:worker-0, pid 803638
+[86245.365702] hrtimer_nanosleep+0x8d/0x120
+[86245.365709] __x64_sys_nanosleep+0x96/0xd0
+[86245.365711] do_syscall_64+0x37/0x80
+[86245.365716] entry_SYSCALL_64_after_hwframe+0x44/0xae
+[86245.365718] task: name pkt:worker-1, pid 803639
+[86245.365721] hrtimer_nanosleep+0x8d/0x120
+[86245.365724] __x64_sys_nanosleep+0x96/0xd0
+[86245.365726] do_syscall_64+0x37/0x80
+[86245.365728] entry_SYSCALL_64_after_hwframe+0x44/0xae
+[86245.365730] task: name pkt:worker-2, pid 803640
+[86245.365732] hrtimer_nanosleep+0x8d/0x120
+[86245.365734] __x64_sys_nanosleep+0x96/0xd0
+[86245.365737] do_syscall_64+0x37/0x80
+[86245.365739] entry_SYSCALL_64_after_hwframe+0x44/0xae
+[86245.365740] task: name pkt:worker-3, pid 803641
+[86245.365743] hrtimer_nanosleep+0x8d/0x120
 ```

 ### Parameter Description

-start_watch passes in the watch_arg structure. The meanings of each field are as follows:
+`start_watch` takes a `watch_arg` structure. The meaning of each field is as follows:
+- `name` is limited to `MAX_NAME_LEN` (15) valid characters

 ```c
 typedef struct {
     pid_t task_id; // current process id
+    char name[MAX_NAME_LEN + 1]; // name (15+1)
     void *ptr; // virtual address
-    int length_byte; // length_byte
+    int length_byte; // length in bytes
     long long threshold; // threshold value
     unsigned char unsigned_flag; // unsigned flag (true: unsigned, false: signed)
     unsigned char greater_flag; // reverse flag (true: >, false: <)
-    unsigned long time; // timer interval (ns)
+    unsigned long time_ns; // timer interval (ns)
 } watch_arg;
 ```

-An initialization example:
-- When testing in hyper-v virtual machine, setting any value less than 1000ns for hrTimer will freeze the system, so unable to test ns level timing;
+An initialization example:

 ```c
-watch_arg watch_arg = {
+watch_args = (watch_arg){
     .task_id = getpid(),
     .ptr = &temp,
+    .name = "temp",
     .length_byte = sizeof(int),
-    .threshold = 105,
-    .unsigned_flag = 1,
+    .threshold = 150 + i,
+    .unsigned_flag = 0,
     .greater_flag = 1,
-    .time = 2, // on hyper-v, 1us will block all system. 2us just fine, maybe 1us is too short for hyper-v
+    .time_ns = 2000 + (i / 33) * 5000
 };
 ```

@@ -82,8 +119,32 @@ Only tested on kernel 5.17.15-1.el8.x86_64.

 ## Other

-The program is divided into two parts: character device and user space interface, communicating via ioctl.
+The program has two parts, a character device and a user-space interface, which communicate through ioctl.

 User space address access
-- The user program passes in a virtual address, get_user_pages_remote is used to obtain the memory page where the address is located, and kmap maps it to the kernel.
-- The memory page address + offset is stored in the global variable k_watch_arg, and hrTimer polls to access k_watch_arg to get the real value.
+- The user program passes in the variable's virtual address; `get_user_pages_remote` obtains the memory page containing that address, and `kmap` maps it into the kernel.
+  - In the 192.168.40.204 environment, mounting was tested with HugeTLB Pages and works normally.
+- The memory page address + offset is stored in the `kernel_watch_arg` belonging to the timer; when polling, the hrTimer reads `kernel_watch_arg` to get the actual value.
+
+Timer grouping
+- The hrTimer data structures are defined in the global array `kernel_wtimer_list`. When allocating a timer, the module traverses `kernel_wtimer_list` and compares timer intervals.
+- Watches with the same timer interval are assigned to the same group and share the same hrTimer.
+- If the number of variables monitored by a timer exceeds `TIMER_MAX_WATCH_NUM` (32), a new hrTimer is created.
+- The total number of hrTimers (the length of the `kernel_wtimer_list` array) is limited to `MAX_TIMER_NUM` (128).
+
+Memory page mount/unmount
+- `get_user_pages_remote`/`kmap` increment the corresponding counts and must be balanced by matching `put_page`/`kunmap` calls.
+- A module-global linked list, `watch_local_memory_list`, stores the page and kt for each successfully mounted variable. When the character device is closed, the list is traversed and each entry is unmounted.
+
+Stack output conditions: the conditions follow [diagnose-tools::load.c](https://github.com/alibaba/diagnose-tools/blob/e285bc4626a7d207eabd4a69cb276e1a3b1b7c76/SOURCE/module/kernel/load.c#L209)
+- `TASK` must satisfy TASK_RUNNING and `__task_contributes_to_load`.
+- `__task_contributes_to_load` corresponds to the kernel macro `task_contributes_to_load`.
+
+```c
+// https://www.spinics.net/lists/kernel/msg3582022.html
+// removed in 5.8-rc3, but it still works
+// whether the task contributes to the load
+#define __task_contributes_to_load(task) \
+	((READ_ONCE(task->__state) & TASK_UNINTERRUPTIBLE) != 0 && (task->flags & PF_FROZEN) == 0 && \
+	 (READ_ONCE(task->__state) & TASK_NOLOAD) == 0)
+```
\ No newline at end of file
diff --git a/README_zh.md b/README_zh.md
index 0cb2f80..f2cf4ca 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -1,64 +1,101 @@
 ## watch_monitor

-Monitor the given address and length, and print system stack information when the set condition is exceeded.
+Monitor numeric variables (given address, length), and print system stack information when the set condition is exceeded.
+
+Number of simultaneous watches
+- Watches with the same timer length are placed in one group, which corresponds to one timer.
+- A group holds at most 32 variables; beyond that, a new timer is allocated.
+- The global maximum number of timers is 128.
+- These limits are defined as macros at the top of `watch_module.h`.
+
+Currently, monitoring is limited to a single application; simultaneous calls from multiple applications are not yet supported.
+- If only one of several applications calls `cancel_all_watch();`, they can still work normally.

 ## Usage

 See the example in helloworld.c
 - Add `#include "watch.h"`
-- Pass in the address && length of the variable to monitor, and set the threshold, comparison method, timer interval (us), etc.
+- For each variable to monitor, set: name && address && length, plus the threshold, comparison method, timer interval (ns), etc.
 - `start_watch(watch_arg);` starts monitoring
-- Call `cancel_watch();` when you need to cancel monitoring
+- Call `cancel_all_watch();` when you need to cancel monitoring

 When the set condition is exceeded, system stack information is printed; view it with `dmesg`, as in the following example:
+- Within one timer, if multiple variables exceed their thresholds, the stack information is not output repeatedly;
 - After the stack is printed, the timer restarts after 1s, and the next round of monitoring begins 1s later.

 ```log
-[ 1144.582249] -------------------------------------
-[ 1144.582250] -------------watch monitor-----------
-[ 1144.582251] Timestamp (us): 1699411664908117
-[ 1144.582253] Recent Load: 0.32, 0.26, 0.26
-[ 1144.582255] Running task
-[ 1144.582342] Uninterrupted task
-[ 1144.582343] uninterrupted task: name rcu_gp, pid 3
-[ 1144.582352] rescuer_thread+0x290/0x390
-[ 1144.582360] kthread+0xd7/0x100
-[ 1144.582363] ret_from_fork+0x1f/0x30
-[ 1144.582367] uninterrupted task: name rcu_par_gp, pid 4
-[ 1144.582369] rescuer_thread+0x290/0x390
-[ 1144.582371] kthread+0xd7/0x100
-[ 1144.582373] ret_from_fork+0x1f/0x30
+[86245.364861] -------------------------------------
+[86245.364864] -------------watch monitor-----------
+[86245.364865] Threshold reached:
+ name: temp0, threshold: 150
+[86245.364866] Timestamp (ns): 1699589000606300743
+[86245.364867] Recent Load: 116.65, 126.83, 151.17
+[86245.365669] task: name lcore-worker-4, pid 803327
+[86245.365672] task: name lcore-worker-5, pid 803328
+[86245.365673] task: name lcore-worker-6, pid 803329
+[86245.365674] task: name lcore-worker-7, pid 803330
+[86245.365676] task: name lcore-worker-8, pid 803331
+[86245.365677] task: name lcore-worker-9, pid 803332
+[86245.365679] task: name lcore-worker-10, pid 803333
+[86245.365681] task: name lcore-worker-11, pid 803334
+[86245.365682] task: name lcore-worker-68, pid 803335
+[86245.365683] task: name lcore-worker-69, pid 803336
+[86245.365684] task: name lcore-worker-70, pid 803337
+[86245.365685] task: name lcore-worker-71, pid 803338
+[86245.365686] task: name lcore-worker-72, pid 803339
+[86245.365687] task: name lcore-worker-73, pid 803340
+[86245.365688] task: name lcore-worker-74, pid 803341
+[86245.365689] task: name lcore-worker-75, pid 803342
+[86245.365694] task: name pkt:worker-0, pid 803638
+[86245.365702] hrtimer_nanosleep+0x8d/0x120
+[86245.365709] __x64_sys_nanosleep+0x96/0xd0
+[86245.365711] do_syscall_64+0x37/0x80
+[86245.365716] entry_SYSCALL_64_after_hwframe+0x44/0xae
+[86245.365718] task: name pkt:worker-1, pid 803639
+[86245.365721] hrtimer_nanosleep+0x8d/0x120
+[86245.365724] __x64_sys_nanosleep+0x96/0xd0
+[86245.365726] do_syscall_64+0x37/0x80
+[86245.365728] entry_SYSCALL_64_after_hwframe+0x44/0xae
+[86245.365730] task: name pkt:worker-2, pid 803640
+[86245.365732] hrtimer_nanosleep+0x8d/0x120
+[86245.365734] __x64_sys_nanosleep+0x96/0xd0
+[86245.365737] do_syscall_64+0x37/0x80
+[86245.365739] entry_SYSCALL_64_after_hwframe+0x44/0xae
+[86245.365740] task: name pkt:worker-3, pid 803641
+[86245.365743] hrtimer_nanosleep+0x8d/0x120
 ```

 ### Parameter Description

 start_watch takes a watch_arg structure. The meaning of each field is as follows
+- name is limited to `MAX_NAME_LEN` (15) valid characters

 ```c
 typedef struct {
     pid_t task_id; // current process id
+    char name[MAX_NAME_LEN + 1]; // name (15+1)
     void *ptr; // virtual address
-    int length_byte; // length_byte
+    int length_byte; // byte
     long long threshold; // threshold value
     unsigned char unsigned_flag; // unsigned flag (true: unsigned, false: signed)
     unsigned char greater_flag; // reverse flag (true: >, false: <)
-    unsigned long time; // timer interval (us)
+    unsigned long time_ns; // timer interval (ns)
 } watch_arg;
 ```

 An initialization example
-- When the test environment is a hyper-v virtual machine, setting hrTimer to any value below 1000ns freezes the system, so ns-level timing could not be tested;

 ```c
-watch_arg watch_arg = {
+watch_args = (watch_arg){
     .task_id = getpid(),
     .ptr = &temp,
+    .name = "temp",
     .length_byte = sizeof(int),
-    .threshold = 105,
-    .unsigned_flag = 1,
+    .threshold = 150 + i,
+    .unsigned_flag = 0,
     .greater_flag = 1,
-    .time = 2, // on hyper-v, 1us will block all system. 2us just fine, maybe 1us is too short for hyper-v
+    .time_ns = 2000 + (i / 33) * 5000
 };
 ```

@@ -86,5 +123,29 @@ rmmod watch_module.ko && make clean
 The program has two parts, a character device and a user-space interface, which communicate through ioctl.

 User space address access
-- The user program passes in a virtual address; `get_user_pages_remote` obtains the memory page containing the address, and `kmap` maps it into the kernel.
-- The memory page address + offset is stored in the global variable `k_watch_arg`; when polling, hrTimer reads `k_watch_arg` to get the actual value.
\ No newline at end of file
+- The user program passes in the variable's virtual address; `get_user_pages_remote` obtains the memory page containing the address, and `kmap` maps it into the kernel.
+  - In the 192.168.40.204 environment, mounting was tested with HugeTLB Pages and works normally.
+- The memory page address + offset is stored in the `kernel_watch_arg` belonging to the timer; when polling, hrTimer reads `kernel_watch_arg` to get the actual value.
+
+Timer grouping
+- The hrTimer data structures are defined in the global array `kernel_wtimer_list`. When allocating a timer, the module traverses `kernel_wtimer_list` and compares timer intervals.
+- Watches with the same timer interval are assigned to the same group and share the same hrTimer.
+- If the number of variables monitored by a timer exceeds `TIMER_MAX_WATCH_NUM` (32), a new hrTimer is created.
+- The total number of hrTimers (the length of the `kernel_wtimer_list` array) is limited to `MAX_TIMER_NUM` (128).
+
+Memory page mount/unmount
+- `get_user_pages_remote`/`kmap` increment the corresponding counts and must be balanced by matching `put_page`/`kunmap` calls.
+- A module-global linked list, `watch_local_memory_list`, stores the page and kt for each successfully mounted variable; when the character device is closed, the list is traversed and each entry is unmounted.
+
+Stack output conditions: the conditions follow [diagnose-tools::load.c](https://github.com/alibaba/diagnose-tools/blob/e285bc4626a7d207eabd4a69cb276e1a3b1b7c76/SOURCE/module/kernel/load.c#L209)
+- `TASK` must satisfy TASK_RUNNING and `__task_contributes_to_load`.
+- `__task_contributes_to_load` corresponds to the kernel macro `task_contributes_to_load`.
+
+```c
+// https://www.spinics.net/lists/kernel/msg3582022.html
+// removed in 5.8-rc3, but it still works
+// whether the task contributes to the load
+#define __task_contributes_to_load(task) \
+	((READ_ONCE(task->__state) & TASK_UNINTERRUPTIBLE) != 0 && (task->flags & PF_FROZEN) == 0 && \
+	 (READ_ONCE(task->__state) & TASK_NOLOAD) == 0)
+```
\ No newline at end of file
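Note on the comparison fields: the README in this commit describes `length_byte`, `unsigned_flag`, and `greater_flag` but does not show how they combine. The user-space sketch below illustrates one way such a check could work. `load_watched_value` and `threshold_exceeded` are hypothetical names, not functions from the module, and the clamping of 8-byte unsigned values and the handling of unsupported widths are assumptions of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the module): read length_byte bytes at ptr
 * and widen them to long long, honoring unsigned_flag. */
static long long load_watched_value(const void *ptr, int length_byte,
                                    unsigned char unsigned_flag)
{
    assert(ptr != NULL);
    switch (length_byte) {
    case 1: {
        uint8_t u; int8_t s;
        memcpy(&u, ptr, sizeof u); memcpy(&s, ptr, sizeof s);
        return unsigned_flag ? (long long)u : (long long)s;
    }
    case 2: {
        uint16_t u; int16_t s;
        memcpy(&u, ptr, sizeof u); memcpy(&s, ptr, sizeof s);
        return unsigned_flag ? (long long)u : (long long)s;
    }
    case 4: {
        uint32_t u; int32_t s;
        memcpy(&u, ptr, sizeof u); memcpy(&s, ptr, sizeof s);
        return unsigned_flag ? (long long)u : (long long)s;
    }
    case 8: {
        uint64_t u; int64_t s;
        memcpy(&u, ptr, sizeof u); memcpy(&s, ptr, sizeof s);
        if (!unsigned_flag)
            return (long long)s;
        /* Assumption: threshold is a long long, so clamp what it cannot hold. */
        return u > (uint64_t)LLONG_MAX ? LLONG_MAX : (long long)u;
    }
    default:
        return 0; /* Assumption: unsupported widths read as 0 in this sketch. */
    }
}

/* Returns 1 when the watched value satisfies the configured comparison:
 * greater_flag != 0 means "value > threshold", otherwise "value < threshold". */
static int threshold_exceeded(const void *ptr, int length_byte,
                              long long threshold,
                              unsigned char unsigned_flag,
                              unsigned char greater_flag)
{
    long long v = load_watched_value(ptr, length_byte, unsigned_flag);
    return greater_flag ? (v > threshold) : (v < threshold);
}
```

With the README's example settings (`threshold = 150`, `unsigned_flag = 0`, `greater_flag = 1`), an `int` holding 151 would trigger and one holding 150 would not, since the comparison is strict. Setting `unsigned_flag = 1` on an `int` holding -1 would also trigger, because the same bytes are then read as a large unsigned value.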

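Note on timer grouping: the "Timer grouping" bullets added in this commit can be sketched in plain C as follows. This is a minimal user-space model, not the module's code. `TIMER_MAX_WATCH_NUM` and `MAX_TIMER_NUM` take the values quoted in the README, while `struct timer_group`, `struct timer_table`, and `assign_timer` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TIMER_MAX_WATCH_NUM 32   /* watches per timer, value from the README */
#define MAX_TIMER_NUM       128  /* global timer limit, value from the README */

/* Hypothetical stand-in for one entry of kernel_wtimer_list. */
struct timer_group {
    unsigned long time_ns;   /* timer interval shared by the group */
    int watch_count;         /* number of watches attached so far  */
};

/* Hypothetical stand-in for the global timer array. */
struct timer_table {
    struct timer_group groups[MAX_TIMER_NUM];
    int used;                /* number of allocated timers */
};

/* Attach one watch with the given interval.
 * Returns the index of the timer it was assigned to, or -1 when all
 * MAX_TIMER_NUM timers are in use and no matching group has room. */
static int assign_timer(struct timer_table *t, unsigned long time_ns)
{
    assert(t != NULL);

    /* Reuse an existing timer with the same interval if it has room. */
    for (int i = 0; i < t->used; i++) {
        if (t->groups[i].time_ns == time_ns &&
            t->groups[i].watch_count < TIMER_MAX_WATCH_NUM) {
            t->groups[i].watch_count++;
            return i;
        }
    }

    /* Otherwise allocate a new timer, respecting the global limit. */
    if (t->used >= MAX_TIMER_NUM)
        return -1;
    t->groups[t->used].time_ns = time_ns;
    t->groups[t->used].watch_count = 1;
    return t->used++;
}
```

Under this model, the first 32 watches with an interval of 2000 ns share timer 0, the 33rd with the same interval lands on a new timer, and a watch with a different interval always starts its own group.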