| author | zy <[email protected]> | 2023-11-28 02:52:45 -0500 |
|---|---|---|
| committer | zy <[email protected]> | 2023-11-28 02:52:45 -0500 |
| commit | 809f581cefe9c9daad8b38cf1fd322583c617b17 | |
| tree | 77dff34aacc5a46289f72f613a364a76bac1b4c7 /README.md | |
| parent | 07aec699c94a3293a7e6ecbef112d44c77f9e651 | |
update readme
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 286 |
1 files changed, 193 insertions, 93 deletions
@@ -1,76 +1,83 @@
 ## Variable Monitor
 
-Monitor numerical variables (given address, length), and print system stack information when the set conditions are exceeded.
+changelog
 
-Number of simultaneous monitoring
-- Monitoring with the same timing length will be grouped into one group, corresponding to one timer.
-- A set of up to 32 variables, after which a new timer is allocated.
-- The global maximum number of timers is 128.
-- The above quantity limit is defined in the `watch_module.h` header macro.
-
-Currently, monitoring is limited to the same application, and simultaneous calls from multiple applications are not currently supported.
-- Multiple applications can work normally if only one program calls `cancel_all_watch();`.
+```log
+11.9 Support for monitoring multiple variables
+11.10 Kernel structures are separated by pid; each process can request and cancel its own monitors.
+11.13 User interface cancel_all_watch -> cancel_watch; processes no longer interfere with each other.
+11.28 Complete refactor; documentation updated.
+```
 
-## Usage
+## Description
 
-Example: helloworld.c
-- Add `#include "watch.h"`
-- Set each variable that needs to be monitored: name && address && length, set threshold, comparison method, timer interval (ns), etc.
-- `start_watch(watch_arg);` Start monitoring
-- Call `cancel_all_watch();` when you need to cancel monitoring
+Monitors numeric variables (given an address and length). When a configured condition is met, it prints information on Tasks in the system (user-mode stack / kernel-mode stack / call-chain information).
+- Supports multiple processes. When a process exits, all of that process's monitors are cancelled.
+- Watches with the same timer interval are assigned to the same timer. One timer monitors at most 32 variables, and there are at most 128 timers globally.
+  - These limits are defined in `source/module/monitor_timer.h`.
+  - `testcase/helloworld.c` has been tested with 2049 variables in a single process.
 
-When the set conditions are exceeded, the system stack information is printed and viewed with `dmesg`, as shown in the following example:
-- Within a timer, if multiple variables exceed the threshold, the stack information will not be output repeatedly;
-- The timer restart time after printing the stack is 1s, and the next round of monitoring will start after 1s.
+File structure
 
 ```log
-[86245.364861] -------------------------------------
-[86245.364864] -------------watch monitor-----------
-[86245.364865] Threshold reached:
-    name: temp0, threshold: 150
-[86245.364866] Timestamp (ns): 1699589000606300743
-[86245.364867] Recent Load: 116.65, 126.83, 151.17
-[86245.365669] task: name lcore-worker-4, pid 803327
-[86245.365672] task: name lcore-worker-5, pid 803328
-[86245.365673] task: name lcore-worker-6, pid 803329
-[86245.365674] task: name lcore-worker-7, pid 803330
-[86245.365676] task: name lcore-worker-8, pid 803331
-[86245.365677] task: name lcore-worker-9, pid 803332
-[86245.365679] task: name lcore-worker-10, pid 803333
-[86245.365681] task: name lcore-worker-11, pid 803334
-[86245.365682] task: name lcore-worker-68, pid 803335
-[86245.365683] task: name lcore-worker-69, pid 803336
-[86245.365684] task: name lcore-worker-70, pid 803337
-[86245.365685] task: name lcore-worker-71, pid 803338
-[86245.365686] task: name lcore-worker-72, pid 803339
-[86245.365687] task: name lcore-worker-73, pid 803340
-[86245.365688] task: name lcore-worker-74, pid 803341
-[86245.365689] task: name lcore-worker-75, pid 803342
-[86245.365694] task: name pkt:worker-0, pid 803638
-[86245.365702] hrtimer_nanosleep+0x8d/0x120
-[86245.365709] __x64_sys_nanosleep+0x96/0xd0
-[86245.365711] do_syscall_64+0x37/0x80
-[86245.365716] entry_SYSCALL_64_after_hwframe+0x44/0xae
-[86245.365718] task: name pkt:worker-1, pid 803639
-[86245.365721] hrtimer_nanosleep+0x8d/0x120
-[86245.365724] __x64_sys_nanosleep+0x96/0xd0
-[86245.365726] do_syscall_64+0x37/0x80
-[86245.365728] entry_SYSCALL_64_after_hwframe+0x44/0xae
-[86245.365730] task: name pkt:worker-2, pid 803640
-[86245.365732] hrtimer_nanosleep+0x8d/0x120
-[86245.365734] __x64_sys_nanosleep+0x96/0xd0
-[86245.365737] do_syscall_64+0x37/0x80
-[86245.365739] entry_SYSCALL_64_after_hwframe+0x44/0xae
-[86245.365740] task: name pkt:worker-3, pid 803641
-[86245.365743] hrtimer_nanosleep+0x8d/0x120
+├── build             // output
+├── source            // all source code
+│   ├── buffer        // buffer for communication between the module and user space
+│   ├── module        // module code
+│   ├── uapi          // user-space interface
+│   ├── ucli          // user-space command-line tool
+│   └── ucli_py       // user-space command line in Python (testing only, unfinished)
+│       └── libunwind // library ported for parsing stack information in Python
+├── testcase          // test cases
+└── tools             // test tools
 ```
 
-### Parameter Description
+## Usage
+
+There are two ways to set up variable monitoring: the macros, or defining a `watch_arg` structure.
+- Both require including the header from `source/uapi`: `#include "monitor_user.h"`
+
+To cancel monitoring, call `cancel_watch();`. variable_monitor then cancels all monitors of the calling process.
+- When a process exits, the same cleanup runs, cancelling all of that process's monitors.
+- Calling `cancel_watch();` is therefore optional, but it is still recommended to avoid possible memory leaks.
+
+Collecting Task information is time-consuming, so it is handled in a workqueue. After each collection, the timer restarts after a default interval of 5s.
+- This value can be viewed and modified in `/proc/variable_monitor/dump_reset_sec`.
+
+### Loading the driver
+
+From the project root directory:
 
-start_watch passes in the watch_arg structure. The meaning of each field is as follows
-- name limit `MAX_NAME_LEN`(15) valid characters
+```bash
+# Build and load the module
+make && insmod source/variable_monitor.ko
+# Unload the module and clean the build files
+# rmmod source/variable_monitor.ko && make clean
+# Tested only on `kernel 5.17.15-1.el8.x86_64`; other kernel versions are untested.
+```
 
+### Macros
+
+See `testcase/helloworld.c` for an example. Macros are provided for common numeric types for convenience:
+- For other types, see `source/uapi/monitor_user_sw.h`
 ```c
+// Arguments: variable name | address | threshold
+START_WATCH_INT("temp", &temp, 150);
+START_WATCH_INT_LESS("temp", &temp, 150);
+```
+
+By default, the timer interval used by the macros is 10us. This value can be viewed and modified in `/proc/variable_monitor/def_interval_ns`.
+
+### watch_arg structure
+
+If you need more control (for example over the timer interval), define a `watch_arg` structure and start monitoring with `start_watch`:
+- For each variable to monitor, set the name && address && length, the threshold, the comparison method, the timer interval (ns), etc.
+- `start_watch(watch_arg);` starts monitoring
+- Call `cancel_watch();` when you need to cancel monitoring
+
+```c
+// start_watch takes a watch_arg structure. The fields are described below
+// - name is limited to `MAX_NAME_LEN` (15) valid characters
 typedef struct
 {
     pid_t task_id; // current process id
@@ -82,63 +89,156 @@ typedef struct
     unsigned char greater_flag; // reverse flag (true: >, false: <)
     unsigned long time_ns; // timer interval (ns)
 } watch_arg;
-```
 
-An initialization example
-
-```c
+// An initialization example
 watch_args = (watch_arg){
     .task_id = getpid(),
     .ptr = &temp,
     .name = "temp",
     .length_byte = sizeof(int),
-    .threshold = 150 + i,
+    .threshold = 150,
     .unsigned_flag = 0,
     .greater_flag = 1,
-    .time_ns = 2000 + (i / 33) * 5000
+    .time_ns = 2000 + 5000
 };
+start_watch(watch_args);
 ```
 
-## demo
+### Output
 
-In the main project directory:
+The timer polls the variables continuously at the configured interval. When a configured condition is met, it collects information on the Tasks in the system that meet the output criteria (user-mode stack / kernel-mode stack / call-chain information).
+- `dmesg` shows which variables exceeded their configured conditions;
+- Task information is written to a buffer; use the ucli tool to view it.
 
-```bash
-make && insmod watch_module.ko
-./watch
+Example `dmesg` output:
+
+```log
+[42865.640988] -------------------------------------
+[42865.640992] -----------variable monitor----------
+[42865.640993] 超出阈值:1701141698684973655
+[42865.640994] : pid: 63936, name: temp0, ptr: 00000000bade6e61, threshold:110
+[42865.648068] -------------------------------------
+[42875.640703] -------------------------------------
+[42875.640706] -----------variable monitor----------
+[42875.640706] 超出阈值:1701141708684881779
+[42875.640708] : pid: 63936, name: temp0, ptr: 00000000bade6e61, threshold:110
+[42875.640710] : pid: 63936, name: temp1, ptr: 00000000ee645b96, threshold:111
+[42875.640711] : pid: 63936, name: temp2, ptr: 00000000f62b7afe, threshold:112
+[42875.640711] : pid: 63936, name: temp3, ptr: 00000000d100fa3c, threshold:113
+[42875.640712] : pid: 63936, name: temp4, ptr: 000000006d31cae1, threshold:114
+[42875.640712] : pid: 63936, name: temp5, ptr: 00000000723c7a2a, threshold:115
+[42875.640713] : pid: 63936, name: temp6, ptr: 0000000026ef6e83, threshold:116
+[42875.640714] : pid: 63936, name: temp7, ptr: 00000000fc1e5d5e, threshold:117
+[42875.640714] : pid: 63936, name: temp8, ptr: 0000000069b2666e, threshold:118
+[42875.640715] : pid: 63936, name: temp9, ptr: 000000000176263d, threshold:119
+[42875.648023] -------------------------------------
 ```
 
-You can see the printed stack information in dmesg
+By default, the compiled `ucli` is placed in the build folder.
 
-```bash
-# Unload module and clean compile files
-rmmod watch_module.ko && make clean
+`ucli > output`
+- ucli parses the buffer contents and writes them to the `output` file.
+- **This operation clears the buffer.**
+
+Example `ucli` output (see output_example for details):
+- userstack is the stack-information test program in testcase.
+
+```log
+##CGROUP:[/] 51666 [510] 采样命中[D]
+    进程信息: [/ / userstack], PID: 51666 / 51666
+##C++ pid 51666
+    用户态堆栈SP:7ffcd5822298, BP:2, IP:7f071c720838
+#~ 0x7f071c720838 __GI___nanosleep ([symbol])
+#~ 0x7f071c72076e __sleep ([symbol])
+#~ 0x400a08 customFunction1 ([symbol])
+#~ 0x400a64 customFunction3 ([symbol])
+#~ 0x400a42 customFunction2 ([symbol])
+#~ 0x400a21 customFunction1 ([symbol])
+#~ 0x400a64 customFunction3 ([symbol])
+#~ 0x400a42 customFunction2 ([symbol])
+#~ 0x400a21 customFunction1 ([symbol])
+#~ 0x400a64 customFunction3 ([symbol])
+#~ 0x400a42 customFunction2 ([symbol])
+#~ 0x400a21 customFunction1 ([symbol])
+#~ 0x400a64 customFunction3 ([symbol])
+#~ 0x400a42 customFunction2 ([symbol])
+#~ 0x400a21 customFunction1 ([symbol])
+#~ 0x400a64 customFunction3 ([symbol])
+#~ 0x400a42 customFunction2 ([symbol])
+#~ 0x400a21 customFunction1 ([symbol])
+#~ 0x400a64 customFunction3 ([symbol])
+#~ 0x400a42 customFunction2 ([symbol])
+#~ 0x400a21 customFunction1 ([symbol])
+#~ 0x400a64 customFunction3 ([symbol])
+#~ 0x400a42 customFunction2 ([symbol])
+#~ 0x400a21 customFunction1 ([symbol])
+#~ 0x400a64 customFunction3 ([symbol])
+#~ 0x400a42 customFunction2 ([symbol])
+#~ 0x400a21 customFunction1 ([symbol])
+#~ 0x400a64 customFunction3 ([symbol])
+#~ 0x400a42 customFunction2 ([symbol])
+#~ 0x400a21 customFunction1 ([symbol])
+#~ 0x400a64 customFunction3 ([symbol])
+#~ 0x400a42 customFunction2 ([symbol])
+#~ 0x400a21 customFunction1 ([symbol])
+#~ 0x400a75 main ([symbol])
+#~ 0x7f071c661d85 __libc_start_main ([symbol])
+#~ 0x40081e _start ([symbol])
+    内核态堆栈:
+#@ 0xffffffff811730dd hrtimer_nanosleep ([kernel.kallsyms])
+#@ 0xffffffff811733a6 __x64_sys_nanosleep ([kernel.kallsyms])
+#@ 0xffffffff819fa117 do_syscall_64 ([kernel.kallsyms])
+#@ 0xffffffff81c0007c entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
+#@ 0xffffffff819fa117 do_syscall_64 ([kernel.kallsyms])
+#@ 0xffffffff81c0007c entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
+#@ 0xffffffff819fa117 do_syscall_64 ([kernel.kallsyms])
+#@ 0xffffffff81c0007c entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
+#* 0xffffffffffffff userstack (UNKNOWN)
+    进程链信息:
+#^ 0xffffffffffffff ./build/userstack (UNKNOWN)
+#^ 0xffffffffffffff /bin/bash --init-file /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/out/vs/workbench/contrib/terminal/browser/media/shellIntegration-bash.sh (UNKNOWN)
+#^ 0xffffffffffffff /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/node /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/out/bootstrap-fork --type=ptyHost --logsPath /root/ (UNKNOWN)
+#^ 0xffffffffffffff /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/node /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/out/server-main.js --connection-token=remotessh --a (UNKNOWN)
+#^ 0xffffffffffffff sh /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/bin/code-server-insiders --connection-token=remotessh --accept-server-license-terms --start-server --enable-remote-auto-shutdown --socket-path=/tmp/code (UNKNOWN)
+#^ 0xffffffffffffff /root/.vscode-server-insiders/code-insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6 command-shell --cli-data-dir /root/.vscode-server-insiders/cli --on-port --require-token b5a047063eb7 (UNKNOWN)
+#^ 0xffffffffffffff /usr/lib/systemd/systemd --switched-root --system --deserialize 17 (UNKNOWN)
+##
 ```
 
-Only tested on kernel 5.17.15-1.el8.x86_64.
+## demo
+
+In the `testcase` folder:
+- `helloworld.c`: tests monitoring a large number of variables
+- `userstack.c`: tests user-mode stack output
+- `hptest.c`: tests HugePage mounting
+
+## Other
 
-## Other
+The program consists of two parts, a character device and a user-space interface, which communicate via ioctl.
 
-The program is divided into two parts: character device and user space interface, both of which communicate through ioctl.
+User-space address access
+- For the virtual address of a variable passed in by the user program, `get_user_pages_remote` obtains the memory page containing that address, and `kmap` maps the page into the kernel.
+  - In the 192.168.40.204 environment, HugeTLB Pages mounted correctly in testing.
+- The memory page address + offset is stored in the timer's corresponding `kernel_watch_arg`; when the hrTimer polls, it reads `kernel_watch_arg` to get the actual value.
 
-User space address access
-- The variable virtual address passed in by the user program, use `get_user_pages_remote` to obtain the memory page where the address is located, and `kmap` maps it to the kernel.
-  - In the 192.168.40.204 environment, the HugeTLB Pages test mounts normally.
-- The memory page address + offset is stored in the `kernel_watch_arg` corresponding to the timer, and hrTimer accesses `kernel_watch_arg` when polling to get the real value.
+Timer grouping
+- The hrTimer data structures are kept in the global array `kernel_wtimer_list`. When a timer is allocated, `kernel_wtimer_list` is traversed and the timer intervals are compared.
+- Watches with the same interval are assigned to the same group and share one hrTimer.
+- If the number of variables monitored by a timer would exceed `TIMER_MAX_WATCH_NUM` (32), a new hrTimer is created.
+- The total number of hrTimers (the length of the `kernel_wtimer_list` array) is limited to `MAX_TIMER_NUM` (128).
 
-timer grouping
-- The hrTimer data structure is defined in the global array `kernel_wtimer_list`. When allocating a timer, it will check the traversal `kernel_wtimer_list` to compare the timer interval.
-- Watches with the same timing interval are assigned to the same group and correspond to the same hrTimer.
-- If the number of variables monitored by a timer exceeds `TIMER_MAX_WATCH_NUM` (32), a new hrTimer will be created.
-- The total number of hrTimers (`kernel_wtimer_list` array length) limit is `MAX_TIMER_NUM`(128).
+Memory page mount/unmount
+- `get_user_pages_remote`/`kmap` increment the corresponding counts, which must be balanced by matching `put_page`/`kunmap` calls.
+- A module-wide global linked list, `watch_local_memory_list`, stores the page and kt for every successfully mounted variable. When the character device is closed, the list is traversed and each entry is unmounted.
 
-Memory page mount/unmount
-- `get_user_pages_remote`/ `kmap` will increase the corresponding count and requires the equivalent `put_page`/`kunmap`.
-- A global linked list in the module `watch_local_memory_list` stores the page and kt corresponding to each successfully mounted variable. When performing the close operation of the character device, it is traversed and unloaded.
+variable monitor add/remove
+- The `kernel_watch_arg` data structure has a pid member, but watches are not separated by process when they are added.
+- On removal, all monitored variables are traversed and their pid is compared.
+- The gap left by a removal is filled by moving the last variable into the empty slot (sentinel--); hrTimers are handled the same way.
 
-Stack output conditions: The conditions are referenced from [diagnose-tools::load.c](https://github.com/alibaba/diagnose-tools/blob/e285bc4626a7d207eabd4a69cb276e1a3b1b7c76/SOURCE/module/kernel/load.c#L209)
-- `TASK` must satisfy TASK_RUNNING and `__task_contributes_to_load`.
-- `__task_contributes_to_load` corresponds to the kernel macro `task_contributes_to_loa`.
+Stack output conditions: the conditions are adapted from [diagnose-tools::load.c](https://github.com/alibaba/diagnose-tools/blob/e285bc4626a7d207eabd4a69cb276e1a3b1b7c76/SOURCE/module/kernel/load.c#L209)
+- A `TASK` must satisfy one of TASK_RUNNING, `__task_contributes_to_load`, or `TASK_IDLE` (the latter may include blocked processes).
+- `__task_contributes_to_load` corresponds to the kernel macro `task_contributes_to_load`.
 
 ```c
 // https://www.spinics.net/lists/kernel/msg3582022.html
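The `watch_arg` fields in the diff (`length_byte`, `threshold`, `unsigned_flag`, and `greater_flag`, where true means `>` and false means `<`) decide whether a polled variable has crossed its threshold. The following is a minimal user-space sketch of that decision. It is not the module's code, and it makes several assumptions: the value is read in native byte order, only 1-, 2-, 4- and 8-byte widths are valid, `threshold` is a `long long`, and the comparison is strict. `threshold_reached`, `read_unsigned` and `read_signed` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read an unsigned value of length_byte bytes
 * (native byte order). memcpy avoids unaligned or type-punned loads. */
static uint64_t read_unsigned(const void *ptr, int length_byte)
{
    uint8_t u8;
    uint16_t u16;
    uint32_t u32;
    uint64_t u64;

    switch (length_byte) {
    case 1: memcpy(&u8, ptr, sizeof u8); return u8;
    case 2: memcpy(&u16, ptr, sizeof u16); return u16;
    case 4: memcpy(&u32, ptr, sizeof u32); return u32;
    default:
        assert(length_byte == 8); /* width already validated by the caller */
        memcpy(&u64, ptr, sizeof u64);
        return u64;
    }
}

/* Hypothetical helper: same as above, but the value is sign-extended. */
static int64_t read_signed(const void *ptr, int length_byte)
{
    int8_t s8;
    int16_t s16;
    int32_t s32;
    int64_t s64;

    switch (length_byte) {
    case 1: memcpy(&s8, ptr, sizeof s8); return s8;
    case 2: memcpy(&s16, ptr, sizeof s16); return s16;
    case 4: memcpy(&s32, ptr, sizeof s32); return s32;
    default:
        assert(length_byte == 8);
        memcpy(&s64, ptr, sizeof s64);
        return s64;
    }
}

/* Assumed semantics: returns 1 if the value is past the threshold in the
 * direction chosen by greater_flag (1: value > threshold,
 * 0: value < threshold), 0 if not, and -1 for an unsupported width. */
int threshold_reached(const void *ptr, int length_byte, int unsigned_flag,
                      int greater_flag, long long threshold)
{
    if (length_byte != 1 && length_byte != 2 && length_byte != 4 &&
        length_byte != 8)
        return -1;

    if (unsigned_flag) {
        uint64_t value = read_unsigned(ptr, length_byte);
        uint64_t limit = (uint64_t)threshold;
        return greater_flag ? value > limit : value < limit;
    }

    int64_t value = read_signed(ptr, length_byte);
    return greater_flag ? value > threshold : value < threshold;
}
```

With the settings from the initialization example (`length_byte = sizeof(int)`, `unsigned_flag = 0`, `greater_flag = 1`, threshold 150), an `int` holding 151 makes `threshold_reached` return 1. `unsigned_flag` matters for narrow types. For example, a byte holding 200 is above 150 when read as unsigned, but it is -56 when read as signed. The README does not say how the module itself dereferences the mapped page.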
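The README describes three timer-grouping rules. Watches with the same interval share an hrTimer, a timer holds at most `TIMER_MAX_WATCH_NUM` (32) watches, and `kernel_wtimer_list` holds at most `MAX_TIMER_NUM` (128) timers. The sketch below models these rules in plain user-space C. It is hypothetical and only tracks intervals and counts. `struct sketch_timer`, `timers` and `assign_watch` are illustrative names, and the real module also has to start and manage each hrTimer.

```c
#include <assert.h>

/* Limits quoted in the README; the real values live in the module headers. */
#define TIMER_MAX_WATCH_NUM 32
#define MAX_TIMER_NUM 128

/* Hypothetical, simplified stand-in for an entry of kernel_wtimer_list. */
struct sketch_timer {
    unsigned long time_ns; /* interval shared by every watch on this timer */
    int watch_count;       /* watches currently attached to this timer */
};

static struct sketch_timer timers[MAX_TIMER_NUM];
static int timer_count; /* timers in use: timers[0 .. timer_count-1] */

/* Pick the timer for a new watch with interval time_ns.
 * Returns the timer index, or -1 if all MAX_TIMER_NUM timers are in use. */
int assign_watch(unsigned long time_ns)
{
    assert(timer_count >= 0 && timer_count <= MAX_TIMER_NUM);

    /* Reuse a timer with the same interval that still has a free slot. */
    for (int i = 0; i < timer_count; i++) {
        if (timers[i].time_ns == time_ns &&
            timers[i].watch_count < TIMER_MAX_WATCH_NUM) {
            timers[i].watch_count++;
            return i;
        }
    }

    /* Otherwise allocate a new timer, unless the global limit is reached. */
    if (timer_count == MAX_TIMER_NUM)
        return -1;
    timers[timer_count].time_ns = time_ns;
    timers[timer_count].watch_count = 1;
    return timer_count++;
}
```

Under these limits the sketch holds at most 32 × 128 = 4096 watches. A linear scan is adequate because the array never has more than 128 entries. The 33rd watch with the same interval lands on a second timer for that interval.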
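The README also describes how monitors are removed. They are not partitioned by process when added. On removal, every watch is scanned and its pid compared, and each hole is filled by moving the last entry into it and decrementing sentinel. This is a swap-with-last deletion. A minimal sketch for a single timer's array follows. It assumes a simplified watch record, and `struct sketch_watch`, `struct sketch_watch_group` and `remove_watches_of` are hypothetical names.

```c
#include <assert.h>
#include <sys/types.h> /* pid_t */

#define TIMER_MAX_WATCH_NUM 32 /* per-timer limit quoted in the README */

/* Hypothetical, simplified watch record: only what removal needs. */
struct sketch_watch {
    pid_t task_id; /* process that registered the watch */
    int id;        /* illustrative payload */
};

/* Hypothetical per-timer array; sentinel counts the valid entries. */
struct sketch_watch_group {
    struct sketch_watch watches[TIMER_MAX_WATCH_NUM];
    int sentinel;
};

/* Remove every watch registered by pid. Each hole is filled by moving the
 * last valid entry into it and decrementing sentinel, so order is not kept.
 * Returns the number of watches removed. */
int remove_watches_of(struct sketch_watch_group *g, pid_t pid)
{
    int removed = 0;
    int i = 0;

    assert(g->sentinel >= 0 && g->sentinel <= TIMER_MAX_WATCH_NUM);
    while (i < g->sentinel) {
        if (g->watches[i].task_id == pid) {
            g->watches[i] = g->watches[g->sentinel - 1];
            g->sentinel--;
            removed++;
            /* Do not advance i: slot i now holds the moved entry,
             * which must be checked as well. */
        } else {
            i++;
        }
    }
    return removed;
}
```

Each removal costs O(1), but the order of entries is not preserved. This is harmless if the timer callback simply scans entries `0 .. sentinel-1`. According to the README, the hrTimers in `kernel_wtimer_list` are compacted the same way.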
