update readme

author: zhangyang <[email protected]> 2023-11-10 06:00:24 +0000
committer: zhangyang <[email protected]> 2023-11-10 06:00:24 +0000
commit: d0a3df1e25969f90f5395bdfac0032242e281ac6 (patch)
tree: faa45fb8a27dc1a9c3819be290c5ed297fd7b8de
parent: 29517e160cac927a74bbd9880baeb06ce3d04a69 (diff)
2 files changed, 182 insertions, 60 deletions
diff --git a/README.md b/README.md
index cb3f665..d8ec277 100644
--- a/README.md
+++ b/README.md
@@ -1,64 +1,101 @@
 ## watch_monitor
 
-Monitor the given address and length, and print the system stack information when exceeding the set condition.
+Monitor numeric variables (given by address and length) and print system stack information when the configured conditions are exceeded.
+
+Limits on simultaneous monitoring
+- Watches with the same timer interval are grouped together and share one timer.
+- A group holds up to 32 variables; beyond that, a new timer is allocated.
+- At most 128 timers exist globally.
+- These limits are defined as macros in the `watch_module.h` header.
+
+Currently, monitoring is limited to a single application; simultaneous use by multiple applications is not supported.
+- Multiple applications can still work correctly as long as only one of them calls `cancel_all_watch();`.
 
 ## Usage
 
-See example in helloworld.c
-- Add #include "watch.h"
-- Pass in the address and length to monitor, set the threshold, compare way, timer interval (ns) etc.
-- start_watch(watch_arg); to start monitoring
-- Call cancel_watch(); when need to cancel monitoring
+Example: helloworld.c
+- Add `#include "watch.h"`
+- For each variable to monitor, set its name, address, and length, plus the threshold, comparison direction, timer interval (ns), etc.
+- Call `start_watch(watch_arg);` to start monitoring
+- Call `cancel_all_watch();` when you need to cancel monitoring
 
-When exceeding the set condition, print system stack information, check with dmesg, see example below:
-- After printing stack, the timer will restart after 1s, and start next round of monitoring after 1s.
+When a configured condition is exceeded, system stack information is printed to the kernel log; view it with `dmesg`, as in the example below:
+- If several variables in the same timer group exceed their thresholds at once, the stack information is printed only once.
+- After the stack is printed, the timer restarts after 1 s, so the next round of monitoring begins 1 s later.
 
 ```log
-[ 1144.582249] -------------------------------------
-[ 1144.582250] -------------watch monitor-----------
-[ 1144.582251] Timestamp (us): 1699411664908117
-[ 1144.582253] Recent Load: 0.32, 0.26, 0.26
-[ 1144.582255] Running task
-[ 1144.582342] Uninterrupted task
-[ 1144.582343] uninterrupted task: name rcu_gp, pid 3
-[ 1144.582352]  rescuer_thread+0x290/0x390
-[ 1144.582360]  kthread+0xd7/0x100
-[ 1144.582363]  ret_from_fork+0x1f/0x30
-[ 1144.582367] uninterrupted task: name rcu_par_gp, pid 4
-[ 1144.582369]  rescuer_thread+0x290/0x390
-[ 1144.582371]  kthread+0xd7/0x100
-[ 1144.582373]  ret_from_fork+0x1f/0x30
+[86245.364861] -------------------------------------
+[86245.364864] -------------watch monitor-----------
+[86245.364865] Threshold reached:
+                 name: temp0, threshold: 150
+[86245.364866] Timestamp (ns): 1699589000606300743
+[86245.364867] Recent Load: 116.65, 126.83, 151.17
+[86245.365669] task: name lcore-worker-4, pid 803327
+[86245.365672] task: name lcore-worker-5, pid 803328
+[86245.365673] task: name lcore-worker-6, pid 803329
+[86245.365674] task: name lcore-worker-7, pid 803330
+[86245.365676] task: name lcore-worker-8, pid 803331
+[86245.365677] task: name lcore-worker-9, pid 803332
+[86245.365679] task: name lcore-worker-10, pid 803333
+[86245.365681] task: name lcore-worker-11, pid 803334
+[86245.365682] task: name lcore-worker-68, pid 803335
+[86245.365683] task: name lcore-worker-69, pid 803336
+[86245.365684] task: name lcore-worker-70, pid 803337
+[86245.365685] task: name lcore-worker-71, pid 803338
+[86245.365686] task: name lcore-worker-72, pid 803339
+[86245.365687] task: name lcore-worker-73, pid 803340
+[86245.365688] task: name lcore-worker-74, pid 803341
+[86245.365689] task: name lcore-worker-75, pid 803342
+[86245.365694] task: name pkt:worker-0, pid 803638
+[86245.365702]  hrtimer_nanosleep+0x8d/0x120
+[86245.365709]  __x64_sys_nanosleep+0x96/0xd0
+[86245.365711]  do_syscall_64+0x37/0x80
+[86245.365716]  entry_SYSCALL_64_after_hwframe+0x44/0xae
+[86245.365718] task: name pkt:worker-1, pid 803639
+[86245.365721]  hrtimer_nanosleep+0x8d/0x120
+[86245.365724]  __x64_sys_nanosleep+0x96/0xd0
+[86245.365726]  do_syscall_64+0x37/0x80
+[86245.365728]  entry_SYSCALL_64_after_hwframe+0x44/0xae
+[86245.365730] task: name pkt:worker-2, pid 803640
+[86245.365732]  hrtimer_nanosleep+0x8d/0x120
+[86245.365734]  __x64_sys_nanosleep+0x96/0xd0
+[86245.365737]  do_syscall_64+0x37/0x80
+[86245.365739]  entry_SYSCALL_64_after_hwframe+0x44/0xae
+[86245.365740] task: name pkt:worker-3, pid 803641
+[86245.365743]  hrtimer_nanosleep+0x8d/0x120
 ```
 
 ### Parameter Description
 
-start_watch passes in the watch_arg structure. The meanings of each field are as follows:
+`start_watch` takes a `watch_arg` structure. Its fields are as follows:
+- `name` is limited to `MAX_NAME_LEN` (15) valid characters.
 
 ```c
 typedef struct
 {
     pid_t task_id;               // current process id
+    char name[MAX_NAME_LEN + 1]; // name (up to 15 chars + NUL)
     void *ptr;                   // virtual address
-    int length_byte;              // length_byte
+    int length_byte;             // length in bytes
     long long threshold;         // threshold value
     unsigned char unsigned_flag; // unsigned flag (true: unsigned, false: signed)
     unsigned char greater_flag;  // reverse flag (true: >, false: <)
-    unsigned long time;          // timer interval (ns)
+    unsigned long time_ns;       // timer interval (ns)
 } watch_arg;
 ```
 
-An initialization example:
-- When testing in hyper-v virtual machine, setting any value less than 1000ns for hrTimer will freeze the system, so unable to test ns level timing;
+An initialization example
 
 ```c
-watch_arg watch_arg = {
+watch_args = (watch_arg){
     .task_id = getpid(),
     .ptr = &temp,
+    .name = "temp",
     .length_byte = sizeof(int),
-    .threshold = 105,
-    .unsigned_flag = 1,
+    .threshold = 150 + i,        // i: index of the variable being configured
+    .unsigned_flag = 0,
     .greater_flag = 1,
-    .time = 2, // on hyper-v, 1us will block all system. 2us just fine, maybe 1us is too short for hyper-v
+    .time_ns = 2000 + (i / 33) * 5000 // 2 us base; grows by 5 us every 33 variables
 };
 ```
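To make the roles of `length_byte`, `unsigned_flag`, `greater_flag` and `threshold` concrete, here is a minimal user-space sketch of the check a poll could perform on one watched variable. It is not the module's code: the function name `threshold_reached` is hypothetical, and the sketch assumes a strict comparison (as the `true: >, false: <` comment suggests) and `length_byte` values of 1, 2, 4 or 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads length_byte bytes at ptr and compares them
 * with threshold as the documented watch_arg fields describe. */
static bool threshold_reached(const void *ptr, int length_byte,
                              bool unsigned_flag, bool greater_flag,
                              long long threshold)
{
    assert(ptr != NULL);
    if (unsigned_flag) {
        unsigned long long v;
        switch (length_byte) {
        case 1: { uint8_t x;  memcpy(&x, ptr, sizeof x); v = x; break; }
        case 2: { uint16_t x; memcpy(&x, ptr, sizeof x); v = x; break; }
        case 4: { uint32_t x; memcpy(&x, ptr, sizeof x); v = x; break; }
        case 8: { uint64_t x; memcpy(&x, ptr, sizeof x); v = x; break; }
        default: return false; /* unsupported width */
        }
        /* Unsigned mode: the threshold is reinterpreted as unsigned too. */
        unsigned long long t = (unsigned long long)threshold;
        return greater_flag ? v > t : v < t;
    }

    long long v;
    switch (length_byte) {
    case 1: { int8_t x;  memcpy(&x, ptr, sizeof x); v = x; break; }
    case 2: { int16_t x; memcpy(&x, ptr, sizeof x); v = x; break; }
    case 4: { int32_t x; memcpy(&x, ptr, sizeof x); v = x; break; }
    case 8: { int64_t x; memcpy(&x, ptr, sizeof x); v = x; break; }
    default: return false; /* unsupported width */
    }
    /* Signed mode: plain signed comparison against the threshold. */
    return greater_flag ? v > threshold : v < threshold;
}
```

With `length_byte = sizeof(int)`, `unsigned_flag = 0` and `greater_flag = 1`, a value of 151 against a threshold of 150 triggers, while 150 does not. The same byte `0xC8` reads as 200 in unsigned mode but as -56 in signed mode, which is why `unsigned_flag` matters for small widths.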
 
@@ -82,8 +119,32 @@ Only tested on kernel 5.17.15-1.el8.x86_64.
 
 ## Other
 
-The program is divided into two parts: character device and user space interface, communicating via ioctl.
+The program consists of two parts, a character device and a user-space interface, which communicate through ioctl.
 
 User space address access
-- The user program passes in a virtual address, get_user_pages_remote is used to obtain the memory page where the address is located, and kmap maps it to the kernel.
-- The memory page address + offset is stored in the global variable k_watch_arg, and hrTimer polls to access k_watch_arg to get the real value.
+- For each variable's virtual address passed in by the user program, `get_user_pages_remote` obtains the memory page containing that address, and `kmap` maps the page into kernel space.
+    - In the 192.168.40.204 environment, mapping HugeTLB pages was tested and works.
+- The mapped page address plus the offset within the page is stored in the timer's `kernel_watch_arg`; when polling, the hrTimer reads the actual value through `kernel_watch_arg`.
+
+Timer grouping
+- hrTimer structures are kept in the global array `kernel_wtimer_list`. When a timer is allocated, the module traverses `kernel_wtimer_list` and compares timer intervals.
+- Watches with the same timer interval are assigned to the same group and share one hrTimer.
+- If the number of variables monitored by a timer exceeds `TIMER_MAX_WATCH_NUM` (32), a new hrTimer is created.
+- The total number of hrTimers (the length of the `kernel_wtimer_list` array) is limited to `MAX_TIMER_NUM` (128).
+
+Memory page mapping/unmapping
+- `get_user_pages_remote` and `kmap` increment the corresponding reference counts, which must be balanced by matching `put_page` and `kunmap` calls.
+- A module-global linked list, `watch_local_memory_list`, stores the page and `kt` of each successfully mapped variable; when the character device is closed, the list is traversed and each entry is released.
+
+Stack output conditions (adapted from [diagnose-tools::load.c](https://github.com/alibaba/diagnose-tools/blob/e285bc4626a7d207eabd4a69cb276e1a3b1b7c76/SOURCE/module/kernel/load.c#L209)):
+- A task is reported if it is in the `TASK_RUNNING` state or satisfies `__task_contributes_to_load`.
+- `__task_contributes_to_load` corresponds to the kernel macro `task_contributes_to_load`.
+
+```c
+// https://www.spinics.net/lists/kernel/msg3582022.html
+// removed in 5.8-rc3, but it still works
+// whether the task contributes to the load
+#define __task_contributes_to_load(task)                                                                               \
+    ((READ_ONCE(task->__state) & TASK_UNINTERRUPTIBLE) != 0 && (task->flags & PF_FROZEN) == 0 &&                       \
+     (READ_ONCE(task->__state) & TASK_NOLOAD) == 0)
+```
+\ No newline at end of file
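The grouping rule described under "Timer grouping" can be sketched as a first-fit search over a fixed-size array. This is a simplified, hypothetical model: `wtimer_slot`, `timer_list` and `assign_timer` are illustrative names, and only the per-timer watch count is tracked, not the hrTimer itself or the `kernel_watch_arg` entries.

```c
#include <assert.h>

/* Limits as documented above; the real macros live in watch_module.h. */
#define TIMER_MAX_WATCH_NUM 32
#define MAX_TIMER_NUM 128

/* Hypothetical, simplified stand-in for one kernel_wtimer_list entry. */
typedef struct {
    unsigned long time_ns; /* interval shared by every watch in this group */
    int watch_count;       /* number of variables attached to this timer */
} wtimer_slot;

static wtimer_slot timer_list[MAX_TIMER_NUM];
static int timer_num; /* slots in use */

/* Returns the index of the timer the new watch joins,
 * or -1 if all MAX_TIMER_NUM timers are already in use. */
static int assign_timer(unsigned long time_ns)
{
    assert(time_ns > 0);
    /* Reuse a timer with the same interval that still has room. */
    for (int i = 0; i < timer_num; i++) {
        if (timer_list[i].time_ns == time_ns &&
            timer_list[i].watch_count < TIMER_MAX_WATCH_NUM) {
            timer_list[i].watch_count++;
            return i;
        }
    }
    /* Otherwise allocate a new timer, if the global limit allows. */
    if (timer_num >= MAX_TIMER_NUM)
        return -1;
    timer_list[timer_num].time_ns = time_ns;
    timer_list[timer_num].watch_count = 1;
    return timer_num++;
}
```

In this model, a 33rd watch with the same interval lands in a second timer, watches with a different interval always get their own group, and the function returns -1 once all `MAX_TIMER_NUM` slots are taken.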
diff --git a/README_zh.md b/README_zh.md
index 0cb2f80..f2cf4ca 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -1,64 +1,101 @@
 ## watch_monitor
 
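-Monitor the given address and length; print system stack information when the set condition is exceeded.
+Monitor numeric variables (given by address and length) and print system stack information when the configured conditions are exceeded.
+
+Limits on simultaneous monitoring
+- Watches with the same timer interval are grouped together and share one timer.
+- A group holds at most 32 variables; beyond that, a new timer is allocated.
+- At most 128 timers exist globally.
+- These limits are defined as macros in the `watch_module.h` header.
+
+Currently, monitoring is limited to a single application; simultaneous calls from multiple applications are not yet supported.
+- If multiple applications run and only one of them calls `cancel_all_watch();`, they can still work normally.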
 
 ## Usage
 
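 Example: helloworld.c
 - Add `#include "watch.h"`
-- Pass in the address && length of the variable to monitor, set the threshold, comparison method, timer interval (us), etc.
+- For each variable to monitor, set its name, address, and length, plus the threshold, comparison method, timer interval (ns), etc.
 - `start_watch(watch_arg);` starts monitoring
-- Call `cancel_watch();` when monitoring needs to be cancelled
+- Call `cancel_all_watch();` when monitoring needs to be cancelled
 
 When the set conditions are exceeded, system stack information is printed; view it with `dmesg`, as in the example below:
+- If several variables in the same timer exceed their thresholds, the stack information is not printed repeatedly.
 - After the stack is printed, the timer restarts after 1 s, and the next round of monitoring begins 1 s later.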
 
 ```log
-[ 1144.582249] -------------------------------------
-[ 1144.582250] -------------watch monitor-----------
-[ 1144.582251] Timestamp (us): 1699411664908117
-[ 1144.582253] Recent Load: 0.32, 0.26, 0.26
-[ 1144.582255] Running task
-[ 1144.582342] Uninterrupted task
-[ 1144.582343] uninterrupted task: name rcu_gp, pid 3
-[ 1144.582352]  rescuer_thread+0x290/0x390
-[ 1144.582360]  kthread+0xd7/0x100
-[ 1144.582363]  ret_from_fork+0x1f/0x30
-[ 1144.582367] uninterrupted task: name rcu_par_gp, pid 4
-[ 1144.582369]  rescuer_thread+0x290/0x390
-[ 1144.582371]  kthread+0xd7/0x100
-[ 1144.582373]  ret_from_fork+0x1f/0x30
+[86245.364861] -------------------------------------
+[86245.364864] -------------watch monitor-----------
+[86245.364865] Threshold reached:
+                 name: temp0, threshold: 150
+[86245.364866] Timestamp (ns): 1699589000606300743
+[86245.364867] Recent Load: 116.65, 126.83, 151.17
+[86245.365669] task: name lcore-worker-4, pid 803327
+[86245.365672] task: name lcore-worker-5, pid 803328
+[86245.365673] task: name lcore-worker-6, pid 803329
+[86245.365674] task: name lcore-worker-7, pid 803330
+[86245.365676] task: name lcore-worker-8, pid 803331
+[86245.365677] task: name lcore-worker-9, pid 803332
+[86245.365679] task: name lcore-worker-10, pid 803333
+[86245.365681] task: name lcore-worker-11, pid 803334
+[86245.365682] task: name lcore-worker-68, pid 803335
+[86245.365683] task: name lcore-worker-69, pid 803336
+[86245.365684] task: name lcore-worker-70, pid 803337
+[86245.365685] task: name lcore-worker-71, pid 803338
+[86245.365686] task: name lcore-worker-72, pid 803339
+[86245.365687] task: name lcore-worker-73, pid 803340
+[86245.365688] task: name lcore-worker-74, pid 803341
+[86245.365689] task: name lcore-worker-75, pid 803342
+[86245.365694] task: name pkt:worker-0, pid 803638
+[86245.365702]  hrtimer_nanosleep+0x8d/0x120
+[86245.365709]  __x64_sys_nanosleep+0x96/0xd0
+[86245.365711]  do_syscall_64+0x37/0x80
+[86245.365716]  entry_SYSCALL_64_after_hwframe+0x44/0xae
+[86245.365718] task: name pkt:worker-1, pid 803639
+[86245.365721]  hrtimer_nanosleep+0x8d/0x120
+[86245.365724]  __x64_sys_nanosleep+0x96/0xd0
+[86245.365726]  do_syscall_64+0x37/0x80
+[86245.365728]  entry_SYSCALL_64_after_hwframe+0x44/0xae
+[86245.365730] task: name pkt:worker-2, pid 803640
+[86245.365732]  hrtimer_nanosleep+0x8d/0x120
+[86245.365734]  __x64_sys_nanosleep+0x96/0xd0
+[86245.365737]  do_syscall_64+0x37/0x80
+[86245.365739]  entry_SYSCALL_64_after_hwframe+0x44/0xae
+[86245.365740] task: name pkt:worker-3, pid 803641
+[86245.365743]  hrtimer_nanosleep+0x8d/0x120
 ```
 
 ### Parameter Description
 
 start_watch takes a watch_arg structure. The meaning of each field is as follows:
+- `name` is limited to `MAX_NAME_LEN` (15) valid characters.
 
 ```c
 typedef struct
 {
     pid_t task_id;               // current process id
+    char name[MAX_NAME_LEN + 1]; // name (up to 15 chars + NUL)
     void *ptr;                   // virtual address
-    int length_byte;              // length_byte
+    int length_byte;             // length in bytes
     long long threshold;         // threshold value
     unsigned char unsigned_flag; // unsigned flag (true: unsigned, false: signed)
     unsigned char greater_flag;  // reverse flag (true: >, false: <)
-    unsigned long time;          // timer interval (us)
+    unsigned long time_ns;       // timer interval (ns)
 } watch_arg;
 ```
 
 An initialization example
-- When testing in a Hyper-V virtual machine, setting any hrTimer value below 1000 ns froze the system, so ns-level timing could not be tested;
 
 ```c
-watch_arg watch_arg = {
+watch_args = (watch_arg){
     .task_id = getpid(),
     .ptr = &temp,
+    .name = "temp",
     .length_byte = sizeof(int),
-    .threshold = 105,
-    .unsigned_flag = 1,
+    .threshold = 150 + i,        // i: index of the variable being configured
+    .unsigned_flag = 0,
     .greater_flag = 1,
-    .time = 2, // on hyper-v, 1us will block all system. 2us just fine, maybe 1us is too short for hyper-v
+    .time_ns = 2000 + (i / 33) * 5000 // 2 us base; grows by 5 us every 33 variables
 };
 ```
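The rule that one timer group prints the stack at most once per round, then backs off for 1 s, can be modeled as below. This is an illustrative sketch with hypothetical names (`sketch_watch`, `poll_group`, `dumps`): the counter `dumps` stands in for the actual stack dump, and the return value stands in for the delay until the hrTimer's next expiry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* 1 s back-off after a stack dump, as described above. */
#define RESTART_DELAY_NS 1000000000ULL

/* Hypothetical watch entry with its current value already read. */
typedef struct {
    const char *name;
    long long value;
    long long threshold;
    bool greater_flag; /* true: trigger on >, false: trigger on < */
} sketch_watch;

static int dumps; /* how many times the simulated stack dump ran */

/* Simulated timer callback for one group: checks every watch,
 * dumps the stack at most once, and returns the next poll delay. */
static unsigned long long poll_group(const sketch_watch *w, size_t n,
                                     unsigned long long interval_ns)
{
    assert(w != NULL || n == 0);
    bool hit = false;
    for (size_t i = 0; i < n; i++) {
        bool over = w[i].greater_flag ? w[i].value > w[i].threshold
                                      : w[i].value < w[i].threshold;
        /* Every variable is still checked, so an implementation could
         * list each name that crossed its threshold before dumping. */
        hit = hit || over;
    }
    if (!hit)
        return interval_ns; /* nothing triggered: poll again as usual */
    dumps++;                 /* one stack dump for the whole group */
    return RESTART_DELAY_NS; /* back off before the next round */
}
```

Here two variables crossing their thresholds in the same round still produce a single dump, and a quiet round simply rearms the timer at its normal interval.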
 
@@ -86,5 +123,29 @@ rmmod watch_module.ko && make clean
 The program is divided into two parts, a character device and a user-space interface, which communicate through ioctl.
 
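 User space address access
-- The user program passes in a virtual address; `get_user_pages_remote` obtains the memory page containing the address, and `kmap` maps it into the kernel.
-- The memory page address + offset is stored in the global variable `k_watch_arg`, and the hrTimer accesses `k_watch_arg` when polling to get the real value.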
-\ No newline at end of file
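+- For each variable's virtual address passed in by the user program, `get_user_pages_remote` obtains the memory page containing that address, and `kmap` maps the page into kernel space.
+    - In the 192.168.40.204 environment, mapping HugeTLB pages was tested and works.
+- The mapped page address plus the offset within the page is stored in the timer's `kernel_watch_arg`; when polling, the hrTimer reads the actual value through `kernel_watch_arg`.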
+
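+Timer grouping
+- hrTimer structures are kept in the global array `kernel_wtimer_list`. When a timer is allocated, the module traverses `kernel_wtimer_list` and compares timer intervals.
+- Watches with the same timer interval are assigned to the same group and share one hrTimer.
+- If the number of variables monitored by a timer exceeds `TIMER_MAX_WATCH_NUM` (32), a new hrTimer is created.
+- The total number of hrTimers (the length of the `kernel_wtimer_list` array) is limited to `MAX_TIMER_NUM` (128).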
+
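+Memory page mapping/unmapping
+- `get_user_pages_remote` and `kmap` increment the corresponding reference counts, which must be balanced by matching `put_page` and `kunmap` calls.
+- A module-global linked list, `watch_local_memory_list`, stores the page and `kt` of each successfully mapped variable; when the character device is closed, the list is traversed and each entry is released.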
+
+Stack output conditions (adapted from [diagnose-tools::load.c](https://github.com/alibaba/diagnose-tools/blob/e285bc4626a7d207eabd4a69cb276e1a3b1b7c76/SOURCE/module/kernel/load.c#L209)):
+- A task is reported if it is in the `TASK_RUNNING` state or satisfies `__task_contributes_to_load`.
+- `__task_contributes_to_load` corresponds to the kernel macro `task_contributes_to_load`.
+
+```c
+// https://www.spinics.net/lists/kernel/msg3582022.html
+// removed in 5.8-rc3, but it still works
+// whether the task contributes to the load
+#define __task_contributes_to_load(task)                                                                               \
+    ((READ_ONCE(task->__state) & TASK_UNINTERRUPTIBLE) != 0 && (task->flags & PF_FROZEN) == 0 &&                       \
+     (READ_ONCE(task->__state) & TASK_NOLOAD) == 0)
+```
+\ No newline at end of file