## Variable Monitor

Monitors numeric variables (specified by address and length) and prints system stack information when a configured condition is exceeded.

Limits on simultaneous monitoring
- Watches with the same timer interval are grouped together and share one timer.
- Each group holds up to 32 variables; beyond that, a new timer is allocated.
- The global maximum number of timers is 128.
- These limits are defined as macros in the `watch_module.h` header.
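As a worked example of these limits, assuming the defaults above and that a timer is filled completely before the next one is opened: if $n_k$ variables share the $k$-th distinct interval, the number of timers consumed is

$$
T = \sum_{k} \left\lceil \frac{n_k}{32} \right\rceil \le 128 .
$$

So at most $128 \times 32 = 4096$ variables can be watched at once, and only when every timer is full. For instance, 40 variables at one interval and 10 at another use $\lceil 40/32 \rceil + \lceil 10/32 \rceil = 2 + 1 = 3$ timers.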

Monitoring is currently limited to a single application; simultaneous use from multiple applications is not supported.
- Multiple applications can still work correctly as long as only one of them calls `cancel_all_watch();`.

## Usage

Example: `helloworld.c`
- Add `#include "watch.h"`.
- For each variable to monitor, set its name, address, and length, together with the threshold, comparison method, timer interval (ns), etc.
- Call `start_watch(watch_arg);` to start monitoring.
- Call `cancel_all_watch();` when monitoring should be cancelled.

When a configured condition is exceeded, the system stack information is printed to the kernel log and can be viewed with `dmesg`, as in the following example:
- If several variables in the same timer exceed their thresholds during one timer tick, the stack information is printed only once;
- After the stack is printed, the timer restarts with a 1 s delay, so the next round of monitoring begins 1 s later.

```log
[86245.364861] -------------------------------------
[86245.364864] -------------watch monitor-----------
[86245.364865] Threshold reached:
                 name: temp0, threshold: 150 
[86245.364866] Timestamp (ns): 1699589000606300743
[86245.364867] Recent Load: 116.65, 126.83, 151.17
[86245.365669] task: name lcore-worker-4, pid 803327
[86245.365672] task: name lcore-worker-5, pid 803328
[86245.365673] task: name lcore-worker-6, pid 803329
[86245.365674] task: name lcore-worker-7, pid 803330
[86245.365676] task: name lcore-worker-8, pid 803331
[86245.365677] task: name lcore-worker-9, pid 803332
[86245.365679] task: name lcore-worker-10, pid 803333
[86245.365681] task: name lcore-worker-11, pid 803334
[86245.365682] task: name lcore-worker-68, pid 803335
[86245.365683] task: name lcore-worker-69, pid 803336
[86245.365684] task: name lcore-worker-70, pid 803337
[86245.365685] task: name lcore-worker-71, pid 803338
[86245.365686] task: name lcore-worker-72, pid 803339
[86245.365687] task: name lcore-worker-73, pid 803340
[86245.365688] task: name lcore-worker-74, pid 803341
[86245.365689] task: name lcore-worker-75, pid 803342
[86245.365694] task: name pkt:worker-0, pid 803638
[86245.365702]  hrtimer_nanosleep+0x8d/0x120
[86245.365709]  __x64_sys_nanosleep+0x96/0xd0
[86245.365711]  do_syscall_64+0x37/0x80
[86245.365716]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[86245.365718] task: name pkt:worker-1, pid 803639
[86245.365721]  hrtimer_nanosleep+0x8d/0x120
[86245.365724]  __x64_sys_nanosleep+0x96/0xd0
[86245.365726]  do_syscall_64+0x37/0x80
[86245.365728]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[86245.365730] task: name pkt:worker-2, pid 803640
[86245.365732]  hrtimer_nanosleep+0x8d/0x120
[86245.365734]  __x64_sys_nanosleep+0x96/0xd0
[86245.365737]  do_syscall_64+0x37/0x80
[86245.365739]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[86245.365740] task: name pkt:worker-3, pid 803641
[86245.365743]  hrtimer_nanosleep+0x8d/0x120
```
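The two printing rules above can be modelled in user space. This is a hedged sketch, not the module's hrTimer callback: `sim_watch`, `poll_group`, `stack_dumps`, and `RESTART_DELAY_NS` are hypothetical names, values are read from plain `long long` variables instead of mapped user pages, only signed comparison is shown, and dumping a stack is reduced to a counter. One poll checks every watch in a group, dumps the stack at most once, and returns the delay before the next poll.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical name for the 1 s back-off after a report, in nanoseconds. */
#define RESTART_DELAY_NS 1000000000ULL

/* Hypothetical, simplified watch: value read directly, signed comparison only. */
struct sim_watch {
    const char *name;
    const long long *value;   /* stands in for the kmap-ed user address */
    long long threshold;
    int greater_flag;         /* 1: trigger on value > threshold, 0: on value < threshold */
};

static int stack_dumps;       /* counts simulated stack dumps */

/* One timer tick for a group of watches sharing interval_ns.
 * Returns the delay until the next tick. */
static unsigned long long poll_group(const struct sim_watch *w, int n,
                                     unsigned long long interval_ns)
{
    int reported = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        long long v = *w[i].value;
        int hit = w[i].greater_flag ? v > w[i].threshold : v < w[i].threshold;

        if (!hit)
            continue;
        /* Every tripped variable is named; this format is a simplification of the log. */
        printf("Threshold reached: name: %s, threshold: %lld\n",
               w[i].name, w[i].threshold);
        reported = 1;
    }
    if (reported) {
        stack_dumps++;            /* stack dumped once per tick, not once per variable */
        return RESTART_DELAY_NS;  /* back off for 1 s before the next round */
    }
    return interval_ns;           /* nothing tripped: keep the normal interval */
}
```

With two of three variables over their thresholds, a single tick names both, counts one stack dump, and asks for a 1 s restart; once the values fall back, the group returns to its normal interval.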

### Parameter Description

`start_watch` takes a `watch_arg` structure, whose fields are described below.
- `name` is limited to `MAX_NAME_LEN` (15) characters.

```c
typedef struct
{
    pid_t task_id;               // id of the calling process
    char name[MAX_NAME_LEN + 1]; // name, up to MAX_NAME_LEN (15) chars + '\0'
    void *ptr;                   // virtual address of the monitored variable
    int length_byte;             // size of the variable in bytes
    long long threshold;         // threshold value
    unsigned char unsigned_flag; // signedness (true: unsigned, false: signed)
    unsigned char greater_flag;  // comparison direction (true: value > threshold, false: value < threshold)
    unsigned long time_ns;       // timer interval (ns)
} watch_arg;
```

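An initialization example, taken from a loop where `i` indexes the watched variables:

```c
// Inside a loop over i; `temp` is the int variable being watched
watch_args = (watch_arg){
    .task_id = getpid(),              // from <unistd.h>
    .ptr = &temp,                     // address of the monitored variable
    .name = "temp",
    .length_byte = sizeof(int),
    .threshold = 150 + i,
    .unsigned_flag = 0,               // compare as signed
    .greater_flag = 1,                // trigger when temp > threshold
    .time_ns = 2000 + (i / 33) * 5000 // 2 µs for i < 33, then 7 µs, 12 µs, ...
};
start_watch(watch_args);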
```
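The fields `length_byte`, `unsigned_flag`, `greater_flag`, and `threshold` together define the trigger condition. The following is a minimal user-space sketch of how such a check could be evaluated, assuming the watched value is stored in native byte order and `length_byte` is 1, 2, 4, or 8. `read_unsigned`, `read_signed`, and `threshold_hit` are hypothetical helpers, not functions from this project.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 1/2/4/8-byte unsigned integer (native byte order). */
static uint64_t read_unsigned(const void *ptr, int length_byte)
{
    switch (length_byte) {
    case 1: { uint8_t v;  memcpy(&v, ptr, sizeof v); return v; }
    case 2: { uint16_t v; memcpy(&v, ptr, sizeof v); return v; }
    case 4: { uint32_t v; memcpy(&v, ptr, sizeof v); return v; }
    case 8: { uint64_t v; memcpy(&v, ptr, sizeof v); return v; }
    }
    assert(!"unsupported length_byte");
    return 0;
}

/* Hypothetical helper: read a 1/2/4/8-byte signed integer, sign-extended to 64 bits. */
static int64_t read_signed(const void *ptr, int length_byte)
{
    switch (length_byte) {
    case 1: { int8_t v;  memcpy(&v, ptr, sizeof v); return v; }
    case 2: { int16_t v; memcpy(&v, ptr, sizeof v); return v; }
    case 4: { int32_t v; memcpy(&v, ptr, sizeof v); return v; }
    case 8: { int64_t v; memcpy(&v, ptr, sizeof v); return v; }
    }
    assert(!"unsupported length_byte");
    return 0;
}

/* Returns 1 when the watched value crosses the threshold in the requested direction. */
static int threshold_hit(const void *ptr, int length_byte, int unsigned_flag,
                         int greater_flag, long long threshold)
{
    if (unsigned_flag) {
        /* Assumes threshold >= 0 when comparing as unsigned. */
        uint64_t v = read_unsigned(ptr, length_byte);
        uint64_t t = (uint64_t)threshold;
        return greater_flag ? v > t : v < t;
    }
    int64_t v = read_signed(ptr, length_byte);
    return greater_flag ? v > threshold : v < threshold;
}
```

The signedness flag matters: the bytes of an `int` holding `-1` read as 4294967295 when `unsigned_flag` is set, so the same variable can trip a `> 150` check only in unsigned mode.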

## Demo

From the project root directory:

```bash
make && insmod watch_module.ko
./watch
```

The printed stack information can then be viewed with `dmesg`.

```bash
# Unload the module and clean build files
rmmod watch_module.ko && make clean
```

Only tested on kernel 5.17.15-1.el8.x86_64.

## Other

The program consists of two parts, a character device (kernel module) and a user-space interface, which communicate through `ioctl`.
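A hedged sketch of how the user-space side could hand a `watch_arg` to the character device over `ioctl`. The magic number `'W'`, the command numbers, `WATCH_IOC_START`, `WATCH_IOC_CANCEL`, `sketch_start_watch`, and `sketch_cancel_all` are all hypothetical; the real encoding lives in this project's headers. The struct mirrors the documented `watch_arg` layout, and `_IOW`/`_IO`/`_IOC_*` are the standard Linux ioctl encoding macros.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <sys/ioctl.h>
#include <sys/types.h>

#define MAX_NAME_LEN 15 /* as documented for watch_arg */

/* Mirror of the documented watch_arg layout. */
typedef struct {
    pid_t task_id;
    char name[MAX_NAME_LEN + 1];
    void *ptr;
    int length_byte;
    long long threshold;
    unsigned char unsigned_flag;
    unsigned char greater_flag;
    unsigned long time_ns;
} watch_arg;

/* Hypothetical command encoding: magic 'W', 1 = start a watch, 2 = cancel all. */
#define WATCH_IOC_MAGIC  'W'
#define WATCH_IOC_START  _IOW(WATCH_IOC_MAGIC, 1, watch_arg)
#define WATCH_IOC_CANCEL _IO(WATCH_IOC_MAGIC, 2)

/* Send one watch request; fd is an open descriptor for the module's device node. */
static int sketch_start_watch(int fd, const watch_arg *arg)
{
    assert(arg);
    return ioctl(fd, WATCH_IOC_START, arg); /* kernel side copies *arg with copy_from_user */
}

/* Ask the module to cancel every watch registered so far. */
static int sketch_cancel_all(int fd)
{
    return ioctl(fd, WATCH_IOC_CANCEL);
}
```

Encoding the struct size into the command with `_IOW` lets the kernel side reject requests built against a mismatched `watch_arg` layout. Without the module loaded, both calls simply fail with `-1`.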

User-space address access
- For each variable address passed in by the user program, `get_user_pages_remote` obtains the memory page containing that address, and `kmap` maps the page into kernel space.
     - In the 192.168.40.204 environment, HugeTLB pages were tested and map correctly.
- The mapped page address plus the in-page offset is stored in the `kernel_watch_arg` of the corresponding timer; when polling, the hrTimer reads `kernel_watch_arg` to obtain the actual value.
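The "page address + offset" bookkeeping can be illustrated with plain arithmetic in user space. This sketch only splits a virtual address into the base of its page and the offset inside it, assuming ordinary base pages (for HugeTLB mappings the page size, and therefore the split, differs). `page_split`, `split_address`, and `split_with_system_page` are hypothetical helpers; in the module, the base would instead be the address returned by `kmap`, with the same offset added to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

struct page_split {
    uintptr_t page_base; /* start of the page containing the address */
    size_t offset;       /* byte offset of the variable inside that page */
};

/* Hypothetical helper: split addr into page base + in-page offset. */
static struct page_split split_address(const void *addr, size_t page_size)
{
    uintptr_t a = (uintptr_t)addr;
    struct page_split s;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0); /* power of two */
    s.page_base = a & ~(uintptr_t)(page_size - 1);
    s.offset = (size_t)(a & (uintptr_t)(page_size - 1));
    return s;
}

/* Convenience wrapper using the system's base page size. */
static struct page_split split_with_system_page(const void *addr)
{
    long ps = sysconf(_SC_PAGESIZE);
    return split_address(addr, ps > 0 ? (size_t)ps : 4096);
}
```

For example, with 4096-byte pages the address `0x12345` splits into base `0x12000` and offset `0x345`; adding the offset back to the base always recovers the original address.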

Timer grouping
- The hrTimer data structures are stored in the global array `kernel_wtimer_list`. When allocating a timer, the module traverses `kernel_wtimer_list` and compares timer intervals.
- Watches with the same timer interval are assigned to the same group and share one hrTimer.
- If the number of variables monitored by a timer would exceed `TIMER_MAX_WATCH_NUM` (32), a new hrTimer is created.
- The total number of hrTimers (the length of the `kernel_wtimer_list` array) is limited to `MAX_TIMER_NUM` (128).
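A user-space sketch of the grouping rule described above, assuming first-fit allocation over a fixed array. `TIMER_MAX_WATCH_NUM` and `MAX_TIMER_NUM` reuse the documented limits; `sim_timer`, `timer_list`, and `alloc_timer_slot` are hypothetical stand-ins for `kernel_wtimer_list` and its allocator, and no real hrTimer is started.

```c
#include <assert.h>

#define TIMER_MAX_WATCH_NUM 32   /* documented per-timer limit */
#define MAX_TIMER_NUM       128  /* documented global limit */

struct sim_timer {
    unsigned long time_ns; /* interval shared by every watch in this group */
    int watch_num;         /* watches currently attached; 0 means the slot is free */
};

static struct sim_timer timer_list[MAX_TIMER_NUM]; /* stands in for kernel_wtimer_list */

/* Returns the index of the timer the new watch joins, or -1 if all timers are in use. */
static int alloc_timer_slot(unsigned long time_ns)
{
    int free_slot = -1;

    assert(time_ns > 0); /* a zero interval is not a meaningful timer period */
    for (int i = 0; i < MAX_TIMER_NUM; i++) {
        struct sim_timer *t = &timer_list[i];

        if (t->watch_num == 0) {
            if (free_slot < 0)
                free_slot = i;  /* remember the first unused slot */
            continue;
        }
        /* Same interval and still room: join this group. */
        if (t->time_ns == time_ns && t->watch_num < TIMER_MAX_WATCH_NUM) {
            t->watch_num++;
            return i;
        }
    }
    if (free_slot < 0)
        return -1;              /* MAX_TIMER_NUM timers already allocated */
    timer_list[free_slot].time_ns = time_ns; /* a new hrTimer would be started here */
    timer_list[free_slot].watch_num = 1;
    return free_slot;
}
```

Mirroring the initialization example, 33 watches at 2000 ns occupy two timers (32 + 1), and a later watch with a new interval opens a third.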

Memory page mapping/unmapping
- `get_user_pages_remote`/`kmap` each increase a reference count and must be balanced by the corresponding `put_page`/`kunmap`.
- The module keeps a global linked list, `watch_local_memory_list`, that stores the page and kt of each successfully mapped variable. When the character device is closed, the list is traversed and every entry is released.
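The release-on-close bookkeeping can be sketched in user space as a singly linked list whose nodes record what was acquired for each watched variable. This is an analogue only: `mapped_entry`, `memory_list`, `record_mapping`, `release_all`, and `outstanding_refs` are hypothetical, and the `get_user_pages_remote`/`kmap` and `put_page`/`kunmap` pairs are replaced by a single counter so that the balancing can be checked.

```c
#include <assert.h>
#include <stdlib.h>

static int outstanding_refs; /* one combined count standing in for page pin + kmap */

struct mapped_entry {
    void *page;                 /* would be the struct page * from get_user_pages_remote */
    void *kaddr;                /* would be the kmap()-ed address */
    struct mapped_entry *next;
};

static struct mapped_entry *memory_list; /* stands in for watch_local_memory_list */

/* Record one successfully mapped variable and count the reference it holds. */
static int record_mapping(void *page, void *kaddr)
{
    struct mapped_entry *e = malloc(sizeof *e);

    if (!e)
        return -1;
    e->page = page;
    e->kaddr = kaddr;
    e->next = memory_list;      /* push onto the head of the list */
    memory_list = e;
    outstanding_refs++;         /* get_user_pages_remote + kmap */
    return 0;
}

/* Device release path: drop every reference and free the list; returns entries freed. */
static int release_all(void)
{
    int released = 0;

    while (memory_list) {
        struct mapped_entry *e = memory_list;

        memory_list = e->next;
        outstanding_refs--;     /* kunmap + put_page */
        assert(outstanding_refs >= 0); /* every put must match an earlier get */
        free(e);
        released++;
    }
    return released;
}
```

Keeping every acquisition on one list means the close handler needs no knowledge of which timers or watches still exist; a second call is a harmless no-op.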

Stack output conditions, adapted from [diagnose-tools::load.c](https://github.com/alibaba/diagnose-tools/blob/e285bc4626a7d207eabd4a69cb276e1a3b1b7c76/SOURCE/module/kernel/load.c#L209):
- A task is printed if it is in the `TASK_RUNNING` state or satisfies `__task_contributes_to_load`.
- `__task_contributes_to_load` corresponds to the kernel macro `task_contributes_to_load`.

```c
// https://www.spinics.net/lists/kernel/msg3582022.html
// Removed from the kernel in 5.8-rc3, but the logic still works
// Whether the task contributes to the load average
#define __task_contributes_to_load(task)                                                                               \
    ((READ_ONCE((task)->__state) & TASK_UNINTERRUPTIBLE) != 0 && ((task)->flags & PF_FROZEN) == 0 &&                   \
     (READ_ONCE((task)->__state) & TASK_NOLOAD) == 0)
```
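To make the selection rule concrete, the predicate can be exercised in user space on plain integers. This is a sketch only. The `SK_*` values are assumed to match `include/linux/sched.h` in 5.x kernels and should be checked against the target kernel's headers. `fake_task`, `contributes_to_load`, and `should_dump` are hypothetical. The sketch treats the running check and the load check as alternatives, because a task whose state is `TASK_RUNNING` (0) can never have the `TASK_UNINTERRUPTIBLE` bit set.

```c
#include <assert.h>

/* Values assumed to match include/linux/sched.h (5.x); verify against your kernel. */
#define SK_TASK_RUNNING         0x0000
#define SK_TASK_UNINTERRUPTIBLE 0x0002
#define SK_TASK_NOLOAD          0x0400
#define SK_PF_FROZEN            0x00010000

/* Minimal stand-in for the two task_struct fields the predicate reads. */
struct fake_task {
    unsigned int state; /* plays the role of task->__state */
    unsigned int flags; /* plays the role of task->flags */
};

/* Same logic as __task_contributes_to_load, minus READ_ONCE. */
static int contributes_to_load(const struct fake_task *t)
{
    return (t->state & SK_TASK_UNINTERRUPTIBLE) != 0 &&
           (t->flags & SK_PF_FROZEN) == 0 &&
           (t->state & SK_TASK_NOLOAD) == 0;
}

/* Selection rule: running tasks, or tasks that count toward the load average. */
static int should_dump(const struct fake_task *t)
{
    assert(t);
    return t->state == SK_TASK_RUNNING || contributes_to_load(t);
}
```

Under these assumptions, running and D-state tasks are selected, while idle (`TASK_UNINTERRUPTIBLE | TASK_NOLOAD`), frozen, and interruptibly sleeping tasks are skipped.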