# UDTF
> The functions for table processors.
## Table of Contents
- [UNROLL](#unroll)
- [JSON_UNROLL](#json_unroll)
- [PATH_UNROLL](#path-unroll)
## Description
UDTFs (user-defined table functions) process data as it flows from source to sink. They are part of the processing pipeline and can be used in its pre-processing, processing, and post-processing stages. Each processor can assemble UDTFs into a pipeline, within which events are processed by each function in order, from top to bottom.
Unlike scalar functions, which return a single value, UDTFs are particularly useful when you need to explode or unroll data, transforming a single input row into multiple output rows.
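To make the top-down, one-to-many flow concrete, here is a minimal Python sketch. It assumes that events are plain dicts and that each UDTF is a function returning a list of events; `run_pipeline`, `split_words`, and `tag_length` are hypothetical names used only for illustration and are not part of this project's API.

```python
from typing import Callable, Dict, Iterable, List

# Assumption for illustration: an event is a plain dict.
Event = Dict[str, object]
# Assumption for illustration: a UDTF maps one event to zero or more events.
Udtf = Callable[[Event], List[Event]]


def run_pipeline(events: Iterable[Event], functions: List[Udtf]) -> List[Event]:
    """Apply each UDTF in order (top-down); every event emitted by one
    function becomes an input event for the next function."""
    current = list(events)
    for fn in functions:
        next_events: List[Event] = []
        for event in current:
            next_events.extend(fn(event))
        current = next_events
    return current


def split_words(event: Event) -> List[Event]:
    """Hypothetical UDTF: emits one event per word in event['text']."""
    return [{**event, "word": w} for w in str(event["text"]).split()]


def tag_length(event: Event) -> List[Event]:
    """Hypothetical UDTF that emits exactly one event, like a scalar UDF."""
    return [{**event, "length": len(str(event["word"]))}]


if __name__ == "__main__":
    out = run_pipeline([{"text": "a bb"}], [split_words, tag_length])
    print(out)
    # [{'text': 'a bb', 'word': 'a', 'length': 1}, {'text': 'a bb', 'word': 'bb', 'length': 2}]
```

One input row becomes two output rows after `split_words`, and `tag_length` then runs once per generated row, which is the "explode, then continue" behavior described above.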
## UDTF Definition
UDTFs and UDFs share similar input and context structures; see [UDF](udf.md) for details.
## Functions
### UNROLL
The UNROLL function takes an array field, or an expression that evaluates to an array, and unrolls it into individual events, one per array element.
```UNROLL(filter, lookup_fields, output_fields[, parameters])```
- filter: optional
- lookup_fields: required
- output_fields: required
- parameters: optional
  - regex: `<String>` optional. If the value of `lookup_fields` is a string, it is split into an array using this regex. Default is a comma (`,`).
Example
```yaml
functions:
  - function: UNROLL
    lookup_fields: [ monitor_rule_list ]
    output_fields: [ monitor_rule ]
```
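The following Python sketch illustrates the unrolling behavior under stated assumptions: every output event is a copy of the input event with the output field set to one array element, and a missing, empty, or non-array value yields no events. Neither assumption is confirmed by this page, and `unroll` is a hypothetical helper, not the processor's actual implementation.

```python
import re
from typing import Any, Dict, List


def unroll(event: Dict[str, Any], lookup_field: str, output_field: str,
           regex: str = ",") -> List[Dict[str, Any]]:
    """Hypothetical sketch of UNROLL: one output event per array element.

    Assumptions not confirmed by the docs: every output event copies the
    input event's fields, and a missing, empty, or non-array value yields
    no events.
    """
    value = event.get(lookup_field)
    if isinstance(value, str):
        # A string is first split into an array with the `regex` parameter
        # (a comma by default); an empty string gives an empty array.
        value = re.split(regex, value) if value else []
    if not isinstance(value, list):
        return []
    # Copy the input event and set the output field to one element.
    return [{**event, output_field: item} for item in value]


if __name__ == "__main__":
    src = {"id": 7, "monitor_rule_list": [101, 102]}
    for e in unroll(src, "monitor_rule_list", "monitor_rule"):
        print(e["monitor_rule"])  # prints 101, then 102
```

With a string value such as `"a,b,c"`, the default regex yields three events whose `monitor_rule` values are `"a"`, `"b"`, and `"c"`.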
### JSON_UNROLL
The JSON_UNROLL function takes a JSON object and unrolls (explodes) an array of objects within it into individual events. Each generated event also inherits the object's top-level fields.
```JSON_UNROLL(filter, lookup_fields, output_fields[, parameters])```
- filter: optional
- lookup_fields: required
- output_fields: required
- parameters: optional
  - path: `<String>` optional. Path to the array to unroll. Default is the root of the JSON object.
  - new_path: `<String>` optional. Renames `path` to `new_path`. Default is the same as `path`.
Example
```yaml
functions:
  - function: JSON_UNROLL
    lookup_fields: [ device_tag ]
    output_fields: [ device_tag ]
    parameters:
      path: tags
      new_path: tag
```
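A minimal Python sketch of how the example could behave, assuming the value of `device_tag` is an already-parsed JSON object whose `tags` key holds an array. It further assumes that `path` names a single top-level key, that the result is written back as a dict, and that the field names `device_id` and `k` in the sample data are hypothetical. `json_unroll` is an illustrative helper, not the processor's API.

```python
import copy
from typing import Any, Dict, List, Optional


def json_unroll(event: Dict[str, Any], lookup_field: str, output_field: str,
                path: Optional[str] = None,
                new_path: Optional[str] = None) -> List[Dict[str, Any]]:
    """Hypothetical sketch of JSON_UNROLL on an already-parsed JSON value.

    Assumptions not confirmed by the docs: `path` is a single top-level key,
    the result is written to `output_field` as a dict, and the event's other
    fields are copied unchanged.
    """
    doc = event.get(lookup_field)
    if path is None:
        # Default path: the root of the JSON value is itself the array.
        items = doc if isinstance(doc, list) else []
        return [{**event, output_field: copy.deepcopy(item)} for item in items]
    if not isinstance(doc, dict) or not isinstance(doc.get(path), list):
        return []
    target = new_path or path  # new_path defaults to path
    # Every top-level field except the unrolled array is inherited.
    inherited = {k: v for k, v in doc.items() if k != path}
    events = []
    for item in doc[path]:
        obj = copy.deepcopy(inherited)
        obj[target] = copy.deepcopy(item)
        events.append({**event, output_field: obj})
    return events


if __name__ == "__main__":
    src = {"device_tag": {"device_id": "d1", "tags": [{"k": "a"}, {"k": "b"}]}}
    for e in json_unroll(src, "device_tag", "device_tag", path="tags", new_path="tag"):
        print(e)
```

Under these assumptions, the two printed events are `{'device_tag': {'device_id': 'd1', 'tag': {'k': 'a'}}}` and `{'device_tag': {'device_id': 'd1', 'tag': {'k': 'b'}}}`. Deep copies keep the input event unmodified.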
### Path Unroll
The PATH_UNROLL function takes a path, such as a file path, and breaks it into successive levels split on a separator. Each level becomes a separate event, and top-level fields are retained. At the final level, it outputs both the full path and the file name (taken from the second entry of `lookup_fields`).
```PATH_UNROLL(filter, lookup_fields, output_fields[, parameters])```
- filter: optional
- lookup_fields: required
- output_fields: required
- parameters: optional
  - separator: `<String>` optional. The delimiter used to split the path. Default is `/`.
Example Usage:
```yaml
- function: PATH_UNROLL
  lookup_fields: [ decoded_path, app ]
  output_fields: [ protocol_stack_id, app_name ]
  parameters:
    separator: "."
```
Input:
```json
{"decoded_path":"ETHERNET.IPv4.TCP.ssl","app":"wechat"}
```
When the input is processed, the following events are generated:
```
#Event1: {"protocol_stack_id":"ETHERNET"}
#Event2: {"protocol_stack_id":"ETHERNET.IPv4"}
#Event3: {"protocol_stack_id":"ETHERNET.IPv4.TCP"}
#Event4: {"protocol_stack_id":"ETHERNET.IPv4.TCP.ssl"}
#Event5: {"app_name":"wechat","protocol_stack_id":"ETHERNET.IPv4.TCP.ssl.wechat"}
```
If the last level of `decoded_path` already equals the `app` value, as in the following input:
```json
{"decoded_path":"ETHERNET.IPv4.TCP.ssl","app":"ssl"}
```
In this case, `app` is not appended as an extra level; `app_name` is added to the last event instead:
```
#Event1: {"protocol_stack_id":"ETHERNET"}
#Event2: {"protocol_stack_id":"ETHERNET.IPv4"}
#Event3: {"protocol_stack_id":"ETHERNET.IPv4.TCP"}
#Event4: {"protocol_stack_id":"ETHERNET.IPv4.TCP.ssl", "app_name":"ssl"}
```
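The following Python sketch reconstructs the behavior shown in the two examples above. It is an illustration rather than the actual implementation. `path_unroll` is a hypothetical helper. It returns only the generated fields so that its output can be compared directly with the documented events; the description says the real function also retains top-level fields. It also assumes that the path has no leading or trailing separator and that a missing name simply produces the prefix events.

```python
from typing import Dict, List, Optional


def path_unroll(path: str, name: Optional[str], separator: str = "/",
                path_field: str = "path",
                name_field: str = "name") -> List[Dict[str, str]]:
    """Hypothetical sketch of PATH_UNROLL, reconstructed from the examples.

    Returns only the generated fields; the documented function also keeps
    the event's top-level fields.
    """
    parts = path.split(separator) if path else []
    # One event per level: "A", "A.B", "A.B.C", ...
    events = [{path_field: separator.join(parts[:i])}
              for i in range(1, len(parts) + 1)]
    if name:
        if parts and parts[-1] == name:
            # The path already ends with the name: tag the last level.
            events[-1][name_field] = name
        else:
            # Otherwise append the name as one extra, final level.
            events.append({name_field: name,
                           path_field: separator.join(parts + [name])})
    return events
```

Calling `path_unroll("ETHERNET.IPv4.TCP.ssl", "wechat", ".", "protocol_stack_id", "app_name")` yields the five events of the first example, and passing `"ssl"` as the name yields the four events of the second.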