[Improve][e2e-base-test] Integrate multiple types of processors into the test topology.

author: doufenghu <[email protected]> 2024-10-20 11:46:32 +0800
committer: doufenghu <[email protected]> 2024-10-20 11:46:32 +0800
commit: 031224fe43961cd1df2c7b0239c6f813e765c105 (patch)
tree: 3f8736907c1e98d0475ca5bdfcd7a33a21b1df20 /docs/processor
parent: 9f51ce8d96879aa5c383ac34bac543ad6fe3ed44 (diff)
3 files changed, 77 insertions, 26 deletions
diff --git a/docs/processor/udaf.md b/docs/processor/udaf.md
index dd1dd70..66d6ad5 100644
--- a/docs/processor/udaf.md
+++ b/docs/processor/udaf.md
@@ -41,7 +41,7 @@ COLLECT_LIST is used to collect the value of the field in the group of events.
 - lookup_fields: required. Now only support one field.
 - output_fields: optional. If not set, the output field name is `lookup_field_name`.
 
-### Example
+Example
 
 ```yaml
 - function: COLLECT_LIST
@@ -59,7 +59,7 @@ COLLECT_SET is used to collect the unique value of the field in the group of eve
 - lookup_fields: required. Now only support one field.
 - output_fields: optional. If not set, the output field name is `lookup_field_name`.
 
-### Example
+Example
 
 ```yaml
 - function: COLLECT_SET
@@ -76,7 +76,7 @@ FIRST_VALUE is used to get the first value of the field in the group of events.
 - lookup_fields: required. Now only support one field.
 - output_fields: optional. If not set, the output field name is `lookup_field_name`.
 
-### Example
+Example
 
 ```yaml
 - function: FIRST_VALUE
@@ -92,7 +92,7 @@ LAST_VALUE is used to get the last value of the field in the group of events.
 - lookup_fields: required. Now only support one field.
 - output_fields: optional. If not set, the output field name is `lookup_field_name`.
 
-### Example
+Example
 
 ```yaml
 - function: LAST_VALUE
@@ -109,7 +109,7 @@ LONG_COUNT is used to count the number of events in the group of events.
 - lookup_fields: optional.
 - output_fields: required.
 
-### Example
+Example
 
 ```yaml
 - function: LONG_COUNT
@@ -127,7 +127,7 @@ MEAN is used to calculate the mean value of the field in the group of events. Th
 - parameters: optional.
     - precision: `<Integer>` required. The precision of the mean value. Default is 2.
 
-### Example
+Example
 
 ```yaml
 - function: MEAN
@@ -144,7 +144,7 @@ NUMBER_SUM is used to sum the value of the field in the group of events. The loo
 - lookup_fields: required. Now only support one field.
 - output_fields: optional. If not set, the output field name is `lookup_field_name`.
 
-### Example
+Example
 
 ```yaml
 - function: NUMBER_SUM
@@ -164,7 +164,8 @@ hlld is a high-performance C server which is used to expose HyperLogLog sets and
     - precision: `<Integer>` optional. The precision of the hlld value. Default is 12.
     - output_format: `<String>` optional. The output format can be either `base64(encoded string)` or `binary(byte[])`. The default is `base64`.
 
-### Example
+Example
+
   Merge multiple string field into a HyperLogLog data structure.
 ```yaml
   - function: HLLD
@@ -194,8 +195,8 @@ Approx Count Distinct HLLD is used to count the approximate number of distinct v
   - input_type: `<String>` optional. Refer to `HLLD` function.
   - precision: `<Integer>` optional. Refer to `HLLD` function.
 
-### Example
-    
+Example
+
 ```yaml
 - function: APPROX_COUNT_DISTINCT_HLLD
   lookup_fields: [client_ip]
@@ -228,8 +229,8 @@ A High Dynamic Range (HDR) Histogram. More details can be found in [HDR Histogra
   - autoResize: `<Boolean>` optional. If true, the highestTrackableValue will auto-resize. Default is true.
   - output_format: `<String>` optional. The output format can be either `base64(encoded string)` or `binary(byte[])`. The default is `base64`.
 
-### Example
-    
+Example
+
   ```yaml
   - function: HDR_HISTOGRAM
     lookup_fields: [latency_ms]
@@ -264,8 +265,8 @@ Approx Quantile HDR is used to calculate the approximate quantile value of the f
   - autoResize: `<Boolean>` optional. Refer to `HDR_HISTOGRAM` function.
   - probability: `<Double>` optional. The probability of the quantile. Default is 0.5.
 
-### Example
-  
+Example
+
   ```yaml
   - function: APPROX_QUANTILE_HDR
     lookup_fields: [latency_ms]
@@ -301,8 +302,8 @@ Approx Quantiles HDR is used to calculate the approximate quantile values of the
   - autoResize: `<Boolean>` optional. Refer to `HDR_HISTOGRAM` function.
   - probabilities: `<Array<Double>>` required. The list of probabilities of the quantiles. Range is 0 to 1.
 
-### Example
-    
+Example
+
 ```yaml
 - function: APPROX_QUANTILES_HDR
   lookup_fields: [latency_ms]
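
The udaf.md hunks above touch several simple aggregate functions: COLLECT_LIST, COLLECT_SET, FIRST_VALUE, LAST_VALUE, LONG_COUNT, MEAN and NUMBER_SUM. Their documented behavior can be illustrated with a minimal Python sketch. The documented rules it follows are that the output field defaults to the lookup field name, LONG_COUNT needs an output field, and MEAN rounds to a precision that defaults to 2. It is not the project's implementation. The following are assumptions made for illustration:

- the `aggregate` helper is hypothetical;
- events are modeled as plain dicts;
- events that lack the lookup field are skipped.

```python
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


def aggregate(events: List[Event], function: str, lookup_field: Optional[str] = None,
              output_field: Optional[str] = None, precision: int = 2) -> Event:
    """Aggregate one lookup field over a group of events (hypothetical helper)."""
    # The output name falls back to the lookup field; LONG_COUNT without a
    # lookup field therefore needs an explicit output field.
    out = output_field or lookup_field
    if out is None:
        raise ValueError("output_field is required when lookup_field is not set")
    # Assumption: events that do not carry the lookup field are skipped.
    values = [e[lookup_field] for e in events if lookup_field in e] if lookup_field else []

    if function == "COLLECT_LIST":
        result: Any = values
    elif function == "COLLECT_SET":
        result = list(dict.fromkeys(values))  # unique values, first-seen order
    elif function == "FIRST_VALUE":
        result = values[0] if values else None
    elif function == "LAST_VALUE":
        result = values[-1] if values else None
    elif function == "LONG_COUNT":
        # Without a lookup field, every event in the group is counted.
        result = len(values) if lookup_field else len(events)
    elif function == "MEAN":
        result = round(sum(values) / len(values), precision) if values else None
    elif function == "NUMBER_SUM":
        result = sum(values)
    else:
        raise ValueError(f"unsupported function: {function}")
    return {out: result}


group = [
    {"client_ip": "10.0.0.1", "bytes": 100},
    {"client_ip": "10.0.0.2", "bytes": 250},
    {"client_ip": "10.0.0.1", "bytes": 51},
]
print(aggregate(group, "COLLECT_SET", "client_ip"))             # {'client_ip': ['10.0.0.1', '10.0.0.2']}
print(aggregate(group, "MEAN", "bytes", "avg_bytes"))           # {'avg_bytes': 133.67}
print(aggregate(group, "LONG_COUNT", output_field="sessions"))  # {'sessions': 3}
```

The sketch returns COLLECT_SET as an ordered list only so the result is deterministic. The real function may return an unordered collection.
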
diff --git a/docs/processor/udf.md b/docs/processor/udf.md
index 3298374..9ba93e9 100644
--- a/docs/processor/udf.md
+++ b/docs/processor/udf.md
@@ -201,17 +201,18 @@ If the value of `direction` is `69`, the value of `internal_ip` will be `client_
 - function: EVAL
   output_fields: [internal_ip]
   parameters:
-    value_expression: 'direction=69 ? client_ip : server_ip'
+    value_expression: "direction=69 ? client_ip : server_ip"
 ```
 
 ### Flatten
 
-Flatten the fields of nested structure to the top level. The new fields name are named using the field name prefixed with the names of the struct fields to reach it, separated by dots as default.
+Flatten the fields of a nested structure to the top level. Each new field name is formed by prefixing the field name with the names of the struct fields used to reach it, separated by dots by default. The original fields are removed.
 
 ```FLATTEN(filter, lookup_fields, output_fields[, parameters])```
+
 - filter: optional
 - lookup_fields: optional
-- output_fields: not required
+- output_fields: not required.
 - parameters: optional
   - prefix: `<String>` optional. Prefix string for flattened field names. Default is empty.
   - depth: `<Integer>` optional. Number representing the nested levels to consider for flattening. Minimum 1. Default is `5`.
@@ -255,6 +256,7 @@ Output:
 From unix timestamp function is used to convert the unix timestamp to date time string. The default time zone is UTC+0.
 
 ```FROM_UNIX_TIMESTAMP(filter, lookup_fields, output_fields[, parameters])```
+
 - filter: optional
 - lookup_fields: required
 - output_fields: required
@@ -427,7 +429,7 @@ Remove the prefix "tags_" from the field names and rename the field "timestamp_m
 
 ```yaml
 - function: RENAME
-- parameters:
+  parameters:
     rename_fields:
       timestamp_ms: recv_time_ms
     rename_expression: key=string.replace_all(key,'tags_',''); return key;
@@ -440,7 +442,7 @@ Rename the field `client_ip` to `source_ip`, including the fields under the `enc
 
 ```yaml
 - function: RENAME
-- parameters:
+  parameters:
     parent_fields: [encapsulation.ipv4]
     rename_fields:
       client_ip: source_ip
@@ -509,7 +511,7 @@ Unix timestamp converter function is used to convert the unix timestamp precisio
 - parameters: required
   - precision: `<String>` required. Enum: `milliseconds`, `seconds`, `minutes`. The minutes precision is used to generate Unix timestamp, round it to the minute level, and output it in seconds format.
 - Example:
-_`__timestamp` Internal field, from source ingestion time or current unix timestamp.
+  `__timestamp` is an internal field taken from the source ingestion time or the current Unix timestamp.
 
 ```yaml
 - function: UNIX_TIMESTAMP_CONVERTER
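
The udf.md hunks above describe FLATTEN in these terms:

- new names are built from the enclosing struct fields, joined with dots by default;
- an optional `prefix` can be set;
- a `depth` limit defaults to 5;
- the original nested field is removed.

The following Python sketch illustrates that behavior under several assumptions. It is not the project's code. The assumptions are:

- the `flatten` helper is hypothetical;
- the prefix is prepended only to flattened names;
- `lookup_fields`, when set, restricts which top-level fields are flattened;
- `depth` counts how many struct levels get expanded.

```python
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


def flatten(event: Event, lookup_fields: Optional[List[str]] = None,
            prefix: str = "", depth: int = 5) -> Event:
    """Hoist nested dict values to dot-separated top-level keys (hypothetical helper)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # Assumption: without lookup_fields, every top-level struct is flattened.
    targets = set(lookup_fields) if lookup_fields else set(event)
    result: Event = {}

    def walk(name: str, value: Any, level: int) -> None:
        # Expand non-empty structs until `depth` levels have been expanded.
        if isinstance(value, dict) and value and level < depth:
            for key, child in value.items():
                walk(f"{name}.{key}", child, level + 1)
        else:
            result[prefix + name] = value

    for key, value in event.items():
        if key in targets and isinstance(value, dict):
            walk(key, value, 0)  # the original nested field is not copied over
        else:
            result[key] = value
    return result


event = {"id": 1, "encapsulation": {"ipv4": {"client_ip": "10.0.0.1", "ttl": 64}}}
print(flatten(event))
# {'id': 1, 'encapsulation.ipv4.client_ip': '10.0.0.1', 'encapsulation.ipv4.ttl': 64}
print(flatten(event, depth=1))
# {'id': 1, 'encapsulation.ipv4': {'client_ip': '10.0.0.1', 'ttl': 64}}
```
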
diff --git a/docs/processor/udtf.md b/docs/processor/udtf.md
index a6e8444..65a7840 100644
--- a/docs/processor/udtf.md
+++ b/docs/processor/udtf.md
@@ -29,8 +29,8 @@ The Unroll Function handles an array field—or an expression evaluating to an a
 - parameters: optional
   - regex: `<String>` optional. If lookup_fields is a string, the regex parameter is used to split the string into an array. The default value is a comma.
 
-#### Example
-    
+Example
+
 ```yaml
 functions:
   - function: UNROLL
@@ -50,8 +50,8 @@ The JSON Unroll Function handles a JSON object, unrolls/explodes an array of obj
   - path: `<String>` optional. Path to array to unroll, default is the root of the JSON object.
   - new_path: `<String>` optional. Rename path to new_path, default is the same as path.
 
-#### Example
-    
+Example
+
 ```yaml
 functions:
   - function: JSON_UNROLL
@@ -62,5 +62,53 @@ functions:
       - new_path: tag
 ```
 
+### Path Unroll
+
+The PATH_UNROLL function splits a given path (for example, a file path) into its individual steps and emits a separate event for each step, while retaining the top-level fields. At the final level, it outputs both the full path and the file name.
+
+```PATH_UNROLL(filter, lookup_fields, output_fields[, parameters])```
+
+- filter: optional
+- lookup_fields: required
+- output_fields: required
+- parameters: optional
+  - separator: `<String>` optional. The delimiter used to split the path. Default is `/`.
+
+Example
+
+```yaml
+- function: PATH_UNROLL
+  lookup_fields: [decoded_path, app]
+  output_fields: [protocol_stack_id, app_name]
+  parameters:
+    separator: "."
+```
+Input:
+
+```json
+{"decoded_path":"ETHERNET.IPv4.TCP.ssl","app":"wechat"}
+```
+When the input is processed, the following events are generated:
+```
+  #Event1: {"protocol_stack_id":"ETHERNET"}
+  #Event2: {"protocol_stack_id":"ETHERNET.IPv4"}
+  #Event3: {"protocol_stack_id":"ETHERNET.IPv4.TCP"}
+  #Event4: {"protocol_stack_id":"ETHERNET.IPv4.TCP.ssl"}
+  #Event5: {"app_name":"wechat","protocol_stack_id":"ETHERNET.IPv4.TCP.ssl.wechat"}
+```
+
+The value of `app` can also match the last step of `decoded_path`, as in the following input:
+```json
+{"decoded_path":"ETHERNET.IPv4.TCP.ssl","app":"ssl"}
+```
+In this case, the output will be:
+```
+  #Event1: {"protocol_stack_id":"ETHERNET"}
+  #Event2: {"protocol_stack_id":"ETHERNET.IPv4"}
+  #Event3: {"protocol_stack_id":"ETHERNET.IPv4.TCP"}
+  #Event4: {"protocol_stack_id":"ETHERNET.IPv4.TCP.ssl", "app_name":"ssl"}
+```
+
+
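
The PATH_UNROLL examples above can be reproduced with a small Python sketch. It makes explicit the final-level rule as inferred from the two examples:

- when the second lookup field (`app` here) already equals the last step, its value is attached to the last event;
- otherwise it is appended as one extra level.

The `path_unroll` helper is hypothetical. Two details are assumptions not stated in the documentation:

- exactly two lookup fields and two output fields are expected;
- every top-level field other than the lookup fields is copied into each emitted event. With only `decoded_path` and `app` in the input, this matches the documented output.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def path_unroll(event: Event, lookup_fields: List[str], output_fields: List[str],
                separator: str = "/") -> List[Event]:
    """Emit one event per path prefix (hypothetical sketch of PATH_UNROLL)."""
    path_field, file_field = lookup_fields
    path_out, file_out = output_fields
    # Assumption: keep every top-level field except the two lookup fields.
    base = {k: v for k, v in event.items() if k not in (path_field, file_field)}
    # Skip empty steps, e.g. from a leading separator (assumption).
    parts = [p for p in str(event.get(path_field, "")).split(separator) if p]
    file_name = event.get(file_field)

    events = [{**base, path_out: separator.join(parts[:i])} for i in range(1, len(parts) + 1)]

    if file_name:
        if parts and parts[-1] == file_name:
            # The file name is already the last step: attach it to the last event.
            events[-1][file_out] = file_name
        else:
            # Otherwise, append it as one more level carrying both outputs.
            events.append({**base, file_out: file_name,
                           path_out: separator.join(parts + [file_name])})
    return events


sample = {"decoded_path": "ETHERNET.IPv4.TCP.ssl", "app": "wechat"}
for e in path_unroll(sample, ["decoded_path", "app"], ["protocol_stack_id", "app_name"], separator="."):
    print(e)
```
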