Merge branch 'test/e2e-test-clickhouse' into 'develop' (release/1.5.0-SNAPSHOT)

Test/e2e test clickhouse

See merge request galaxy/platform/groot-stream!94
author: 窦凤虎 <[email protected]> 2024-08-19 11:08:46 +0000
committer: 窦凤虎 <[email protected]> 2024-08-19 11:08:46 +0000
commit: 56b21d494bfa07012b1cc4e43dcb4ccdb6257d12
tree: 0fa1a094dbb4f4703ecbf013c678b3bb485b385b /docs
parent: 07332297c1306aa0dac649c7d15bf131e8edbc7e
parent: 6564a5e9a43ecd88f5497e2b75a219a8a54101bb
6 files changed, 365 insertions, 38 deletions
diff --git a/docs/connector/connector.md b/docs/connector/connector.md
index 1123385..766b73e 100644
--- a/docs/connector/connector.md
+++ b/docs/connector/connector.md
@@ -85,41 +85,49 @@ schema:
 
 The mock data type is used to define the template of the mock data.
 
-|                Mock Type                |  Parameter  |      Result Type      |       Default       |                                                                 Description                                                                 |
-|-----------------------------------------|-------------|-----------------------|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
-| **[Number](#Number)**                   | -           | **int/bigint/double** | -                   | **Randomly generate a number.**                                                                                                             |
-| -                                       | min         | number                | 0                   | The minimum value (include).                                                                                                                |
-| -                                       | max         | number                | int32.max           | The maximum value (exclusive).                                                                                                              |
-| -                                       | options     | array of number       | (none)              | The optional values. If set, the random value will be selected from the options and `start` and `end`  will be ignored.                     |
-| -                                       | random      | boolean               | true                | Default is random mode. If set to false, the value will be generated in order.                                                              |
-| **[Sequence](#Sequence)**               | -           | **bigint**            | -                   | **Generate a sequence number based on a specific step value .**                                                                             |
-| -                                       | start       | bigint                | 0                   | The first number in the sequence (include).                                                                                                 |
-| -                                       | step        | bigint                | 1                   | The number to add to each subsequent value.                                                                                                 |
-| **[UniqueSequence](#UniqueSequence)**   | -           | **bigint**            | -                   | **Generate a global unique sequence number.**                                                                                               |
-| -                                       | start       | bigint                | 0                   | The first number in the sequence (include).                                                                                                 |
-| **[String](#String)**                   | -           | string                | -                   | **Randomly generate a string.**                                                                                                             |
-| -                                       | regex       | string                | [a-zA-Z]{0,5}       | The regular expression.                                                                                                                     |
-| -                                       | options     | array of string       | (none)              | The optional values. If set, the random value will be selected from the options and `regex` will be ignored.                                |
-| -                                       | random      | boolean               | true                | Default is random mode. If set to false, the options value will be generated in order.                                                      |
-| **[Timestamp](#Timestamp)**             | -           | **bigint**            | -                   | **Generate a unix timestamp in milliseconds or seconds.**                                                                                   |
-| -                                       | unit        | string                | second              | The unit of the timestamp. The optional values are `second`, `millis`.                                                                      |
-| **[FormatTimestamp](#FormatTimestamp)** | -           | **string**            | -                   | **Generate a formatted timestamp.**                                                                                                         |
-| -                                       | format      | string                | yyyy-MM-dd HH:mm:ss | The format to output.                                                                                                                       |
-| -                                       | utc         | boolean               | false               | Default is local time. If set to true, the time will be converted to UTC time.                                                              |
-| **[IPv4](#IPv4)**                       | -           | **string**            | -                   | **Randomly generate a IPv4 address.**                                                                                                       |
-| -                                       | start       | string                | 0.0.0.0             | The minimum value of the IPv4 address(include).                                                                                             |
-| -                                       | end         | string                | 255.255.255.255     | The maximum value of the IPv4 address(include).                                                                                             |
-| **[Expression](#Expression)**           | -           | string                | -                   | **Use library  [Datafaker](https://www.datafaker.net/documentation/expressions/) expressions to generate fake data.**                       |
-| -                                       | expression  | string                | (none)              | The datafaker expression used  #{expression}.                                                                                               |
-| **[Eval](#Eval)**                       | -           | **string**            | -                   | **Use AviatorScript value expression to generate data.**                                                                                    |
-| -                                       | expression  | string                | (none)              | Support basic arithmetic operations and function calls. More details sess [AviatorScript](https://www.yuque.com/boyan-avfmj/aviatorscript). |
-| **[Object](#Object)**                   | -           | **struct/object**     | -                   | **Generate a object data structure. It used to define the nested structure of the mock data.**                                              |
-| -                                       | fields      | array of object       | (none)              | The fields of the object.                                                                                                                   |
-| **[Union](#Union)**                     | -           | -                     | -                   | **Generate a union data structure with multiple mock data type fields.**                                                                    |
-| -                                       | unionFields | array of object       | (none)              | The fields of the object.                                                                                                                   |
-| -                                       | - fields    | - array of object     | (none)              |                                                                                                                                             |
-| -                                       | - weight    | - int                 | 0                   | The weight of the generated object.                                                                                                         |
-|                                         | random      | boolean               | true                | Default is random mode. If set to false, the options value will be generated in order.                                                      |
+| Mock Type                               | Parameter                       | Result Type           | Default             | Description                                                                                                                                                   |
+|-----------------------------------------|---------------------------------|-----------------------|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| **[Number](#Number)**                   | -                               | **int/bigint/double** | -                   | **Randomly generate a number.**                                                                                                                               |
+|                                         | min                             | number                | 0                   | The minimum value (inclusive).                                                                                                                                |
+|                                         | max                             | number                | int32.max           | The maximum value (exclusive).                                                                                                                                |
+|                                         | options                         | array of number       | (none)              | The optional values. If set, the random value will be selected from the options and `min` and `max` will be ignored.                                          |
+|                                         | random                          | boolean               | true                | Default is random mode. If set to false, the value will be generated in order.                                                                                |
+| **[Sequence](#Sequence)**               | -                               | **bigint**            | -                   | **Generate a sequence number based on a specific step value.**                                                                                                |
+|                                         | start                           | bigint                | 0                   | The first number in the sequence (inclusive).                                                                                                                 |
+|                                         | step                            | bigint                | 1                   | The number to add to each subsequent value.                                                                                                                   |
+| **[UniqueSequence](#UniqueSequence)**   | -                               | **bigint**            | -                   | **Generate a globally unique sequence number.**                                                                                                               |
+|                                         | start                           | bigint                | 0                   | The first number in the sequence (inclusive).                                                                                                                 |
+| **[String](#String)**                   | -                               | string                | -                   | **Randomly generate a string.**                                                                                                                               |
+|                                         | regex                           | string                | [a-zA-Z]{0,5}       | The regular expression used to generate the string.                                                                                                           |
+|                                         | options                         | array of string       | (none)              | The optional values. If set, the random value will be selected from the options and `regex` will be ignored.                                                  |
+|                                         | random                          | boolean               | true                | Default is random mode. If set to false, the options value will be generated in order.                                                                        |
+| **[Timestamp](#Timestamp)**             | -                               | **bigint**            | -                   | **Generate a Unix timestamp in milliseconds or seconds.**                                                                                                     |
+|                                         | unit                            | string                | second              | The unit of the timestamp. Options are `second` or `millis`.                                                                                                  |
+| **[FormatTimestamp](#FormatTimestamp)** | -                               | **string**            | -                   | **Generate a formatted timestamp.**                                                                                                                           |
+|                                         | format                          | string                | yyyy-MM-dd HH:mm:ss | The format to output the timestamp in.                                                                                                                        |
+|                                         | utc                             | boolean               | false               | Default is local time. If set to true, the time will be converted to UTC time.                                                                                |
+| **[IPv4](#IPv4)**                       | -                               | **string**            | -                   | **Randomly generate an IPv4 address.**                                                                                                                        |
+|                                         | start                           | string                | 0.0.0.0             | The minimum value of the IPv4 address (inclusive).                                                                                                            |
+|                                         | end                             | string                | 255.255.255.255     | The maximum value of the IPv4 address (inclusive).                                                                                                            |
+| **[Expression](#Expression)**           | -                               | string                | -                   | **Use library [Datafaker](https://www.datafaker.net/documentation/expressions/) expressions to generate fake data.**                                          |
+|                                         | expression                      | string                | (none)              | The Datafaker expression to use, in the format `#{expression}`.                                                                                               |
+| **[Hlld](#HLLD)**                       | -                               | **string**            | -                   | **Generate an IP address HyperLogLog data structure and store it as a base64 string. Use library [HLLD](https://github.com/armon/hlld).**                     |
+|                                         | itemCount                       | bigint                | 1000000             | The total number of items.                                                                                                                                    |
+|                                         | batchCount                      | int                   | 10000               | The number of items in each batch.                                                                                                                            |
+|                                         | precision                       | int                   | 12                  | The precision of the HyperLogLog data structure. Allowed range is [4, 18].                                                                                    |
+| **[HdrHistogram](#HdrHistogram)**       | -                               | **string**            | -                   | **Generate a Latency HdrHistogram data structure and store it as a base64 string. Use library [HdrHistogram](https://github.com/HdrHistogram/HdrHistogram).** |
+|                                         | max                             | bigint                | 100000              | The maximum value of the histogram.                                                                                                                           |
+|                                         | batchCount                      | int                   | 1000                | The random number of items in each batch.                                                                                                                     |
+|                                         | numberOfSignificantValueDigits  | int                   | 1                   | The precision of the histogram data structure. Allowed range is [1, 5].                                                                                       |
+| **[Eval](#Eval)**                       | -                               | **string**            | -                   | **Use AviatorScript value expression to generate data.**                                                                                                      |
+|                                         | expression                      | string                | (none)              | Support basic arithmetic operations and function calls. More details in [AviatorScript](https://www.yuque.com/boyan-avfmj/aviatorscript).                     |
+| **[Object](#Object)**                   | -                               | **struct/object**     | -                   | **Generate an object data structure. Used to define the nested structure of the mock data.**                                                                  |
+|                                         | fields                          | array of object       | (none)              | The fields of the object.                                                                                                                                     |
+| **[Union](#Union)**                     | -                               | -                     | -                   | **Generate a union data structure with multiple mock data type fields.**                                                                                      |
+|                                         | unionFields                     | array of object       | (none)              | The fields of the union.                                                                                                                                      |
+|                                         | weight                          | int                   | 0                   | The weight of the generated object.                                                                                                                           |
+|                                         | random                          | boolean               | true                | Default is random mode. If set to false, the options value will be generated in order.                                                                        |
+
 
 ### Common Parameters
 
@@ -250,6 +258,22 @@ Mock data type supports some common parameters.
 {"name":"phoneNumber","type":"Expression","expression":"#{phoneNumber.phoneNumber}"}
 ```
 
+### HLLD
+
+- Generate an IP address HyperLogLog data structure, stored as a base64 string. At most 1000 IP addresses are generated in each batch.
+
+```json
+{"name":"hll","type":"Hlld","itemCount":1000000,"batchCount":1000,"precision":12}
+```
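
The `precision` parameter determines both the size and the accuracy of the sketch. As a rough guide, the sketch below applies the textbook HyperLogLog relationships: 2^precision registers and a relative standard error of about 1.04 / sqrt(registers). The function names are hypothetical, and the HLLD library's actual estimator may differ slightly; only the allowed range [4, 18] is taken from the parameter table.

```python
import math


def hll_register_count(precision: int) -> int:
    """Number of registers in a textbook HyperLogLog sketch (2^precision)."""
    # The range [4, 18] is the one documented for the `precision` parameter.
    if not 4 <= precision <= 18:
        raise ValueError("precision must be in the range [4, 18]")
    return 1 << precision


def hll_relative_error(precision: int) -> float:
    """Approximate relative standard error: 1.04 / sqrt(register count)."""
    return 1.04 / math.sqrt(hll_register_count(precision))


if __name__ == "__main__":
    for p in (4, 12, 18):
        registers = hll_register_count(p)
        error = hll_relative_error(p)
        print(f"precision={p} registers={registers} error={error:.3%}")
```

Under these assumptions, the default `precision` of 12 corresponds to 4096 registers and an expected error of roughly 1.6%; each extra bit of precision doubles the register count.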
+
+### HdrHistogram
+
+- Generate a Latency HdrHistogram data structure, stored as a base64 string. The maximum value of the histogram is 100000, and at most 1000 items are generated in each batch.
+
+```json
+{"name":"distribution","type":"HdrHistogram","max":100000,"batchCount":1000,"numberOfSignificantValueDigits":1}
+```
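
To give a feel for `numberOfSignificantValueDigits`, the sketch below uses the rule of thumb from the HdrHistogram project documentation: with n significant digits, a recorded value is quantized to within about 1/10^n of its magnitude. The helper names are hypothetical, and the range check [1, 5] follows the parameter table in this document rather than the library itself.

```python
def hdr_relative_resolution(significant_digits: int) -> float:
    """Approximate worst-case relative quantization for n significant digits."""
    # Range taken from the documented parameter table, not from the library.
    if not 1 <= significant_digits <= 5:
        raise ValueError("numberOfSignificantValueDigits must be in [1, 5]")
    return 10.0 ** -significant_digits


def hdr_absolute_resolution(value: float, significant_digits: int) -> float:
    """Approximate quantization step near `value`."""
    return value * hdr_relative_resolution(significant_digits)


if __name__ == "__main__":
    # Settings from the mock example: max=100000, one significant digit.
    print(hdr_absolute_resolution(100000, 1))
    # The same maximum with three significant digits.
    print(hdr_absolute_resolution(100000, 3))
```

With the mock defaults (`max` of 100000 and one significant digit), values near the top of the range are only distinguishable to within about 10000, which is usually acceptable for test data but coarse for real latency analysis.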
+
 ### Eval
 
 - Generate a value by using AviatorScript expression. Commonly used for arithmetic operations.
diff --git a/docs/images/groot_stream_architecture.jpg b/docs/images/groot_stream_architecture.jpg
index 1fff0e5..d8f1d4b 100644
--- a/docs/images/groot_stream_architecture.jpg
+++ b/docs/images/groot_stream_architecture.jpg
diff --git a/docs/processor/aggregate-processor.md b/docs/processor/aggregate-processor.md
index af82d4e..5ab0ae0 100644
--- a/docs/processor/aggregate-processor.md
+++ b/docs/processor/aggregate-processor.md
@@ -12,7 +12,7 @@ Note：Default will output internal fields `__window_start_timestamp` and `__win
 
 | name                     |  type  | required | default value                                                                                                                                                                                                        |
 |--------------------------|--------|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| type                     | String | Yes      | The type of the processor, now only support `com.geedgenetworks.core.processor.projection.AggregateProcessor`                                                                                                        |
+| type                     | String | Yes      | The type of the processor, now only support `com.geedgenetworks.core.processor.aggregate.AggregateProcessor`                                                                                                         |
 | output_fields            | Array  | No       | Array of String. The list of fields that need to be kept. Fields not in the list will be removed.                                                                                                                    |
 | remove_fields            | Array  | No       | Array of String. The list of fields that need to be removed.                                                                                                                                                         |
 | group_by_fields          | Array  | yes      | Array of String. The list of fields that need to be grouped.                                                                                                                                                         |
diff --git a/docs/processor/table-processor.md b/docs/processor/table-processor.md
new file mode 100644
index 0000000..7b3066c
--- /dev/null
+++ b/docs/processor/table-processor.md
@@ -0,0 +1,61 @@
+# Table Processor
+
+> Processing pipelines for table processors using UDTFs
+
+## Description
+
+The table processor processes data on its way from source to sink. It is part of the processing pipeline and can be used in the pre-processing, processing, and post-processing pipelines. Each processor can assemble UDTFs (user-defined table functions) into a pipeline. Within the pipeline, events are processed by each function in order, from top to bottom. More details can be found in [user-defined table functions (UDTFs)](udtf.md).
+
+## Options
+
+| name            | type   | required | default value                                                                                        |
+|-----------------|--------|----------|------------------------------------------------------------------------------------------------------|
+| type            | String | Yes      | The type of the processor, now only support `com.geedgenetworks.core.processor.table.TableProcessor` |
+| output_fields   | Array  | No       | Array of String. The list of fields that need to be kept. Fields not in the list will be removed.    |
+| remove_fields   | Array  | No       | Array of String. The list of fields that need to be removed.                                         |
+| functions       | Array  | No       | Array of Object. The list of functions that need to be applied to the data.                          |
+
+## Usage Example
+This example uses a table processor to unroll the encapsulation field, converting one row into multiple rows.
+
+```yaml
+sources:
+  inline_source:
+    type: inline
+    properties:
+      data: '[{"tcp_rtt_ms":128,"decoded_as":"HTTP","http_version":"http1","http_request_line":"GET / HTTP/1.1","http_host":"www.ct.cn","http_url":"www.ct.cn/","http_user_agent":"curl/8.0.1","http_status_code":200,"http_response_line":"HTTP/1.1 200 OK","http_response_content_type":"text/html; charset=UTF-8","http_response_latency_ms":31,"http_session_duration_ms":5451,"in_src_mac":"ba:bb:a7:3c:67:1c","in_dest_mac":"86:dd:7a:8f:ae:e2","out_src_mac":"86:dd:7a:8f:ae:e2","out_dest_mac":"ba:bb:a7:3c:67:1c","tcp_client_isn":678677906,"tcp_server_isn":1006700307,"address_type":4,"client_ip":"192.11.22.22","server_ip":"8.8.8.8","client_port":42751,"server_port":80,"in_link_id":65535,"out_link_id":65535,"start_timestamp_ms":1703646546127,"end_timestamp_ms":1703646551702,"duration_ms":5575,"sent_pkts":97,"sent_bytes":5892,"received_pkts":250,"received_bytes":333931,"encapsulation":"[{\"tunnels_schema_type\":\"MULTIPATH_ETHERNET\",\"c2s_source_mac\":\"48:73:97:96:38:27\",\"c2s_destination_mac\":\"58:b3:8f:fa:3b:11\",\"s2c_source_mac\":\"58:b3:8f:fa:3b:11\",\"s2c_destination_mac\":\"48:73:97:96:38:27\"}]"},{"tcp_rtt_ms":256,"decoded_as":"HTTP","http_version":"http1","http_request_line":"GET / HTTP/1.1","http_host":"www.abc.cn","http_url":"www.cabc.cn/","http_user_agent":"curl/8.0.1","http_status_code":200,"http_response_line":"HTTP/1.1 200 OK","http_response_content_type":"text/html; charset=UTF-8","http_response_latency_ms":31,"http_session_duration_ms":5451,"in_src_mac":"ba:bb:a7:3c:67:1c","in_dest_mac":"86:dd:7a:8f:ae:e2","out_src_mac":"86:dd:7a:8f:ae:e2","out_dest_mac":"ba:bb:a7:3c:67:1c","tcp_client_isn":678677906,"tcp_server_isn":1006700307,"address_type":4,"client_ip":"192.168.10.198","server_ip":"4.4.4.4","client_port":42751,"server_port":80,"in_link_id":65535,"out_link_id":65535,"start_timestamp_ms":1703646546127,"end_timestamp_ms":1703646551702,"duration_ms":2575,"sent_pkts":197,"sent_bytes":5892,"received_pkts":350,"received_bytes":533931,"device_tag":"{\"tags\":[{\"tag\":\"data_center\",\"value\":\"center-xxg-tsgx\"},{\"tag\":\"device_group\",\"value\":\"group-xxg-tsgx\"}]}"}]'
+      format: json
+      json.ignore.parse.errors: false
+
+processing_pipelines:
+  table_processor:
+    type: table
+    functions:
+      - function: JSON_UNROLL
+        lookup_fields: [ encapsulation ]
+        output_fields: [ encapsulation ]
+
+sinks:
+  print_sink:
+    type: print
+    properties:
+      format: json
+      mode: log_warn
+
+application:
+  env:
+    name: example-inline-to-print-use-udtf
+    parallelism: 3
+    pipeline:
+      object-reuse: true
+  topology:
+    - name: inline_source
+      downstream: [table_processor]
+    - name: table_processor
+      downstream: [ print_sink ]
+    - name: print_sink
+      downstream: []
+
+```
+
+
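
The "one row into multiple rows" behaviour described in the usage example can be pictured with a small sketch. This is an illustration of the general unroll idea only, not the project's `JSON_UNROLL` implementation: the function name `json_unroll`, the pass-through handling of rows that lack the lookup field, and the second tunnel value `GRE` are all assumptions made for the example.

```python
import json
from typing import Any


def json_unroll(
    row: dict[str, Any], lookup_field: str, output_field: str
) -> list[dict[str, Any]]:
    """Emit one output row per element of the JSON array stored in lookup_field."""
    raw = row.get(lookup_field)
    if raw is None:
        # Assumption: rows without the lookup field pass through unchanged.
        return [dict(row)]
    items = json.loads(raw)
    if not isinstance(items, list):
        # Treat a single JSON value as a one-element array.
        items = [items]
    # Copy the other fields and replace the output field with one element.
    return [{**row, output_field: item} for item in items]


row = {
    "client_ip": "192.11.22.22",
    # Hypothetical two-element array; the second entry is invented.
    "encapsulation": json.dumps(
        [
            {"tunnels_schema_type": "MULTIPATH_ETHERNET"},
            {"tunnels_schema_type": "GRE"},
        ]
    ),
}
unrolled = json_unroll(row, "encapsulation", "encapsulation")

if __name__ == "__main__":
    for out in unrolled:
        print(out)
```

Each output row keeps the original fields and carries a single element of the array, which is why one input event can fan out into several downstream events.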
diff --git a/docs/processor/udaf.md b/docs/processor/udaf.md
index e22846f..dd1dd70 100644
--- a/docs/processor/udaf.md
+++ b/docs/processor/udaf.md
@@ -11,7 +11,11 @@
 - [Long Count](#Long-Count)
 - [MEAN](#Mean)
 - [Number SUM](#Number-SUM)
-
+- [HLLD](#HLLD)
+- [Approx Count Distinct HLLD](#Approx-Count-Distinct-HLLD)
+- [HDR Histogram](#HDR-Histogram)
+- [Approx Quantile HDR](#Approx-Quantile-HDR)
+- [Approx Quantiles HDR](#Approx-Quantiles-HDR)
 
 ## Description
 
@@ -146,4 +150,176 @@ NUMBER_SUM is used to sum the value of the field in the group of events. The loo
 - function: NUMBER_SUM
   lookup_fields: [received_bytes]
   output_fields: [received_bytes_sum]
-```
-\ No newline at end of file
+```
+
+### HLLD
+hlld is a high-performance C server which is used to expose HyperLogLog sets and operations over them to networked clients. More details can be found in [hlld](https://github.com/armon/hlld).
+
+```HLLD(filter, lookup_fields, output_fields[, parameters])```
+- filter: optional
+- lookup_fields: required.
+- output_fields: required.
+- parameters: optional.
+    - input_type: `<String>` optional. The input field type, either `regular` or `sketch`. Default is `sketch`. Regular field data types include `string`, `int`, `long`, `float`, `double`, etc.
+    - precision: `<Integer>` optional. The precision of the hlld value. Default is 12.
+    - output_format: `<String>` optional. The output format, either `base64` (encoded string) or `binary` (`byte[]`). Default is `base64`.
+
+### Example
+  Merge multiple string fields into a HyperLogLog data structure.
+```yaml
+  - function: HLLD
+    lookup_fields: [client_ip]
+    output_fields: [client_ip_hlld]
+    parameters:
+      input_type: regular
+
+```
+  Merge multiple `unique_count` metric type fields into a HyperLogLog data structure.
+```yaml
+  - function: HLLD
+    lookup_fields: [client_ip_hlld]
+    output_fields: [client_ip_hlld]
+    parameters:
+      input_type: sketch
+```
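
With `input_type: sketch`, the inputs are already HyperLogLog sketches, so aggregation is a merge rather than a fresh hash of raw values. In textbook HyperLogLog, merging two sketches of equal precision takes the register-wise maximum; the sketch below shows only that idea. The function name is hypothetical, the four-register arrays are toy data for readability, and the serialized layout HLLD actually uses is not modelled here.

```python
def merge_hll_registers(left: list[int], right: list[int]) -> list[int]:
    """Merge two HyperLogLog register arrays by taking the per-register maximum."""
    if len(left) != len(right):
        # Sketches built with different precision have different register counts.
        raise ValueError("sketches must use the same precision")
    return [max(a, b) for a, b in zip(left, right)]


# Toy register arrays; a real sketch with precision 12 would hold 4096 registers.
sketch_a = [0, 3, 1, 2]
sketch_b = [2, 1, 1, 5]
merged = merge_hll_registers(sketch_a, sketch_b)

if __name__ == "__main__":
    print(merged)
```

Because the maximum is commutative and idempotent, merging in any order, or merging the same sketch twice, gives the same result, which is what makes pre-aggregated sketch fields safe to re-aggregate downstream.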
+
+### Approx Count Distinct HLLD
+Approx Count Distinct HLLD is used to count the approximate number of distinct values in the group of events.
+
+```APPROX_COUNT_DISTINCT_HLLD(filter, lookup_fields, output_fields[, parameters])```
+- filter: optional
+- lookup_fields: required.
+- output_fields: required.
+- parameters: optional.
+  - input_type: `<String>` optional. Refer to `HLLD` function.
+  - precision: `<Integer>` optional. Refer to `HLLD` function.
+
+### Example
+
+```yaml
+- function: APPROX_COUNT_DISTINCT_HLLD
+  lookup_fields: [client_ip]
+  output_fields: [unique_client_ip]
+  parameters:
+    input_type: regular
+```
+
+```yaml
+- function: APPROX_COUNT_DISTINCT_HLLD
+  lookup_fields: [client_ip_hlld]
+  output_fields: [unique_client_ip]
+  parameters:
+    input_type: sketch
+```
+
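The difference between `regular` and `sketch` input can be illustrated with a toy HyperLogLog in Python. This is a minimal sketch under stated assumptions, not the groot-stream or hlld implementation: `TinyHLL`, its SHA-1 based hash, and the small-range correction are choices made for this illustration. Adding raw values corresponds to `input_type: regular`; merging existing sketches register by register corresponds to `input_type: sketch`.

```python
import hashlib
import math


class TinyHLL:
    """Toy HyperLogLog for illustration only (hypothetical, not hlld)."""

    def __init__(self, precision: int = 12):
        self.p = precision
        self.m = 1 << precision          # number of registers
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        """Fold one raw value into the sketch ("regular" input)."""
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash
        idx = h >> (64 - self.p)                     # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "TinyHLL") -> None:
        """Fold another sketch into this one ("sketch" input)."""
        if other.p != self.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self) -> float:
        """Estimate the number of distinct values added so far."""
        alpha = 0.7213 / (1 + 1.079 / self.m)        # constant valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


# Two overlapping batches of client IPs: 7500 distinct addresses in total.
ips_a = [f"10.0.{i // 256}.{i % 256}" for i in range(5000)]
ips_b = [f"10.0.{i // 256}.{i % 256}" for i in range(2500, 7500)]

sketch_a, sketch_b = TinyHLL(), TinyHLL()
for ip in ips_a:
    sketch_a.add(ip)
for ip in ips_b:
    sketch_b.add(ip)

merged = TinyHLL()
merged.merge(sketch_a)
merged.merge(sketch_b)
print(round(merged.count()))  # an estimate close to 7500
```

Because merging takes the register-wise maximum, the merged sketch is identical to one built from all raw values at once, which is why pre-aggregated sketches can be combined later without losing accuracy.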
+### HDR Histogram
+
+HDR Histogram aggregates the values of a field into a High Dynamic Range (HDR) histogram. More details can be found in [HDR Histogram](https://github.com/HdrHistogram/HdrHistogram).
+
+```HDR_HISTOGRAM(filter, lookup_fields, output_fields[, parameters])```
+- filter: optional
+- lookup_fields: required.
+- output_fields: required.
+- parameters: optional.
+  - input_type: `<String>` optional. The input field type, either `regular` or `sketch`. Default is `sketch`. A `regular` field is a number.
+  - lowestDiscernibleValue: `<Integer>` optional. The lowest trackable value. Default is 1.
+  - highestTrackableValue: `<Integer>` optional. The highest trackable value. Default is 2.
+  - numberOfSignificantValueDigits: `<Integer>` optional. The number of significant value digits. Default is 1. The range is 1 to 5. 
+  - autoResize: `<Boolean>` optional. If true, the highestTrackableValue will auto-resize. Default is true.
+  - output_format: `<String>` optional. The output format can be either `base64(encoded string)` or `binary(byte[])`. The default is `base64`.
+
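How the HDR parameters constrain each other can be shown with a little arithmetic. The Python sketch below checks a parameter set and estimates the worst-case quantization of a recorded value. It assumes the constraint of the upstream HdrHistogram library that the highest trackable value must be at least twice the lowest discernible value, together with the 1 to 5 digit range documented above. The function names are hypothetical, and groot-stream may validate its parameters differently.

```python
def validate_hdr_params(lowest: int, highest: int, digits: int) -> None:
    """Raise ValueError if the parameter combination looks invalid."""
    if lowest < 1:
        raise ValueError("lowestDiscernibleValue must be >= 1")
    if highest < 2 * lowest:
        raise ValueError("highestTrackableValue must be >= 2 * lowestDiscernibleValue")
    if not 1 <= digits <= 5:
        raise ValueError("numberOfSignificantValueDigits must be between 1 and 5")


def max_quantization(value: int, digits: int) -> float:
    """Upper bound on the rounding applied to `value` at the given precision."""
    return value / 10 ** digits


# Parameters from the latency example: 1 ms to 1 hour, 3 significant digits.
validate_hdr_params(1, 3_600_000, 3)
print(max_quantization(3_600_000, 3))  # 3600.0
```

With 3 significant digits a recorded value is kept to within 0.1% of itself, so a one-hour latency is resolved to within about 3.6 seconds while a 1000 ms latency is resolved to within about 1 ms.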
+### Example
+    
+  ```yaml
+  - function: HDR_HISTOGRAM
+    lookup_fields: [latency_ms]
+    output_fields: [latency_ms_histogram]
+    parameters:
+      input_type: regular
+      lowestDiscernibleValue: 1
+      highestTrackableValue: 3600000
+      numberOfSignificantValueDigits: 3
+  ```
+  ```yaml
+  - function: HDR_HISTOGRAM
+    lookup_fields: [latency_ms_histogram]
+    output_fields: [latency_ms_histogram]
+    parameters:
+      input_type: sketch
+  ```
+
+### Approx Quantile HDR
+
+Approx Quantile HDR is used to calculate the approximate quantile value of the field in the group of events.
+
+```APPROX_QUANTILE_HDR(filter, lookup_fields, output_fields, quantile[, parameters])```
+- filter: optional
+- lookup_fields: required.
+- output_fields: required.
+- parameters: optional.
+  - input_type: `<String>` optional. Refer to `HDR_HISTOGRAM` function.
+  - lowestDiscernibleValue: `<Integer>` optional. Refer to `HDR_HISTOGRAM` function.
+  - highestTrackableValue: `<Integer>` required. Refer to `HDR_HISTOGRAM` function.
+  - numberOfSignificantValueDigits: `<Integer>` optional. Refer to `HDR_HISTOGRAM` function.
+  - autoResize: `<Boolean>` optional. Refer to `HDR_HISTOGRAM` function.
+  - probability: `<Double>` optional. The probability of the quantile. Default is 0.5.
+
+### Example
+  
+  ```yaml
+  - function: APPROX_QUANTILE_HDR
+    lookup_fields: [latency_ms]
+    output_fields: [latency_ms_p95]
+    parameters:
+      input_type: regular
+      probability: 0.95
+  ```
+
+  ```yaml
+  - function: APPROX_QUANTILE_HDR
+    lookup_fields: [latency_ms_HDR]
+    output_fields: [latency_ms_p95]
+    parameters:
+      input_type: sketch
+      probability: 0.95
+
+  ```
+
+### Approx Quantiles HDR
+
+Approx Quantiles HDR is used to calculate the approximate quantile values of the field in the group of events.
+
+```APPROX_QUANTILES_HDR(filter, lookup_fields, output_fields, quantiles[, parameters])```
+- filter: optional
+- lookup_fields: required.
+- output_fields: required.
+- parameters: optional.
+  - input_type: `<String>` optional. Refer to `HDR_HISTOGRAM` function.
+  - lowestDiscernibleValue: `<Integer>` optional. Refer to `HDR_HISTOGRAM` function.
+  - highestTrackableValue: `<Integer>` required. Refer to `HDR_HISTOGRAM` function.
+  - numberOfSignificantValueDigits: `<Integer>` optional. Refer to `HDR_HISTOGRAM` function.
+  - autoResize: `<Boolean>` optional. Refer to `HDR_HISTOGRAM` function.
+  - probabilities: `<Array<Double>>` required. The list of probabilities of the quantiles. Range is 0 to 1.
+
+### Example
+    
+```yaml
+- function: APPROX_QUANTILES_HDR
+  lookup_fields: [latency_ms]
+  output_fields: [latency_ms_quantiles]
+  parameters:
+    input_type: regular
+    probabilities: [0.5, 0.95, 0.99]
+```
+
+```yaml
+- function: APPROX_QUANTILES_HDR
+  lookup_fields: [latency_ms_HDR]
+  output_fields: [latency_ms_quantiles]
+  parameters:
+    input_type: sketch
+    probabilities: [0.5, 0.95, 0.99]
+```
+
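As a point of reference for what `probability` and `probabilities` ask for, the Python sketch below computes exact nearest-rank quantiles over raw values. The HDR-based functions approximate this kind of result from a histogram rather than from the sorted raw values, so their output may differ slightly. The function names are hypothetical, and the nearest-rank definition is an assumption made for illustration.

```python
import math
from typing import List, Sequence


def nearest_rank_quantile(values: Sequence[float], probability: float) -> float:
    """Smallest value such that at least `probability` of the data is <= it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    ordered = sorted(values)
    # The small epsilon guards against floating-point noise in probability * n.
    rank = max(1, math.ceil(probability * len(ordered) - 1e-9))
    return ordered[rank - 1]


def nearest_rank_quantiles(values: Sequence[float],
                           probabilities: Sequence[float]) -> List[float]:
    """One quantile per requested probability, in the same order."""
    return [nearest_rank_quantile(values, p) for p in probabilities]


latency_ms = list(range(1, 101))  # 1 ms .. 100 ms, one sample each
print(nearest_rank_quantiles(latency_ms, [0.5, 0.95, 0.99]))  # [50, 95, 99]
```

A single `probability` corresponds to one call of the first helper; a `probabilities` list corresponds to the second, which returns the results in the order requested.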
+
+
diff --git a/docs/processor/udtf.md b/docs/processor/udtf.md
new file mode 100644
index 0000000..a6e8444
--- /dev/null
+++ b/docs/processor/udtf.md
@@ -0,0 +1,66 @@
+# UDTF
+
+> The functions for table processors.
+
+## Function of content
+
+- [UNROLL](#unroll)
+- [JSON_UNROLL](#json_unroll)
+
+## Description
+
+UDTFs (user-defined table functions) are used to process data on its way from source to sink. They are part of the processing pipeline and can be used in the pre-processing, processing, and post-processing pipelines. Each processor can assemble UDTFs into a pipeline, within which events are processed by each function in order, from top to bottom.
+Unlike scalar functions, which return a single value, UDTFs are particularly useful when you need to explode or unroll data, transforming a single input row into multiple output rows.
+
+## UDTF Definition
+
+UDTFs and UDFs share similar input and context structures; please refer to [UDF](udf.md).
+
+## Functions
+
+### UNROLL
+
+The UNROLL function takes an array field, or an expression that evaluates to an array, and unrolls it into individual events.
+
+```UNROLL(filter, lookup_fields, output_fields[, parameters])```
+- filter: optional
+- lookup_fields: required
+- output_fields: required
+- parameters: optional
+  - regex: `<String>` optional. If lookup_fields is a string, the regex parameter is used to split the string into an array. The default value is a comma.
+
+#### Example
+    
+```yaml
+functions:
+  - function: UNROLL
+    lookup_fields: [ monitor_rule_list ]
+    output_fields: [ monitor_rule ]
+```
+
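The effect of UNROLL on a single event can be sketched in Python as follows. This is an illustration under assumptions, not the groot-stream implementation: the `unroll` helper is hypothetical, it keeps the original lookup field on every output event, and it only handles strings and arrays. Whether the real function keeps or drops the lookup field is not stated in this document.

```python
import re
from typing import Any, Dict, Iterator


def unroll(event: Dict[str, Any], lookup_field: str, output_field: str,
           regex: str = ",") -> Iterator[Dict[str, Any]]:
    """Yield one copy of `event` per element of the lookup field."""
    value = event[lookup_field]
    if isinstance(value, str):
        items = re.split(regex, value)   # a string is split into an array first
    elif isinstance(value, (list, tuple)):
        items = list(value)
    else:
        raise TypeError("this sketch only handles strings and arrays")
    for item in items:
        out = dict(event)                # all other fields are carried over
        out[output_field] = item
        yield out


event = {"device": "fw-1", "monitor_rule_list": [101, 102, 103]}
for row in unroll(event, "monitor_rule_list", "monitor_rule"):
    print(row["device"], row["monitor_rule"])
```

One input event with a three-element `monitor_rule_list` becomes three output events, each carrying a single `monitor_rule`.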
+### JSON_UNROLL
+
+The JSON_UNROLL function takes a JSON object and unrolls (explodes) an array of objects within it into individual events, each of which inherits the top-level fields.
+
+```JSON_UNROLL(filter, lookup_fields, output_fields[, parameters])```
+- filter: optional
+- lookup_fields: required
+- output_fields: required
+- parameters: optional
+  - path: `<String>` optional. Path to the array to unroll. Default is the root of the JSON object.
+  - new_path: `<String>` optional. New name for `path` in the output. Default is the same as `path`.
+
+#### Example
+    
+```yaml
+functions:
+  - function: JSON_UNROLL
+    lookup_fields: [ device_tag ]
+    output_fields: [ device_tag ]
+    parameters:
+      path: tags
+      new_path: tag
+```
+
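The following Python sketch illustrates one plausible reading of JSON_UNROLL with `path` and `new_path`. It is an assumption-laden illustration, not the groot-stream implementation: the `json_unroll` helper is hypothetical, it assumes the lookup field holds a JSON string, and it supports only a single top-level key as `path`.

```python
import json
from typing import Any, Dict, Iterator, Optional


def json_unroll(event: Dict[str, Any], lookup_field: str, output_field: str,
                path: Optional[str] = None,
                new_path: Optional[str] = None) -> Iterator[Dict[str, Any]]:
    """Yield one event per element of the JSON array found at `path`."""
    doc = json.loads(event[lookup_field])
    if path is None:
        items, top_level = doc, {}       # the root itself is the array
    else:
        items = doc[path]
        top_level = {k: v for k, v in doc.items() if k != path}
    target = new_path or path
    for item in items:
        if target is None:
            child = item                 # no path: emit the element as-is
        else:
            child = dict(top_level)      # inherit the top-level fields
            child[target] = item
        out = dict(event)
        out[output_field] = json.dumps(child, sort_keys=True)
        yield out


event = {
    "device_tag": json.dumps({
        "device_id": "d1",
        "tags": [{"key": "dc", "value": "a"}, {"key": "rack", "value": "7"}],
    })
}
for row in json_unroll(event, "device_tag", "device_tag", path="tags", new_path="tag"):
    print(row["device_tag"])
```

Each output event keeps `device_id` from the top level and carries one element of `tags` under the new name `tag`.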
+
+