| author | doufenghu <[email protected]> | 2024-06-01 18:31:17 +0800 |
|---|---|---|
| committer | doufenghu <[email protected]> | 2024-06-01 18:31:17 +0800 |
| commit | 258e0fbdf263ed1edde1964a505387b018933a16 | |
| tree | 5980f63340ae935b7eb9287dd821e3bb33fb781d /docs | |
| parent | 0f0f3b17b3036cd2f09ea6ff2737e5b6c555b704 | |
[Improve][docs] Update ClickHouse connector and UDF document.
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/connector/sink/clickhouse.md | 21 |
| -rw-r--r-- | docs/processor/udf.md | 86 |
2 files changed, 89 insertions, 18 deletions
diff --git a/docs/connector/sink/clickhouse.md b/docs/connector/sink/clickhouse.md
index c11ecbd..5256fd7 100644
--- a/docs/connector/sink/clickhouse.md
+++ b/docs/connector/sink/clickhouse.md
@@ -38,16 +38,17 @@ In order to use the ClickHouse connector, the following dependencies are require

 ClickHouse sink custom properties. If a property belongs to the ClickHouse JDBC config, you can set it using the `connection.` prefix.

-| Name | Type | Required | Default | Description |
-|---|---|---|---|---|
-| host | String | Yes | (none) | `ClickHouse` cluster address, the format is `host:port`, allowing multiple `hosts` to be specified. Such as `"host1:8123,host2:8123"`. |
-| connection.database | String | Yes | (none) | The `ClickHouse` database. |
-| table | String | Yes | (none) | The table name. |
-| batch.size | Integer | Yes | 100000 | The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) each time, the `default is 20000`. |
-| batch.interval | Duration | Yes | 30s | The time interval for writing data through. |
-| connection.user | String | Yes | (none) | The username to use to connect to `ClickHouse`. |
-| connection.password | String | Yes | (none) | The password to use to connect to `ClickHouse`. |
-| connection.config | Map | No | (none) | In addition to the above mandatory parameters that must be specified by `clickhouse-jdbc`, users can also specify multiple optional parameters, which cover all the [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc`. |
+| Name | Type | Required | Default | Description |
+|---|---|---|---|---|
+| host | String | Yes | (none) | `ClickHouse` cluster address in `host:port` format; multiple hosts may be specified, such as `"host1:8123,host2:8123"`. |
+| connection.database | String | Yes | (none) | The `ClickHouse` database. |
+| table | String | Yes | (none) | The table name. |
+| batch.size | Integer | Yes | 100000 | The maximum number of rows written in each batch. |
+| batch.byte.size | Memory | No | 200MB | The maximum number of bytes written in each batch. |
+| batch.interval | Duration | Yes | 30s | The time interval between batch writes. |
+| connection.user | String | Yes | (none) | The username used to connect to `ClickHouse`. |
+| connection.password | String | Yes | (none) | The password used to connect to `ClickHouse`. |
+| connection.config | Map | No | (none) | In addition to the mandatory parameters above, users can specify optional parameters covering all the [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc`. |

 ## Example

 This example reads data from the inline test source and writes it to the ClickHouse table `test`.
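For quick orientation, the sketch below assembles the documented properties into a single sink configuration. Only the property names and defaults come from the table above; the enclosing `sink`/`type`/`properties` layout and the `socket_timeout` pass-through option are assumptions for illustration, not taken from this commit.

```yaml
# Sketch only: the sink/type/properties layout is assumed;
# property names and defaults follow the table above.
sink:
  type: clickhouse
  properties:
    host: "host1:8123,host2:8123"   # multiple host:port pairs are allowed
    connection.database: default
    table: test
    connection.user: default
    connection.password: ""
    batch.size: 100000              # max rows per batch (documented default)
    batch.byte.size: 200MB          # max bytes per batch (documented default)
    batch.interval: 30s             # flush interval (documented default)
    connection.config:              # extra options passed through to clickhouse-jdbc
      socket_timeout: 30000         # assumed example of a clickhouse-jdbc option
```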
diff --git a/docs/processor/udf.md b/docs/processor/udf.md
index 5e6bd6a..74fa2d0 100644
--- a/docs/processor/udf.md
+++ b/docs/processor/udf.md
@@ -9,6 +9,7 @@
 - [Domain](#domain)
 - [Drop](#drop)
 - [Eval](#eval)
+- [Flatten](#flatten)
 - [From Unix Timestamp](#from-unix-timestamp)
 - [Generate String Array](#generate-string-array)
 - [GeoIP Lookup](#geoip-lookup)
@@ -180,6 +181,47 @@ If the value of `direction` is `69`, the value of `internal_ip` will be `client_
     parameters:
       value_expression: 'direction=69 ? client_ip : server_ip'
 ```
+### Flatten
+
+Flattens the fields of a nested structure to the top level. Each new field name is the original field name prefixed with the names of the struct fields needed to reach it, joined by dots by default.
+
+```FLATTEN(filter, lookup_fields, output_fields[, parameters])```
+- filter: optional
+- lookup_fields: optional
+- output_fields: optional
+- parameters: optional
+  - prefix: `<String>` optional. Prefix for flattened field names. Default is empty.
+  - depth: `<Integer>` optional. The number of nested levels to flatten. Minimum is 1. Default is `5`.
+  - delimiter: `<String>` optional. The string used to join nested keys. Default is `.`.
+  - json_string_keys: `<Array>` optional. Keys whose values are JSON strings that should be parsed and flattened. Default is empty.
+
+Example 1:
+
+Flatten the nested structure of the `fields` and `tags` fields in Metrics. If `lookup_fields` is empty, all nested structures are flattened.
+
+```yaml
+  - function: FLATTEN
+    lookup_fields: [tags, fields]
+```
+
+Example 2:
+
+Flatten the nested structure of the session record field `encapsulation` (a JSON string), add the prefix `tunnels`, limit the nesting depth to `3`, and use a dot `.` as the delimiter.
+
+```yaml
+  - function: FLATTEN
+    lookup_fields: [encapsulation]
+    parameters:
+      prefix: tunnels
+      depth: 3
+      delimiter: .
+      json_string_keys: [encapsulation]
+```
+Output:
+```json
+{
+  "tunnels.encapsulation.ipv4.client_ip": "192.168.11.12",
+  "tunnels.encapsulation.ipv4.server_ip": "8.8.8.8"
+}
+```
 ### From Unix Timestamp

 From unix timestamp function is used to convert the unix timestamp to date time string. The default time zone is UTC+0.
@@ -311,20 +353,48 @@ Example:
 ```
 ### Rename
-Rename function is used to rename the field name.
+Rename function is used to rename or reformat field names (e.g., by replacing underscores with dots).

-```RENAME(filter, lookup_fields, output_fields)```
+```RENAME(filter, lookup_fields, output_fields, parameters)```
 - filter: optional
-- lookup_fields: required
-- output_fields: required
-- parameters: not required
+- lookup_fields: optional
+- output_fields: optional
+- parameters: required
+  - parent_fields: `<Array>` optional. Fields whose child fields also inherit the `rename_fields` and `rename_expression` operations.
+  - rename_fields: `Map<String, String>` required. The key is the original field name and the value is the new field name.
+    - current_field_name: `<String>` required. The original field name.
+    - new_field_name: `<String>` required. The new field name.
+  - rename_expression: `<String>` optional. An AviatorScript expression whose return value is used to rename fields.
+
+```
+A single function can include both rename_fields (to rename specific fields) and rename_expression (to rename fields globally);
+the rename_fields strategy executes first.
+```
+Example 1:
+
+Remove the prefix "tags_" from field names and rename the field "timestamp_ms" to "recv_time_ms".
-Example:
 ```yaml
   - function: RENAME
-    lookup_fields: [http_domain]
-    output_fields: [server_domain]
+    parameters:
+      rename_fields:
+        - timestamp_ms: recv_time_ms
+      rename_expression: key=string.replace_all(key,'tags_',''); return key;
+```
+
+Example 2:
+
+Rename the field `client_ip` to `source_ip`, including fields under the `encapsulation.ipv4` tunnel.
+
+```yaml
+  - function: RENAME
+    parameters:
+      parent_fields: [encapsulation.ipv4]
+      rename_fields:
+        - client_ip: source_ip
+```
+Output: `source_ip:192.168.4.1, encapsulation.ipv4.source_ip:192.168.12.12`

 ### Snowflake ID
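Since FLATTEN and RENAME both operate on record fields, they can plausibly be chained in one processor list. Below is a minimal sketch reusing only the syntax from the examples above; the idea of chaining them and the concrete field names are illustrative assumptions, not part of this commit.

```yaml
# Sketch: flatten the JSON-string field first, then rename within the
# flattened subtree; syntax follows the FLATTEN/RENAME examples above.
  - function: FLATTEN
    lookup_fields: [encapsulation]
    parameters:
      depth: 3
      json_string_keys: [encapsulation]
  - function: RENAME
    parameters:
      parent_fields: [encapsulation.ipv4]
      rename_fields:
        - client_ip: source_ip
```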
