| Field     | Value                                    | Date                      |
|-----------|------------------------------------------|---------------------------|
| author    | doufenghu <[email protected]>   | 2024-11-01 20:40:46 +0800 |
| committer | doufenghu <[email protected]>   | 2024-11-01 20:40:46 +0800 |
| commit    | 5818ed2ac9ca31a35a55f330160a9cf7f63bf6f3 |                           |
| tree      | 0d2f00c6d6c1791de8c5588572e0e7fb538803f2 |                           |
| parent    | e25eabde3ccb3f0d52346cb11cac757763c41be8 |                           |
[Improve][docs] Add a description of the new features for version 1.7.1-SNAPSHOT.
| Mode       | File                             | Lines changed |
|------------|----------------------------------|---------------|
| -rw-r--r-- | docs/connector/formats/csv.md    | 11            |
| -rw-r--r-- | docs/connector/sink/starrocks.md | 10            |
| -rw-r--r-- | docs/grootstream-design-cn.md    | 46            |
| -rw-r--r-- | docs/processor/udaf.md           | 38            |
| -rw-r--r-- | docs/processor/udf.md            | 52            |
| -rw-r--r-- | pom.xml                          | 2             |
6 files changed, 143 insertions, 16 deletions
diff --git a/docs/connector/formats/csv.md b/docs/connector/formats/csv.md
index ca8d10b..76769b2 100644
--- a/docs/connector/formats/csv.md
+++ b/docs/connector/formats/csv.md
@@ -4,8 +4,7 @@
 >
 > ## Description
 >
-> The CSV format allows to read and write CSV data based on an CSV schema. Currently, the CSV schema is derived from table schema.
-> **The CSV format must config schema for source/sink**.
+> The CSV format allows for reading and writing CSV data based on a schema. Currently, the CSV schema is derived from the table schema.
 
 | Name         | Supported Versions | Maven |
 |--------------|--------------------|-------|
@@ -16,12 +15,12 @@
 | Name                        | Type    | Required | Default | Description |
 |-----------------------------|---------|----------|---------|-------------|
 | format                      | String  | Yes      | (none)  | Specify what format to use, here should be 'csv'. |
-| csv.field.delimiter         | String  | No       | ,       | Field delimiter character (',' by default), must be single character. You can use backslash to specify special characters, e.g. '\t' represents the tab character. |
-| csv.disable.quote.character | Boolean | No       | false   | Disabled quote character for enclosing field values (false by default). If true, option 'csv.quote.character' can not be set. |
-| csv.quote.character         | String  | No       | "       | Quote character for enclosing field values (" by default). |
+| csv.field.delimiter         | String  | No       | ,       | Field delimiter character (`,` by default), must be single character. You can use backslash to specify special characters, e.g. '\t' represents the tab character. |
+| csv.disable.quote.character | Boolean | No       | false   | Disabled quote character for enclosing field values (`false` by default). If true, option `csv.quote.character` can not be set. |
+| csv.quote.character         | String  | No       | "       | Quote character for enclosing field values (`"` by default). |
 | csv.allow.comments          | Boolean | No       | false   | Ignore comment lines that start with '#' (disabled by default). If enabled, make sure to also ignore parse errors to allow empty rows. |
 | csv.ignore.parse.errors     | Boolean | No       | false   | Skip fields and rows with parse errors instead of failing. Fields are set to null in case of errors. |
-| csv.array.element.delimiter | String  | No       | ;       | Array element delimiter string for separating array and row element values (';' by default). |
+| csv.array.element.delimiter | String  | No       | ;       | Array element delimiter string for separating array and row element values (`;` by default). |
 | csv.escape.character        | String  | No       | (none)  | Escape character for escaping values (disabled by default). |
 | csv.null.literal            | String  | No       | (none)  | Null literal string that is interpreted as a null value (disabled by default). |
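For quick reference, a minimal sketch of a source block that enables the CSV format options documented above. The `sources` layout mirrors the StarRocks example below; the connector `type` and `path` keys are assumptions, and only `format` and the `csv.*` keys come from the table.

```yaml
sources: # [object] Define connector source
  csv_file_source:                # hypothetical source name
    type: file                    # hypothetical connector type
    path: /tmp/input.csv          # hypothetical path option
    format: csv                   # selects the CSV format documented above
    csv.field.delimiter: "\t"     # tab-separated fields, per the backslash-escape note
    csv.quote.character: "'"      # field values enclosed in single quotes
    csv.ignore.parse.errors: true # set malformed fields to null instead of failing
```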
diff --git a/docs/connector/sink/starrocks.md b/docs/connector/sink/starrocks.md
index f07e432..208fa39 100644
--- a/docs/connector/sink/starrocks.md
+++ b/docs/connector/sink/starrocks.md
@@ -1,25 +1,25 @@
 # Starrocks
 
-> Starrocks sink connector
+> StarRocks sink connector
 >
 > ## Description
 >
-> Sink connector for Starrocks, know more in https://docs.starrocks.io/zh/docs/loading/Flink-connector-starrocks/.
+> Sink connector for StarRocks, know more in https://docs.starrocks.io/zh/docs/loading/Flink-connector-starrocks/.
 
 ## Sink Options
 
-Starrocks sink custom properties. If properties belongs to Starrocks Flink Connector Config, you can use `connection.` prefix to set.
+StarRocks sink custom properties. If properties belongs to StarRocks Flink Connector Config, you can use `connection.` prefix to set.
 
 | Name                | Type    | Required | Default | Description |
 |---------------------|---------|----------|---------|-------------|
 | log.failures.only   | Boolean | No       | true    | Optional flag to whether the sink should fail on errors, or only log them; If this is set to true, then exceptions will be only logged, if set to false, exceptions will be eventually thrown, true by default. |
 | connection.jdbc-url | String  | Yes      | (none)  | The address that is used to connect to the MySQL server of the FE. You can specify multiple addresses, which must be separated by a comma (,). Format: jdbc:mysql://<fe_host1>:<fe_query_port1>,<fe_host2>:<fe_query_port2>,<fe_host3>:<fe_query_port3>.. |
 | connection.load-url | String  | Yes      | (none)  | The address that is used to connect to the HTTP server of the FE. You can specify multiple addresses, which must be separated by a semicolon (;). Format: <fe_host1>:<fe_http_port1>;<fe_host2>:<fe_http_port2>.. |
-| connection.config   | Map     | No       | (none)  | Starrocks Flink Connector Options, know more in https://docs.starrocks.io/docs/loading/Flink-connector-starrocks/#options. |
+| connection.config   | Map     | No       | (none)  | StarRocks Flink Connector Options, know more in https://docs.starrocks.io/docs/loading/Flink-connector-starrocks/#options. |
 
 ## Example
 
-This example read data of inline test source and write to Starrocks table `test`.
+This example read data of inline test source and write to StarRocks table `test`.
 
 ```yaml
 sources: # [object] Define connector source
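The example hunk above only shows the first lines of the YAML, so here is a minimal sketch of a StarRocks sink assembled from the documented options. Host names, ports, and the sink `type` key are assumptions; the `connection.config` entries are standard StarRocks Flink connector options.

```yaml
sinks: # [object] Define connector sink
  starrocks_sink:            # hypothetical sink name
    type: starrocks          # hypothetical connector type
    log.failures.only: false # throw write errors instead of only logging them
    connection.jdbc-url: jdbc:mysql://fe1:9030,fe2:9030 # FE MySQL addresses, comma-separated
    connection.load-url: fe1:8030;fe2:8030              # FE HTTP addresses, semicolon-separated
    connection.config:       # forwarded to the StarRocks Flink connector
      database-name: test_db # hypothetical target database
      table-name: test       # target table from the example above
```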
diff --git a/docs/grootstream-design-cn.md b/docs/grootstream-design-cn.md
index 41fcd0d..8579dc8 100644
--- a/docs/grootstream-design-cn.md
+++ b/docs/grootstream-design-cn.md
@@ -114,7 +114,8 @@ grootstream:
   vault:
     type: vault
     url: <vault-url>
-    token: <vault-token>
+    username: <vault-username>
+    password: <vault-password>
     default_key_path: <default-vault-key-path>
     plugin_key_path: <plugin-vault-key-path>
@@ -1295,6 +1296,23 @@ sinks:
     format: raw
 ```
 
+### CSV
+
+Reads/writes CSV data according to the configured schema.
+
+| Property                    | Required | Default | Type    | Description |
+|-----------------------------|----------|---------|---------|-------------|
+| csv.field.delimiter         | Y        | ,       | String  | Delimiter between field values; defaults to a comma. |
+| csv.quote.character         | N        | "       | String  | Quote character used to enclose field values; defaults to a double quote ("). This option cannot be used when csv.disable.quote.character is true. |
+| csv.disable.quote.character | N        | false   | Boolean | Whether to disable the quote character around field values; defaults to false. |
+| csv.allow.comments          | N        | false   | Boolean | Ignore comment lines starting with `#` (disabled by default). When enabled, make sure parse errors are also ignored so that empty rows are allowed; any line starting with `#` is treated as a comment and is neither parsed nor read. |
+| csv.ignore.parse.errors     | N        | false   | Boolean | Ignore parse errors; defaults to false. Malformed input produces an error log entry. |
+| csv.array.element.delimiter | N        | ;       | String  | Delimiter between array elements. |
+| csv.escape.character        | N        |         | String  | Character used to escape special characters, e.g. the delimiter, quotes, or line breaks. |
+| csv.null.literal            | N        |         | String  | String literal interpreted as a NULL value. |
+
+
+
 # Job Orchestration
 
 ```yaml
@@ -1480,7 +1498,7 @@ Parameters:
     identifier: aes-128-gcm96
 ```
 
-Note: Reads the job variable `projection.encrypt.schema.registry.uri` and returns encrypted fields; the data type is Array.
+Note: Reads the job variable `projection.encrypt.schema.registry.uri` and returns the encrypted fields; the data type is Array.
 
 #### Eval
@@ -1621,7 +1639,7 @@ Parameters:
 - secret_key = `<string>` The key used to generate the MAC.
 - algorithm = `<string>` The hash algorithm used to generate the MAC. Defaults to `sha256`.
-- output_format = `<string>` The output format of the MAC. Defaults to `'hex'`. Supported: `base64` | `hex`.
+- output_format = `<string>` The output format of the MAC. Defaults to `'base64'`. Supported: `base64` | `hex`.
 
 ```
 - function: HMAC
@@ -1850,6 +1868,28 @@ Parameters:
     output_fields: [ sessions ]
 ```
 
+
+
+#### Max
+
+Gets the maximum value within the time window.
+
+```yaml
+- function: MAX
+  lookup_fields: [ received_time ]
+  output_fields: [ received_time ]
+```
+
+#### Min
+
+Gets the minimum value within the time window.
+
+```yaml
+- function: MIN
+  lookup_fields: [ received_time ]
+  output_fields: [ received_time ]
+```
+
 #### Mean
 
 Averages the specified numeric field within the time window.
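The MAX/MIN examples in this hunk use the functions in isolation; below is a sketch of how both signatures compose with the optional filter argument. The filter expression and field names are assumptions, following the filter syntax shown in the udf.md examples further down.

```yaml
- function: MAX
  filter: event.status == 'success'     # optional filter argument from the signature
  lookup_fields: [ received_time ]
  output_fields: [ last_success_time ]  # rename the output instead of overwriting the input
- function: MIN
  filter: event.status == 'success'
  lookup_fields: [ received_time ]
  output_fields: [ first_success_time ]
```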
diff --git a/docs/processor/udaf.md b/docs/processor/udaf.md
index 66d6ad5..f305201 100644
--- a/docs/processor/udaf.md
+++ b/docs/processor/udaf.md
@@ -9,7 +9,9 @@
 - [First Value](#First-Value)
 - [Last Value](#Last-Value)
 - [Long Count](#Long-Count)
+- [Max](#Max)
 - [MEAN](#Mean)
+- [Min](#Min)
 - [Number SUM](#Number-SUM)
 - [HLLD](#HLLD)
 - [Approx Count Distinct HLLD](#Approx-Count-Distinct-HLLD)
@@ -116,6 +118,23 @@
     output_fields: [sessions]
 ```
 
+### Max
+
+MAX is used to get the maximum value of the field in the group of events.
+
+```MAX(filter, lookup_fields, output_fields)```
+- filter: optional
+- lookup_fields: required. Now only support one field.
+- output_fields: optional. If not set, the output field name is `lookup_field_name`.
+
+Example
+
+```yaml
+- function: MAX
+  lookup_fields: [receive_time]
+  output_fields: [receive_time]
+```
+
 ### Mean
 
 MEAN is used to calculate the mean value of the field in the group of events. The lookup field value must be a number.
@@ -135,6 +154,25 @@
     output_fields: [received_bytes_mean]
 ```
 
+
+### Min
+
+MIN is used to get the minimum value of the field in the group of events.
+
+```MIN(filter, lookup_fields, output_fields)```
+- filter: optional
+- lookup_fields: required. Now only support one field.
+- output_fields: optional. If not set, the output field name is `lookup_field_name`.
+
+Example
+
+```yaml
+- function: MIN
+  lookup_fields: [receive_time]
+  output_fields: [receive_time]
+```
+
+
 ### Number SUM
 
 NUMBER_SUM is used to sum the value of the field in the group of events. The lookup field value must be a number.
diff --git a/docs/processor/udf.md b/docs/processor/udf.md
index e480275..7f5c656 100644
--- a/docs/processor/udf.md
+++ b/docs/processor/udf.md
@@ -10,11 +10,13 @@
 - [Current Unix Timestamp](#current-unix-timestamp)
 - [Domain](#domain)
 - [Drop](#drop)
+- [Encrypt](#encrypt)
 - [Eval](#eval)
 - [Flatten](#flatten)
 - [From Unix Timestamp](#from-unix-timestamp)
 - [Generate String Array](#generate-string-array)
 - [GeoIP Lookup](#geoip-lookup)
+- [HMAC](#hmac)
 - [JSON Extract](#json-extract)
 - [Path Combine](#path-combine)
 - [Rename](#rename)
@@ -174,6 +176,30 @@
     filter: event.server_ip == '4.4.4.4'
 ```
 
+### Encrypt
+
+Encrypt function is used to encrypt the field value by the specified algorithm.
+
+Note: This feature allows you to use a third-party RESTful API to retrieve encrypted fields. By using these fields as criteria, you can determine whether the current field is encrypted. You must also set the projection.encrypt.schema.registry.uri as a job property.
+For example, setting `projection.encrypt.schema.registry.uri=127.0.0.1:9999/v1/schema/session_record?option=encrypt_fields` will return the encrypted fields in an array format.
+
+```ENCRYPT(filter, lookup_fields, output_fields[, parameters])```
+- filter: optional
+- lookup_fields: required
+- output_fields: required
+- parameters: required
+  - identifier: `<String>` required. The identifier of the encryption algorithm. Supports `aes-128-gcm96`, `aes-256-gcm96`, and `sm4-gcm96`.
+
+Example:
+Encrypt the phone number by the AES-128-GCM96 algorithm. Here phone_number will replace the original value with the encrypted value.
+```yaml
+- function: ENCRYPT
+  lookup_fields: [phone_number]
+  output_fields: [phone_number]
+  parameters:
+    identifier: aes-128-gcm96
+```
+
 ### Eval
 
 Eval function is used to adds or removes fields from events by evaluating an value expression.
@@ -383,6 +409,29 @@
       CITY: server_administrative_area
 ```
 
+### HMAC
+
+HMAC function is used to generate the hash-based message authentication code (HMAC) by the specified algorithm.
+
+```HMAC(filter, lookup_fields, output_fields[, parameters])```
+- filter: optional
+- lookup_fields: required
+- output_fields: required
+- parameters: required
+  - secret_key: `<String>` required. The secret key used to generate the HMAC.
+  - output_format: `<String>` required. Enum: `HEX`, `BASE64`. Default is `BASE64`.
+
+Example:
+
+```yaml
+- function: HMAC
+  lookup_fields: [phone_number]
+  output_fields: [phone_number_hmac]
+  parameters:
+    secret_key: abcdefg
+    output_format: BASE64
+```
+
 ### JSON Extract
 
 JSON extract function is used to extract the value from json string.
@@ -604,4 +653,5 @@
     output_fields: [log_uuid]
 ```
 
-Result: such as 2ed6657d-e927-568b-95e1-2665a8aea6a2.
\ No newline at end of file
+Result: such as 2ed6657d-e927-568b-95e1-2665a8aea6a2.
+
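The two new UDFs above pair naturally when a field must stay searchable after encryption. A minimal sketch under that assumption (the field names and the HMAC-before-ENCRYPT ordering are illustrative; the functions and parameters are as documented above):

```yaml
- function: HMAC                        # keyed digest first, while the plaintext is available
  lookup_fields: [ phone_number ]
  output_fields: [ phone_number_hmac ]  # searchable surrogate of the original value
  parameters:
    secret_key: abcdefg                 # sample key reused from the HMAC example above
    output_format: HEX
- function: ENCRYPT                     # then encrypt the original value in place
  lookup_fields: [ phone_number ]
  output_fields: [ phone_number ]
  parameters:
    identifier: aes-256-gcm96           # any of the documented identifiers works
```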
diff --git a/pom.xml b/pom.xml
--- a/pom.xml
+++ b/pom.xml
@@ -23,7 +23,7 @@
   </modules>
 
   <properties>
-    <revision>1.7.0-SNAPSHOT</revision>
+    <revision>1.7.1-SNAPSHOT</revision>
     <java.version>11</java.version>
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
     <maven.compiler.source>${java.version}</maven.compiler.source>
