author    doufenghu <[email protected]>  2024-11-01 20:40:46 +0800
committer doufenghu <[email protected]>  2024-11-01 20:40:46 +0800
commit    5818ed2ac9ca31a35a55f330160a9cf7f63bf6f3 (patch)
tree      0d2f00c6d6c1791de8c5588572e0e7fb538803f2
parent    e25eabde3ccb3f0d52346cb11cac757763c41be8 (diff)
[Improve][docs] Add a description of the new features for version 1.7.1-SNAPSHOT.
-rw-r--r--  docs/connector/formats/csv.md     11
-rw-r--r--  docs/connector/sink/starrocks.md  10
-rw-r--r--  docs/grootstream-design-cn.md     46
-rw-r--r--  docs/processor/udaf.md            38
-rw-r--r--  docs/processor/udf.md             52
-rw-r--r--  pom.xml                            2
6 files changed, 143 insertions, 16 deletions
diff --git a/docs/connector/formats/csv.md b/docs/connector/formats/csv.md
index ca8d10b..76769b2 100644
--- a/docs/connector/formats/csv.md
+++ b/docs/connector/formats/csv.md
@@ -4,8 +4,7 @@
>
> ## Description
>
-> The CSV format allows to read and write CSV data based on an CSV schema. Currently, the CSV schema is derived from table schema.
-> **The CSV format must config schema for source/sink**.
+> The CSV format allows for reading and writing CSV data based on a schema. Currently, the CSV schema is derived from the table schema.
| Name | Supported Versions | Maven |
|--------------|--------------------|---------------------------------------------------------------------------------------------------------------------------|
@@ -16,12 +15,12 @@
| Name | Type | Required | Default | Description |
|-----------------------------|-----------|----------|---------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| format | String | Yes | (none) | Specify what format to use, here should be 'csv'. |
-| csv.field.delimiter | String | No | , | Field delimiter character (',' by default), must be single character. You can use backslash to specify special characters, e.g. '\t' represents the tab character. |
-| csv.disable.quote.character | Boolean | No | false | Disabled quote character for enclosing field values (false by default). If true, option 'csv.quote.character' can not be set. |
-| csv.quote.character | String | No | " | Quote character for enclosing field values (" by default). |
+| csv.field.delimiter         | String    | No       | ,       | Field delimiter character (`,` by default); must be a single character. You can use a backslash to specify special characters, e.g. '\t' represents the tab character. |
+| csv.disable.quote.character | Boolean   | No       | false   | Disable the quote character for enclosing field values (`false` by default). If true, option `csv.quote.character` cannot be set. |
+| csv.quote.character | String | No | " | Quote character for enclosing field values (`"` by default). |
| csv.allow.comments | Boolean | No | false | Ignore comment lines that start with '#' (disabled by default). If enabled, make sure to also ignore parse errors to allow empty rows. |
| csv.ignore.parse.errors | Boolean | No | false | Skip fields and rows with parse errors instead of failing. Fields are set to null in case of errors. |
-| csv.array.element.delimiter | String | No | ; | Array element delimiter string for separating array and row element values (';' by default). |
+| csv.array.element.delimiter | String | No | ; | Array element delimiter string for separating array and row element values (`;` by default). |
| csv.escape.character | String | No | (none) | Escape character for escaping values (disabled by default). |
| csv.null.literal | String | No | (none) | Null literal string that is interpreted as a null value (disabled by default). |
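These delimiter/quote options behave like standard CSV dialect settings; as a rough illustration (using Python's stdlib `csv` module, not the connector itself), a tab delimiter combined with the default double-quote character parses like this:

```python
import csv
import io

# Mirror csv.field.delimiter='\t' and csv.quote.character='"':
# the quoted field keeps the embedded tab instead of splitting on it.
data = 'a\t"b\tc"\t1\n'
rows = list(csv.reader(io.StringIO(data), delimiter='\t', quotechar='"'))
print(rows)
```

The backslash-escaped `'\t'` in the option value maps to the literal tab character used here.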
diff --git a/docs/connector/sink/starrocks.md b/docs/connector/sink/starrocks.md
index f07e432..208fa39 100644
--- a/docs/connector/sink/starrocks.md
+++ b/docs/connector/sink/starrocks.md
@@ -1,25 +1,25 @@
# StarRocks
-> Starrocks sink connector
+> StarRocks sink connector
>
> ## Description
>
-> Sink connector for Starrocks, know more in https://docs.starrocks.io/zh/docs/loading/Flink-connector-starrocks/.
+> Sink connector for StarRocks; learn more at https://docs.starrocks.io/zh/docs/loading/Flink-connector-starrocks/.
## Sink Options
-Starrocks sink custom properties. If properties belongs to Starrocks Flink Connector Config, you can use `connection.` prefix to set.
+StarRocks sink custom properties. If a property belongs to the StarRocks Flink Connector config, you can set it with the `connection.` prefix.
| Name | Type | Required | Default | Description |
|---------------------|---------|----------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| log.failures.only   | Boolean | No       | true    | Optional flag controlling whether the sink should fail on errors or only log them. If true, exceptions are only logged; if false, they are eventually thrown. Defaults to true. |
| connection.jdbc-url | String | Yes | (none) | The address that is used to connect to the MySQL server of the FE. You can specify multiple addresses, which must be separated by a comma (,). Format: jdbc:mysql://<fe_host1>:<fe_query_port1>,<fe_host2>:<fe_query_port2>,<fe_host3>:<fe_query_port3>.. |
| connection.load-url | String | Yes | (none) | The address that is used to connect to the HTTP server of the FE. You can specify multiple addresses, which must be separated by a semicolon (;). Format: <fe_host1>:<fe_http_port1>;<fe_host2>:<fe_http_port2>.. |
-| connection.config | Map | No | (none) | Starrocks Flink Connector Options, know more in https://docs.starrocks.io/docs/loading/Flink-connector-starrocks/#options. |
+| connection.config   | Map     | No       | (none)  | StarRocks Flink Connector options; learn more at https://docs.starrocks.io/docs/loading/Flink-connector-starrocks/#options. |
## Example
-This example read data of inline test source and write to Starrocks table `test`.
+This example reads data from the inline test source and writes it to the StarRocks table `test`.
```yaml
sources: # [object] Define connector source
diff --git a/docs/grootstream-design-cn.md b/docs/grootstream-design-cn.md
index 41fcd0d..8579dc8 100644
--- a/docs/grootstream-design-cn.md
+++ b/docs/grootstream-design-cn.md
@@ -114,7 +114,8 @@ grootstream:
vault:
type: vault
url: <vault-url>
- token: <vault-token>
+ username: <vault-username>
+ password: <vault-password>
default_key_path: <default-vault-key-path>
plugin_key_path: <plugin-vault-key-path>
@@ -1295,6 +1296,23 @@ sinks:
format: raw
```
+### CSV
+
+Reads/writes CSV-format data according to a predefined schema.
+
+| Name                        | Required | Default | Type    | Description                                                  |
+| --------------------------- | -------- | ------- | ------- | ------------------------------------------------------------ |
+| csv.field.delimiter         | Y        | ,       | String  | Delimiter between field values; defaults to comma. |
+| csv.quote.character         | N        | "       | String  | Quote character for enclosing field values; defaults to the double quote `"`. Cannot be used if `csv.disable.quote.character` is true. |
+| csv.disable.quote.character | N        | false   | Boolean | Whether to disable the quote character for enclosing field values. Defaults to false. |
+| csv.allow.comments          | N        | false   | Boolean | Ignore comment lines starting with `#` (disabled by default). If enabled, make sure to also ignore parse errors to allow empty rows; any line starting with `#` is treated as a comment and is not parsed or read. |
+| csv.ignore.parse.errors     | N        | false   | Boolean | Ignore parse errors; defaults to false. Malformed input is logged as an exception. |
+| csv.array.element.delimiter | N        | ;       | String  | Delimiter between array elements. |
+| csv.escape.character        | N        |         | String  | Character for escaping special characters such as delimiters, quotes, or newlines. |
+| csv.null.literal            | N        |         | String  | String literal interpreted as a NULL value. |
+
+
+
# Task Orchestration
```yaml
@@ -1480,7 +1498,7 @@ Parameters:
identifier: aes-128-gcm96
```
-Note: reads the job variable `projection.encrypt.schema.registry.uri` and returns encrypted fields; the data type is Array.
+Note: reads the job variable `projection.encrypt.schema.registry.uri` and returns the encrypted fields; the data type is Array.
#### Eval
@@ -1621,7 +1639,7 @@ Parameters:
- secret_key = `<string>` The key used to generate the MAC.
- algorithm = `<string>` The hash algorithm used to generate the MAC. Defaults to `sha256`.
-- output_format = `<string>` Output format of the MAC. Defaults to `'hex'`. Supports: `base64` | `hex`.
+- output_format = `<string>` Output format of the MAC. Defaults to `'base64'`. Supports: `base64` | `hex`.
```
- function: HMAC
@@ -1850,6 +1868,28 @@ Parameters:
output_fields: [ sessions ]
```
+
+
+#### Max
+
+Gets the maximum value within the time window.
+
+```yaml
+- function: MAX
+ lookup_fields: [ received_time ]
+ output_fields: [ received_time ]
+```
+
+#### Min
+
+Gets the minimum value within the time window.
+
+```yaml
+- function: MIN
+ lookup_fields: [ received_time ]
+ output_fields: [ received_time ]
+```
+
#### Mean
Computes the mean of the specified numeric field within the time window.
diff --git a/docs/processor/udaf.md b/docs/processor/udaf.md
index 66d6ad5..f305201 100644
--- a/docs/processor/udaf.md
+++ b/docs/processor/udaf.md
@@ -9,7 +9,9 @@
- [First Value](#First-Value)
- [Last Value](#Last-Value)
- [Long Count](#Long-Count)
+- [Max](#Max)
- [MEAN](#Mean)
+- [Min](#Min)
- [Number SUM](#Number-SUM)
- [HLLD](#HLLD)
- [Approx Count Distinct HLLD](#Approx-Count-Distinct-HLLD)
@@ -116,6 +118,23 @@ Example
output_fields: [sessions]
```
+### Max
+
+MAX is used to get the maximum value of the field in the group of events.
+
+```MAX(filter, lookup_fields, output_fields)```
+- filter: optional
+- lookup_fields: required. Currently supports only one field.
+- output_fields: optional. If not set, the output field name defaults to the lookup field name.
+
+Example
+
+```yaml
+- function: MAX
+ lookup_fields: [receive_time]
+ output_fields: [receive_time]
+```
+
### Mean
MEAN is used to calculate the mean value of the field in the group of events. The lookup field value must be a number.
@@ -135,6 +154,25 @@ Example
output_fields: [received_bytes_mean]
```
+
+### Min
+
+MIN is used to get the minimum value of the field in the group of events.
+
+```MIN(filter, lookup_fields, output_fields)```
+- filter: optional
+- lookup_fields: required. Currently supports only one field.
+- output_fields: optional. If not set, the output field name defaults to the lookup field name.
+
+Example
+
+```yaml
+- function: MIN
+ lookup_fields: [receive_time]
+ output_fields: [receive_time]
+```
+
+
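The MAX and MIN aggregates above share the same shape: one lookup field, and an output field that defaults to the lookup field name. A minimal Python sketch of that semantics (hypothetical event dicts; not the engine's implementation):

```python
def aggregate(events, lookup_field, output_field, op):
    """Apply MAX/MIN-style aggregation over one lookup field in a group of events."""
    values = [e[lookup_field] for e in events if lookup_field in e]
    return {output_field: op(values)} if values else {}

events = [{"receive_time": 3}, {"receive_time": 9}, {"receive_time": 5}]
print(aggregate(events, "receive_time", "receive_time", max))  # {'receive_time': 9}
print(aggregate(events, "receive_time", "receive_time", min))  # {'receive_time': 3}
```

An empty group yields no output field rather than an error, matching the optional-filter behavior.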
### Number SUM
NUMBER_SUM is used to sum the value of the field in the group of events. The lookup field value must be a number.
diff --git a/docs/processor/udf.md b/docs/processor/udf.md
index e480275..7f5c656 100644
--- a/docs/processor/udf.md
+++ b/docs/processor/udf.md
@@ -10,11 +10,13 @@
- [Current Unix Timestamp](#current-unix-timestamp)
- [Domain](#domain)
- [Drop](#drop)
+- [Encrypt](#encrypt)
- [Eval](#eval)
- [Flatten](#flatten)
- [From Unix Timestamp](#from-unix-timestamp)
- [Generate String Array](#generate-string-array)
- [GeoIP Lookup](#geoip-lookup)
+- [HMAC](#hmac)
- [JSON Extract](#json-extract)
- [Path Combine](#path-combine)
- [Rename](#rename)
@@ -174,6 +176,30 @@ Example:
filter: event.server_ip == '4.4.4.4'
```
+### Encrypt
+
+Encrypt function is used to encrypt the field value with the specified algorithm.
+
+Note: This feature allows you to use a third-party RESTful API to retrieve the encrypted fields. Using these fields as criteria, you can determine whether the current field is encrypted. You must also set `projection.encrypt.schema.registry.uri` as a job property.
+For example, setting `projection.encrypt.schema.registry.uri=127.0.0.1:9999/v1/schema/session_record?option=encrypt_fields` will return the encrypted fields in an array format.
+
+```ENCRYPT(filter, lookup_fields, output_fields[, parameters])```
+- filter: optional
+- lookup_fields: required
+- output_fields: required
+- parameters: required
+ - identifier: `<String>` required. The identifier of the encryption algorithm. Supports `aes-128-gcm96`, `aes-256-gcm96`, and `sm4-gcm96`.
+
+Example:
+Encrypt the phone number with the AES-128-GCM96 algorithm. Here the original `phone_number` value is replaced with the encrypted value.
+```yaml
+- function: ENCRYPT
+ lookup_fields: [phone_number]
+ output_fields: [phone_number]
+ parameters:
+ identifier: aes-128-gcm96
+```
+
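One plausible reading of the note above, sketched in Python with the registry call and the cipher stubbed out (all names here are hypothetical; the real connector fetches the field list from the REST endpoint configured via `projection.encrypt.schema.registry.uri` and applies the actual algorithm):

```python
def fetch_encrypted_fields():
    # Stub for the RESTful schema-registry call; it returns the
    # field names to treat as encrypted, as an array.
    return ["phone_number"]

def encrypt_value(value, identifier):
    # Stub cipher standing in for e.g. aes-128-gcm96.
    return f"enc[{identifier}]:{value}"

def apply_encrypt(event, lookup_fields, identifier):
    encrypted_fields = set(fetch_encrypted_fields())
    for field in lookup_fields:
        # Only fields the registry lists are transformed.
        if field in event and field in encrypted_fields:
            event[field] = encrypt_value(event[field], identifier)
    return event

event = apply_encrypt({"phone_number": "123"}, ["phone_number"], "aes-128-gcm96")
print(event)  # {'phone_number': 'enc[aes-128-gcm96]:123'}
```

Fields not reported by the registry pass through unchanged under this reading.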
### Eval
Eval function is used to add or remove fields from events by evaluating a value expression.
@@ -383,6 +409,29 @@ Example:
CITY: server_administrative_area
```
+### HMAC
+
+HMAC function is used to generate a hash-based message authentication code (HMAC) using the specified algorithm.
+
+```HMAC(filter, lookup_fields, output_fields[, parameters])```
+- filter: optional
+- lookup_fields: required
+- output_fields: required
+- parameters: required
+ - secret_key: `<String>` required. The secret key used to generate the HMAC.
+  - output_format: `<String>` optional. Enum: `HEX`, `BASE64`. Default is `BASE64`.
+
+Example:
+
+```yaml
+ - function: HMAC
+ lookup_fields: [phone_number]
+ output_fields: [phone_number_hmac]
+ parameters:
+ secret_key: abcdefg
+ output_format: BASE64
+```
+
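The two output formats are just encodings of the same SHA-256 digest; a stdlib Python sketch of the semantics (not the engine's implementation):

```python
import base64
import hashlib
import hmac

def compute_hmac(value, secret_key, output_format="BASE64"):
    """HMAC-SHA256 of a field value, encoded as BASE64 (default) or HEX."""
    digest = hmac.new(secret_key.encode(), value.encode(), hashlib.sha256)
    if output_format == "HEX":
        return digest.hexdigest()
    return base64.b64encode(digest.digest()).decode()

mac_b64 = compute_hmac("13800000000", "abcdefg")
mac_hex = compute_hmac("13800000000", "abcdefg", "HEX")
```

Decoding the BASE64 result and hex-encoding it recovers the HEX form, so the two options are interchangeable downstream.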
### JSON Extract
JSON extract function is used to extract the value from a JSON string.
@@ -604,4 +653,5 @@ Example:
output_fields: [log_uuid]
```
-Result: such as 2ed6657d-e927-568b-95e1-2665a8aea6a2.
\ No newline at end of file
+Result: such as 2ed6657d-e927-568b-95e1-2665a8aea6a2.
+
diff --git a/pom.xml b/pom.xml
index cdf6569..36eadf4 100644
--- a/pom.xml
+++ b/pom.xml
@@ -23,7 +23,7 @@
</modules>
<properties>
- <revision>1.7.0-SNAPSHOT</revision>
+ <revision>1.7.1-SNAPSHOT</revision>
<java.version>11</java.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>${java.version}</maven.compiler.source>