author: doufenghu <[email protected]> 2024-10-29 20:42:50 +0800
committer: doufenghu <[email protected]> 2024-10-29 20:42:50 +0800
commit: d2579028fb90bd60ca9e5f9fa36cbde8a6db8872 (patch)
tree: 062db25f5d5740cc76a6a66edc2ef3484b624614 /docs
parent: 06975ee829f9395f095a12c10eaedffcd89b3d83 (diff)
[Improve][core] Add CheckUDFContextUtil for verifying UDF configurations. Rename lookup_fields and output_fields to lookupFields and outputFields.
Diffstat (limited to 'docs')
-rw-r--r-- docs/develop-guide.md 4
-rw-r--r-- docs/processor/udf.md 15
2 files changed, 10 insertions, 9 deletions
diff --git a/docs/develop-guide.md b/docs/develop-guide.md
index 75e8803..927d2d3 100644
--- a/docs/develop-guide.md
+++ b/docs/develop-guide.md
@@ -21,7 +21,7 @@ Groot Stream based all stream processing on data records common known as events.
```json
{
"__timestamp": "<Timestamp in UNIX epoch format (milliseconds)>",
- "__input_id": "ID/Name of the source that delivered the event",
+ "__headers": "Map<String, String> headers of the source that delivered the event",
"__window_start_timestamp" : "<Timestamp in UNIX epoch format (milliseconds)>",
"__window_end_timestamp" : "<Timestamp in UNIX epoch format (milliseconds)>",
"key1": "<value1>",
@@ -35,7 +35,7 @@ Groot Stream add internal fields during pipeline processing. A few notes about i
- Treat internal fields as read-only. Modifying them can result in unintended consequences to your data flows.
- Internal fields only exist for the duration of the event processing pipeline. They are not documented under sources or sinks.
- If you do not configure a timestamp for extraction, the Pipeline process assigns the current time (in UNIX epoch format) to the __timestamp field.
-- If you have multiple sources, you can determine which source the event came form by looking at the `__input_id` field. For example, the Kafka source adds the topic name to the `__input_id` field.
+- If you have multiple sources, you can determine the origin of the event by examining the `__headers` field. For example, the Kafka source appends the topic name as the `__input_id` key in the `__headers`.
## How to write a high quality Git commit message
diff --git a/docs/processor/udf.md b/docs/processor/udf.md
index 0475192..e480275 100644
--- a/docs/processor/udf.md
+++ b/docs/processor/udf.md
@@ -96,18 +96,19 @@ Base64 encode function is commonly used to encode the binary data to base64 stri
```BASE64_ENCODE_TO_STRING(filter, output_fields[, parameters])```
- filter: optional
-- lookup_fields: not required
+- lookup_fields: required
- output_fields: required
- parameters: required
- - value_field: `<String>` required.
+ - input_type: `<String>` required. Enum: `string`, `byte_array`. The input type of the value field.
Example:
```yaml
- function: BASE64_ENCODE_TO_STRING
+ lookup_fields: [packet]
output_fields: [packet]
parameters:
- value_field: packet
+ input_type: string
```
### Current Unix Timestamp
@@ -141,7 +142,7 @@ Domain function is used to extract the domain from the url.
- parameters: required
- option: `<String>` required. Enum: `TOP_LEVEL_DOMAIN`, `FIRST_SIGNIFICANT_SUBDOMAIN`.
-#### Option
+**Option**
- `TOP_LEVEL_DOMAIN` is used to extract the top level domain from the url. For example, `www.abc.com` will be extracted to `com`.
- `FIRST_SIGNIFICANT_SUBDOMAIN` is used to extract the first significant subdomain from the url. For example, `www.abc.com` will be extracted to `abc.com`.
@@ -283,7 +284,7 @@ From unix timestamp function is used to convert the unix timestamp to date time
- parameters: optional
- precision: `<String>` optional. Default is `seconds`. Enum: `milliseconds`, `seconds`.
-#### Precision
+**Precision**
- `milliseconds` is used to convert the unix timestamp to milliseconds date time string. For example, `1619712000` will be converted to `2021-04-30 00:00:00.000`.
- `seconds` is used to convert the unix timestamp to seconds date time string. For example, `1619712000` will be converted to `2021-04-30 00:00:00`.
@@ -336,7 +337,7 @@ GeoIP lookup function is used to lookup the geoip information by ip address. You
- ISP: `<String>` optional.
- ORGANIZATION: `<String>` optional.
-#### Option
+**Option**
- `IP_TO_COUNTRY` is used to lookup the country or region information by ip address.
- `IP_TO_PROVINCE` is used to lookup the province or state information by ip address.
@@ -348,7 +349,7 @@ GeoIP lookup function is used to lookup the geoip information by ip address. You
- `IP_TO_JSON` is used to lookup the above information by ip address. The result is a json string.
- `IP_TO_OBJECT` is used to lookup the above information by ip address. The result is a `LocationResponse` object.
-#### GeoLocation Field Mapping
+**GeoLocation Field Mapping**
- `COUNTRY` is used to map the country information to the event field.
- `PROVINCE` is used to map the province information to the event field.