| author | doufenghu <[email protected]> | 2024-01-27 12:21:28 +0800 |
|---|---|---|
| committer | doufenghu <[email protected]> | 2024-01-27 12:21:28 +0800 |
| commit | 38424d6655d952dcda4123f97aa66386312151e9 (patch) | |
| tree | 16605cc07f4fe5774c41a8867a1e106aebe4f44c /docs | |
| parent | eb055c2917289b4ce8df0935a43b0b13d87bd561 (diff) | |
[Improve][docs] Improve 1.0.0 release docs.
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/connector/formats/protobuf.md | 19 |
| -rw-r--r-- | docs/connector/sink/clickhouse.md (renamed from docs/connector/sink/ClickHouse.md) | 20 |
| -rw-r--r-- | docs/connector/sink/kafka.md (renamed from docs/connector/sink/Kafka.md) | 0 |
| -rw-r--r-- | docs/connector/sink/print.md (renamed from docs/connector/sink/Print.md) | 0 |
| -rw-r--r-- | docs/connector/source/inline.md (renamed from docs/connector/source/Inline.md) | 0 |
| -rw-r--r-- | docs/connector/source/ipfix.md (renamed from docs/connector/source/IPFIX.md) | 6 |
| -rw-r--r-- | docs/connector/source/kafka.md (renamed from docs/connector/source/Kafka.md) | 0 |
| -rw-r--r-- | docs/faq.md | 0 |
| -rw-r--r-- | docs/filter/aviator.md (renamed from docs/filter/Aviator.md) | 2 |
| -rw-r--r-- | docs/grootstream-config.md | 20 |
| -rw-r--r-- | docs/processor/projection-processor.md | 67 |
| -rw-r--r-- | docs/processor/udf.md | 387 |
12 files changed, 502 insertions, 19 deletions
diff --git a/docs/connector/formats/protobuf.md b/docs/connector/formats/protobuf.md
index c798447..2efbeff 100644
--- a/docs/connector/formats/protobuf.md
+++ b/docs/connector/formats/protobuf.md
@@ -20,10 +20,19 @@ It is very popular in Streaming Data Pipeline. Now support protobuf format in so
## Data Type Mapping
-| Data Type | Protobuf Data Type | Description |
-|--------------|----------------------------------------------------------------------------------------------|-------------|
-| int / bigint | int32 / int64 / uint32 / uint64 / sint32 / sint64 / fixed32 / fixed64 / sfixed32 / sfixed64 | |
-| string | string | |
+| Protobuf Data Type | Data Type | Description |
+|----------------------------------------------|-----------|-------------|
+| int32 / uint32 / sint32 / fixed32 / sfixed32 | int | The recommended Protobuf types are `int32 / sint32`. The internal data type may also be `int / bigint / float / double`. |
+| int64 / uint64 / sint64 / fixed64 / sfixed64 | bigint | The recommended Protobuf types are `int64 / sint64`. The internal data type may also be `int / bigint / float / double`. |
+| float | float | The recommended Protobuf type is `double`. The internal data type may also be `int / bigint / float / double`. |
+| double | double | The recommended Protobuf type is `double`. The internal data type may also be `int / bigint / float / double`. |
+| bool | boolean | The recommended Protobuf type is `int32`. The internal data type may be `boolean` or `int` (0: false, 1: true). |
+| enum | int | The recommended Protobuf type is `int32`. The internal data type is `int`. |
+| string | string | During serialization, every internal data type can be converted to `String`. |
+| bytes | binary | - |
+| message | struct | - |
+| repeated | array | - |
# How to use
## protobuf uses example
@@ -262,7 +271,7 @@ message SessionRecord {
    string tunnel_endpoint_b_desc = 223;
}
```
-Build protobuf file to binary descriptor file.
+Compile the protobuf file into a binary descriptor file. Only the `proto3` syntax is supported. If a field may contain null values, add the `optional` keyword; we recommend marking numeric fields such as `int` and `double` as `optional`.
```shell
protoc --descriptor_set_out=session_record_test.desc session_record_test.proto
```
diff --git a/docs/connector/sink/ClickHouse.md b/docs/connector/sink/clickhouse.md
index 79ba1db..d794767 100644
--- a/docs/connector/sink/ClickHouse.md
+++ b/docs/connector/sink/clickhouse.md
@@ -32,16 +32,16 @@ In order to use the ClickHouse connector, the following dependencies are require
ClickHouse sink custom properties. Properties that belong to the ClickHouse JDBC configuration can be set with the `connection.` prefix.
-| Name | Type | Required | Default | Description |
-|-----------------------|----------|----------|---------|-------------|
-| host | String | Yes | - | `ClickHouse` cluster address, the format is `host:port` , allowing multiple `hosts` to be specified. Such as `"host1:8123,host2:8123"`. |
-| database | String | Yes | - | The `ClickHouse` database. |
-| table | String | Yes | - | The table name. |
-| batch.size | Integer | Yes | 100000 | The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) each time, the `default is 20000`. |
-| batch.interval | Duration | Yes | 30s | The time interval for writing data through. |
-| connection.user | String | Yes | - | The username to use to connect to `ClickHouse`. |
-| connection.password | String | Yes | - | The password to use to connect to `ClickHouse`. |
-| connection.config | Map | No | - | In addition to the above mandatory parameters that must be specified by `clickhouse-jdbc` , users can also specify multiple optional parameters, which cover all the [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc`. |
+| Name | Type | Required | Default | Description |
+|---------------------|----------|----------|---------|-------------|
+| host | String | Yes | - | `ClickHouse` cluster address in the format `host:port`; multiple hosts may be specified, such as `"host1:8123,host2:8123"`. |
+| connection.database | String | Yes | - | The `ClickHouse` database. |
+| table | String | Yes | - | The table name. |
+| batch.size | Integer | Yes | 100000 | The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) in each batch. |
+| batch.interval | Duration | Yes | 30s | The time interval at which buffered rows are flushed. |
+| connection.user | String | Yes | - | The username to use to connect to `ClickHouse`. |
+| connection.password | String | Yes | - | The password to use to connect to `ClickHouse`. |
+| connection.config | Map | No | - | In addition to the mandatory parameters above, users can specify optional parameters covering all of the [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc`. |
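Based on the options above, a minimal sink definition might look like the following sketch. The `clickhouse` type string and all property values here are illustrative assumptions, not values taken from this page:
```yaml
sinks:
  clickhouse_sink:
    type: clickhouse                      # assumed type name, for illustration only
    properties:
      host: "host1:8123,host2:8123"       # one or more host:port pairs
      connection.database: test_db
      table: test
      batch.size: 100000                  # rows per batch
      batch.interval: 30s                 # flush interval
      connection.user: default
      connection.password: ""
      connection.config:                  # optional clickhouse-jdbc parameters
        socket_timeout: 60000
```
The `connection.` prefix mirrors the table above: `connection.user`, `connection.password`, and the entries under `connection.config` are passed through to `clickhouse-jdbc`.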
## Example
This example reads data from the inline test source and writes it to the ClickHouse table `test`.
diff --git a/docs/connector/sink/Kafka.md b/docs/connector/sink/kafka.md
index 04dae2f..04dae2f 100644
--- a/docs/connector/sink/Kafka.md
+++ b/docs/connector/sink/kafka.md
diff --git a/docs/connector/sink/Print.md b/docs/connector/sink/print.md
index 271d7a2..271d7a2 100644
--- a/docs/connector/sink/Print.md
+++ b/docs/connector/sink/print.md
diff --git a/docs/connector/source/Inline.md b/docs/connector/source/inline.md
index c91d1a7..c91d1a7 100644
--- a/docs/connector/source/Inline.md
+++ b/docs/connector/source/inline.md
diff --git a/docs/connector/source/IPFIX.md b/docs/connector/source/ipfix.md
index 550a5ab..424aa65 100644
--- a/docs/connector/source/IPFIX.md
+++ b/docs/connector/source/ipfix.md
@@ -13,9 +13,9 @@ IPFIX source custom properties.
| Name | Type | Required | Default | Description |
|-----------------------------------------|---------|----------|-----------|-------------|
-| .port.range | Integer | Yes | - | UDP port range |
-| buffer.size | Integer | No | 65535 | The maximum size of packet for UDP |
-| receive.buffer.size | Integer | No | 104857600 | UDP receive buffer size in bytes |
+| port.range | String | Yes | - | Range of UDP ports, such as `3000-3010` |
+| max.packet.size | Integer | No | 65535 | The maximum UDP packet size in bytes |
+| max.receive.buffer.size | Integer | No | 104857600 | The maximum UDP receive buffer size in bytes |
| service.discovery.registry.mode | String | No | - | Service discovery registry mode, supports `0 (nacos)` and `1 (consul)` |
| service.discovery.service.name | String | No | - | Service discovery service name |
| service.discovery.health.check.interval | Integer | No | - | Service discovery health check interval in milliseconds |
diff --git a/docs/connector/source/Kafka.md b/docs/connector/source/kafka.md
index 0565fd4..0565fd4 100644
--- a/docs/connector/source/Kafka.md
+++ b/docs/connector/source/kafka.md
diff --git a/docs/faq.md b/docs/faq.md
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/docs/faq.md
diff --git a/docs/filter/Aviator.md b/docs/filter/aviator.md
index f478af7..8aa136c 100644
--- a/docs/filter/Aviator.md
+++ b/docs/filter/aviator.md
@@ -1,5 +1,5 @@
# Aviator
-> Aviator filter operator
+> Filter data with `AviatorFilter`. Whether an event is dropped or passed downstream is determined by the Aviator expression.
## Description
AviatorScript is a lightweight, high-performance scripting language hosted on the JVM. The filter operator uses an Aviator expression to filter data. More details about AviatorScript can be found at [AviatorScript](https://www.yuque.com/boyan-avfmj/aviatorscript).
diff --git a/docs/grootstream-config.md b/docs/grootstream-config.md
new file mode 100644
index 0000000..a359c39
--- /dev/null
+++ b/docs/grootstream-config.md
@@ -0,0 +1,20 @@
# Groot Stream Config
This file provides the global configuration for the groot-stream server, such as the default configuration of jobs.

## Config file structure

```yaml
grootstream:
  knowledge_base: # Define the knowledge base list.
    - name: ${knowledge_base_name} # Name of the knowledge base, referenced as the kb_name of lookup functions.
      fs_type: ${file_system_type} # Type of the file system. Supported: local, hdfs, http.
      fs_path: ${file_system_path} # Path of the file system.
      files:
        - ${file_name} # File name of the knowledge base.
  properties: # Custom parameters.
    hos.path: ${hos_path}
    hos.bucket.name.traffic_file: ${traffic_file_bucket}
    hos.bucket.name.troubleshooting_file: ${troubleshooting_file_bucket}
    scheduler.knowledge_base.update.interval.minutes: ${knowledge_base_update_interval_minutes}
```
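As a concrete illustration of the structure above, a filled-in sketch follows. The path, file name, and interval are illustrative assumptions; the knowledge base name matches the `kb_name` used by the UDF examples in docs/processor/udf.md:
```yaml
grootstream:
  knowledge_base:
    - name: tsg_ip_location                      # matches the kb_name used by GEOIP_LOOKUP examples
      fs_type: local                             # one of local, hdfs, http
      fs_path: /opt/grootstream/knowledge_base   # illustrative path
      files:
        - ip_location.mmdb                       # illustrative .mmdb file name
  properties:
    scheduler.knowledge_base.update.interval.minutes: 60   # illustrative refresh interval
```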
diff --git a/docs/processor/projection-processor.md b/docs/processor/projection-processor.md
new file mode 100644
index 0000000..0d4f4c9
--- /dev/null
+++ b/docs/processor/projection-processor.md
@@ -0,0 +1,67 @@
# Projection Processor
> Processing pipelines built from the projection processor
## Description
The projection processor projects data as it moves from source to sink. It can be used to keep or remove fields, rename fields, and add fields.
The projection processor is part of the processing pipeline and can be used in the pre-processing, processing, and post-processing pipelines.
Each processor can assemble UDFs (user-defined functions) into a pipeline. More UDF details can be found in [UDF](udf.md).
## Options

| name | type | required | description |
|----------------|---------|----------|-------------|
| type | String | Yes | The type of the processor; currently only `com.geedgenetworks.core.processor.projection.ProjectionProcessor` is supported |
| output_fields | Array | No | Array of String. The list of fields to keep; fields not in the list are removed |
| remove_fields | Array | No | Array of String. The list of fields to remove |
| functions | Array | No | Array of Object. The list of functions applied to the data |

## Usage Example
This example uses the projection processor to remove the fields `http_request_line`, `http_response_line`, and `http_response_content_type`, and uses the DROP function to filter out all events whose `server_ip` is `4.4.4.4`.
```yaml
sources:
  inline_source:
    type: inline
    properties:
      data: '[{"tcp_rtt_ms":128,"decoded_as":"HTTP","http_version":"http1","http_request_line":"GET / HTTP/1.1","http_host":"www.ct.cn","http_url":"www.ct.cn/","http_user_agent":"curl/8.0.1","http_status_code":200,"http_response_line":"HTTP/1.1 200 OK","http_response_content_type":"text/html; charset=UTF-8","http_response_latency_ms":31,"http_session_duration_ms":5451,"in_src_mac":"ba:bb:a7:3c:67:1c","in_dest_mac":"86:dd:7a:8f:ae:e2","out_src_mac":"86:dd:7a:8f:ae:e2","out_dest_mac":"ba:bb:a7:3c:67:1c","tcp_client_isn":678677906,"tcp_server_isn":1006700307,"address_type":4,"client_ip":"192.11.22.22","server_ip":"8.8.8.8","client_port":42751,"server_port":80,"in_link_id":65535,"out_link_id":65535,"start_timestamp_ms":1703646546127,"end_timestamp_ms":1703646551702,"duration_ms":5575,"sent_pkts":97,"sent_bytes":5892,"received_pkts":250,"received_bytes":333931},{"tcp_rtt_ms":256,"decoded_as":"HTTP","http_version":"http1","http_request_line":"GET / HTTP/1.1","http_host":"www.abc.cn","http_url":"www.cabc.cn/","http_user_agent":"curl/8.0.1","http_status_code":200,"http_response_line":"HTTP/1.1 200 OK","http_response_content_type":"text/html; charset=UTF-8","http_response_latency_ms":31,"http_session_duration_ms":5451,"in_src_mac":"ba:bb:a7:3c:67:1c","in_dest_mac":"86:dd:7a:8f:ae:e2","out_src_mac":"86:dd:7a:8f:ae:e2","out_dest_mac":"ba:bb:a7:3c:67:1c","tcp_client_isn":678677906,"tcp_server_isn":1006700307,"address_type":4,"client_ip":"192.168.10.198","server_ip":"4.4.4.4","client_port":42751,"server_port":80,"in_link_id":65535,"out_link_id":65535,"start_timestamp_ms":1703646546127,"end_timestamp_ms":1703646551702,"duration_ms":2575,"sent_pkts":197,"sent_bytes":5892,"received_pkts":350,"received_bytes":533931}]'
      format: json
      json.ignore.parse.errors: false

filters:
  filter_operator:
    type: com.geedgenetworks.core.filter.AviatorFilter
    properties:
      expression: event.server_ip != '12.12.12.12'

processing_pipelines: # [object] Define Processors
  projection_processor: # [object] Define projection processor name
    type: com.geedgenetworks.core.processor.projection.ProjectionProcessorImpl
    remove_fields: [http_request_line, http_response_line, http_response_content_type]
    functions: # [array of object] Define UDFs
      - function: DROP # [string] DROP function used to filter events
        lookup_fields: []
        output_fields: []
        filter: event.server_ip == '4.4.4.4'

sinks:
  print_sink:
    type: print
    properties:
      format: json
      mode: log_warn

application:
  env:
    name: example-inline-to-print
    parallelism: 3
    pipeline:
      object-reuse: true
  topology:
    - name: inline_source
      downstream: [filter_operator]
    - name: filter_operator
      downstream: [projection_processor]
    - name: projection_processor
      downstream: [print_sink]
    - name: print_sink
      downstream: []
```
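For comparison, a sketch of the same processor defined with an allow-list (`output_fields`) instead of `remove_fields`; the snippet is illustrative and keeps only fields that appear in the sample data above:
```yaml
processing_pipelines:
  projection_processor:
    type: com.geedgenetworks.core.processor.projection.ProjectionProcessorImpl
    output_fields: [client_ip, server_ip, http_host, http_status_code, duration_ms]  # keep only these fields
```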
diff --git a/docs/processor/udf.md b/docs/processor/udf.md
new file mode 100644
index 0000000..7d77b07
--- /dev/null
+++ b/docs/processor/udf.md
@@ -0,0 +1,387 @@
# UDF
> The functions available to the projection processor
## Function list

- [Asn Lookup](#asn-lookup)
- [Base64 Decode](#base64-decode)
- [Current Unix Timestamp](#current-unix-timestamp)
- [Domain](#domain)
- [Drop](#drop)
- [Eval](#eval)
- [From Unix Timestamp](#from-unix-timestamp)
- [Generate String Array](#generate-string-array)
- [GeoIP Lookup](#geoip-lookup)
- [JSON Extract](#json-extract)
- [Path Combine](#path-combine)
- [Rename](#rename)
- [Snowflake ID](#snowflake-id)
- [String Joiner](#string-joiner)
- [Unix Timestamp Converter](#unix-timestamp-converter)

## Description
A UDF (user-defined function) extends the functions of the projection processor. UDFs are part of the processing pipeline and can be used in the pre-processing, processing, and post-processing pipelines.
## UDF Definition
A UDF consists of the following parts: name, event (the data being processed), context, evaluate function, open function, and close function.
- name: Function name, uppercase with words separated by underscores, used for function registration.
- event: The data to be processed, organized as a Map<String, Object>.
- context: Function context, used to store the state of the function. It includes the following parameters:
  - `filter`: Filter expression, string type. It selects the events that need to be processed by the function. The expression is written in the Aviator expression language, for example `event.server_ip == '4.4.4.4'`.
  - `lookup_fields`: The fields used as lookup keys. It is an array of strings, for example `['server_ip', 'client_ip']`.
  - `output_fields`: The fields used to append the result to the event. It is an array of strings, for example `['server_ip', 'client_ip']`. If a field already exists in the event, its value is overwritten.
  - `parameters`: Custom parameters, a Map<String, Object>.
- evaluate function: The function that processes the event and returns a Map<String, Object>.
- open function: Initializes the resources used by the function.
- close function: Releases the resources used by the function.

### Functions

Functions share the common parameters `filter`, `lookup_fields`, `output_fields`, and `parameters`, and return a Map<String, Object> value of the event.
``` FUNCTION_NAME(filter, lookup_fields, output_fields[, parameters])```

### Asn Lookup
The ASN lookup function looks up ASN information by IP address. You need to host the `.mmdb` database file from the Knowledge Base Repository.

```ASN_LOOKUP(filter, lookup_fields, output_fields[, parameters])```
- filter: optional
- lookup_fields: required
- output_fields: required
- parameters: required
  - kb_name: required. The name of the knowledge base.
  - option: required. Currently only `IP_TO_ASN` is supported.
Example:
```yaml
  - function: ASN_LOOKUP
    lookup_fields: [client_ip]
    output_fields: [client_asn]
    parameters:
      kb_name: tsg_ip_asn
      option: IP_TO_ASN
```

### Base64 Decode
The Base64 decode function decodes a Base64-encoded string.

```BASE64_DECODE(filter, output_fields[, parameters])```
- filter: optional
- lookup_fields: not required
- output_fields: required
- parameters: required
  - value_field: `<String>` required.
  - charset_field: `<String>` optional. Default is `UTF-8`.

Example:
```yaml
  - function: BASE64_DECODE
    output_fields: [mail_attachment_name]
    parameters:
      value_field: mail_attachment_name
      charset_field: mail_attachment_name_charset
```

### Current Unix Timestamp
The current Unix timestamp function returns the current Unix timestamp.

```CURRENT_UNIX_TIMESTAMP(output_fields[, parameters])```
- filter: not required
- lookup_fields: not required
- output_fields: required
- parameters: optional
  - precision: `<String>` optional. Default is `seconds`. Enum: `milliseconds`, `seconds`.

Example:
```yaml
  - function: CURRENT_UNIX_TIMESTAMP
    output_fields: [recv_time]
    parameters:
      precision: seconds
```

### Domain
The domain function extracts the domain from a URL.

```DOMAIN(filter, lookup_fields, output_fields[, parameters])```
- filter: optional
- lookup_fields: required. More than one field may be specified; fields are processed from left to right, and the result is overwritten whenever a processed value is not null.
- output_fields: required
- parameters: required
  - option: `<String>` required. Enum: `TOP_LEVEL_DOMAIN`, `FIRST_SIGNIFICANT_SUBDOMAIN`.

#### Option
- `TOP_LEVEL_DOMAIN` extracts the top-level domain from the URL. For example, `www.abc.com` yields `com`.
- `FIRST_SIGNIFICANT_SUBDOMAIN` extracts the first significant subdomain from the URL. For example, `www.abc.com` yields `abc.com`.

Example:

```yaml
  - function: DOMAIN
    lookup_fields: [http_host, ssl_sni, quic_sni]
    output_fields: [server_domain]
    parameters:
      option: FIRST_SIGNIFICANT_SUBDOMAIN
```

### Drop
The drop function filters events. If the filter expression is true, the event is dropped; otherwise it is passed downstream.

```DROP(filter)```
- filter: required
- lookup_fields: not required
- output_fields: not required
- parameters: not required

Example:
```yaml
  - function: DROP
    filter: event.server_ip == '4.4.4.4'
```
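The filter can also combine several Aviator conditions. A sketch is shown below; the field names come from the sample event data in the projection processor example, and the port and threshold are illustrative:
```yaml
  - function: DROP
    filter: event.server_port == 80 && event.duration_ms < 1000
```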
### Eval
The eval function adds or removes fields from events by evaluating a value expression.

```EVAL(filter, output_fields[, parameters])```
- filter: optional
- lookup_fields: not required
- output_fields: required
- parameters: required
  - value_expression: `<String>` required. A value expression that sets the field's value; it can also be a constant.

Example 1:
Add a field `ingestion_time` whose value is taken from `recv_time`:
```yaml
  - function: EVAL
    output_fields: [ingestion_time]
    parameters:
      value_expression: recv_time
```
Example 2:
If the value of `direction` is `69`, `internal_ip` is set to `client_ip`; otherwise it is set to `server_ip`.
```yaml
  - function: EVAL
    output_fields: [internal_ip]
    parameters:
      value_expression: 'direction=69 ? client_ip : server_ip'
```

### From Unix Timestamp
The from-Unix-timestamp function converts a Unix timestamp to a date-time string. The default time zone is UTC+0.

```FROM_UNIX_TIMESTAMP(filter, lookup_fields, output_fields[, parameters])```
- filter: optional
- lookup_fields: required
- output_fields: required
- parameters: optional
  - precision: `<String>` optional. Default is `seconds`. Enum: `milliseconds`, `seconds`.

#### Precision
- `milliseconds` converts the Unix timestamp to a date-time string with millisecond precision. For example, `1619712000` is converted to `2021-04-30 00:00:00.000`.
- `seconds` converts the Unix timestamp to a date-time string with second precision. For example, `1619712000` is converted to `2021-04-30 00:00:00`.

Example:
```yaml
  - function: FROM_UNIX_TIMESTAMP
    lookup_fields: [recv_time]
    output_fields: [recv_time_string]
    parameters:
      precision: seconds
```

### Generate String Array
The generate-string-array function merges string fields into an array.

```GENERATE_STRING_ARRAY(filter, lookup_fields, output_fields)```
- filter: optional
- lookup_fields: required. More than one field may be specified.
- output_fields: required
- parameters: not required

Example:
```yaml
  - function: GENERATE_STRING_ARRAY
    lookup_fields: [http_host, ssl_sni, quic_sni]
    output_fields: [server_domains]
```
### GeoIP Lookup
The GeoIP lookup function looks up GeoIP information by IP address. You need to host the `.mmdb` database file from the Knowledge Base Repository.

```GEOIP_LOOKUP(filter, lookup_fields, output_fields[, parameters])```
- filter: optional
- lookup_fields: required
- output_fields: optional
- parameters: required
  - kb_name: `<String>` required. The name of the knowledge base.
  - option: `<String>` required. Enum: `IP_TO_COUNTRY`, `IP_TO_PROVINCE`, `IP_TO_CITY`, `IP_TO_SUBDIVISION_ADDR`, `IP_TO_DETAIL`, `IP_TO_LATLNG`, `IP_TO_PROVIDER`, `IP_TO_JSON`, `IP_TO_OBJECT`.
  - geolocation_field_mapping: `<Map<String, String>>` optional. Required when `option` is `IP_TO_OBJECT`. The mapping of the geolocation fields: the key is the field name in the knowledge base and the value is the field name in the event.
    - COUNTRY: `<String>` optional.
    - PROVINCE: `<String>` optional.
    - CITY: `<String>` optional.
    - LONGITUDE: `<String>` optional.
    - LATITUDE: `<String>` optional.
    - ISP: `<String>` optional.
    - ORGANIZATION: `<String>` optional.

#### Option
- `IP_TO_COUNTRY` looks up the country or region information for the IP address.
- `IP_TO_PROVINCE` looks up the province or state information for the IP address.
- `IP_TO_CITY` looks up the city information for the IP address.
- `IP_TO_SUBDIVISION_ADDR` looks up the subdivision address information for the IP address.
- `IP_TO_DETAIL` looks up all four levels of information above, separated by `.`.
- `IP_TO_LATLNG` looks up the latitude and longitude, separated by `,`.
- `IP_TO_PROVIDER` looks up the provider information for the IP address.
- `IP_TO_JSON` looks up all of the information above; the result is a JSON string.
- `IP_TO_OBJECT` looks up all of the information above; the result is a `LocationResponse` object.

#### GeoLocation Field Mapping
- `COUNTRY` maps the country information to the given event field.
- `PROVINCE` maps the province information to the given event field.
- `CITY` maps the city information to the given event field.
- `LONGITUDE` maps the longitude information to the given event field.
- `LATITUDE` maps the latitude information to the given event field.
- `ISP` maps the ISP information to the given event field.
- `ORGANIZATION` maps the organization information to the given event field.

Example:

```yaml
  - function: GEOIP_LOOKUP
    lookup_fields: [ client_ip ]
    output_fields: [ client_geolocation ]
    parameters:
      kb_name: tsg_ip_location
      option: IP_TO_DETAIL
```
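The `geolocation_field_mapping` parameter has no example above; a sketch of how it might be used with `IP_TO_OBJECT` follows. The event field names on the right-hand side are illustrative assumptions:
```yaml
  - function: GEOIP_LOOKUP
    lookup_fields: [ server_ip ]
    parameters:
      kb_name: tsg_ip_location
      option: IP_TO_OBJECT
      geolocation_field_mapping:    # knowledge base field -> event field
        COUNTRY: server_country
        CITY: server_city
        ISP: server_isp
```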
### JSON Extract
The JSON extract function extracts a value from a JSON string.

```JSON_EXTRACT(filter, lookup_fields, output_fields[, parameters])```
- filter: optional
- lookup_fields: required
- output_fields: required
- parameters: required
  - value_expression: `<String>` required. The JSON path expression.

Example:

```yaml
  - function: JSON_EXTRACT
    lookup_fields: [ device_tag ]
    output_fields: [ device_group ]
    parameters:
      value_expression: $.tags[?(@.tag=='device_group')][0].value
```

### Path Combine

The path combine function combines segments into a file path. A path value can be a configuration parameter referenced with the `props.` prefix or a constant string.

```PATH_COMBINE(filter, lookup_fields, output_fields[, parameters])```
- filter: optional
- lookup_fields: required
- output_fields: required
- parameters: required
  - path: `<Array>` required.

Example:

```yaml
  - function: PATH_COMBINE
    lookup_fields: [ packet_capture_file ]
    output_fields: [ packet_capture_file ]
    parameters:
      path: [ props.hos.path, props.hos.bucket.name.traffic_file, packet_capture_file ]
```

### Rename
The rename function renames a field.

```RENAME(filter, lookup_fields, output_fields)```
- filter: optional
- lookup_fields: required
- output_fields: required
- parameters: not required

Example:
```yaml
  - function: RENAME
    lookup_fields: [http_domain]
    output_fields: [server_domain]
```

### Snowflake ID

The Snowflake ID function generates a snowflake ID, a 64-bit integer composed of the following parts:
- 1 sign bit. The highest bit is always 0.
- 39 timestamp bits. The maximum timestamp that can be represented in 39 bits is 2^39-1, or 549755813887 milliseconds, which comes out to about 17 years, 1 month, 7 days, 20 hours, 31 minutes, and 35 seconds with respect to a custom epoch.
- 13 machine-id bits: 8 bits for the worker id and 5 bits for the datacenter id.
- 11 sequence bits. The maximum sequence number is 2^11-1, or 2047, so up to 2048 IDs can be generated within the same millisecond on the same machine.

```SNOWFLAKE_ID(filter, output_fields[, parameters])```
- filter: optional
- lookup_fields: not required
- output_fields: required
- parameters: optional
  - data_center_id_num: `<Integer>` optional. Default is `0`, range is `0-31`.

Example:
```yaml
  - function: SNOWFLAKE_ID
    output_fields: [log_id]
    parameters:
      data_center_id_num: 1
```

### String Joiner

The string joiner function joins multiple string fields using a delimiter, prefix, and suffix.

```STRING_JOINER(filter, lookup_fields, output_fields[, parameters])```
- filter: optional
- lookup_fields: required. More than one field may be specified.
- output_fields: required
- parameters: optional
  - delimiter: `<String>` optional. Default is `,`.
  - prefix: `<String>` optional. Default is the empty string.
  - suffix: `<String>` optional. Default is the empty string.

Example:
```yaml
  - function: STRING_JOINER
    lookup_fields: [http_host, ssl_sni, quic_sni]
    output_fields: [server_domains]
    parameters:
      delimiter: ','
      prefix: '['
      suffix: ']'
```

### Unix Timestamp Converter

The Unix timestamp converter function converts the precision of a Unix timestamp.

```UNIX_TIMESTAMP_CONVERTER(filter, lookup_fields, output_fields[, parameters])```
- filter: optional
- lookup_fields: required
- output_fields: required
- parameters: required
  - precision: `<String>` required. Enum: `milliseconds`, `seconds`.

Example:

_`__timestamp` is an internal field populated from the source ingestion time or the current Unix timestamp._

```yaml
  - function: UNIX_TIMESTAMP_CONVERTER
    lookup_fields: [__timestamp]
    output_fields: [recv_time]
    parameters:
      precision: seconds
```
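Finally, since each processor can assemble several UDFs into one pipeline, a combined sketch built from the examples above is shown here. The processor name is illustrative; the function parameters are copied from the per-function examples:
```yaml
processing_pipelines:
  enrichment_processor:    # illustrative processor name
    type: com.geedgenetworks.core.processor.projection.ProjectionProcessorImpl
    functions:
      - function: CURRENT_UNIX_TIMESTAMP
        output_fields: [recv_time]
        parameters:
          precision: seconds
      - function: GEOIP_LOOKUP
        lookup_fields: [client_ip]
        output_fields: [client_geolocation]
        parameters:
          kb_name: tsg_ip_location
          option: IP_TO_DETAIL
      - function: DOMAIN
        lookup_fields: [http_host, ssl_sni, quic_sni]
        output_fields: [server_domain]
        parameters:
          option: FIRST_SIGNIFICANT_SUBDOMAIN
```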
