author    doufenghu <[email protected]>  2024-02-05 11:46:20 +0800
committer doufenghu <[email protected]>  2024-02-05 11:46:20 +0800
commit    d4fe431ca1104b44bc620c7cb69a08a2c52842e1 (patch)
tree      f324c6d46ec63792819e55825cf6156bd23b1554
parent    ef5d5bb503a7db64be92cdf6078e5edb27ba9023 (diff)
Update docs
-rw-r--r--  README.md                                 | 11
-rw-r--r--  docs/processor/projection-processor.md    | 16
-rw-r--r--  docs/processor/udf.md                     |  3
3 files changed, 15 insertions, 15 deletions
diff --git a/README.md b/README.md
index b76e5af..92c364b 100644
--- a/README.md
+++ b/README.md
@@ -26,16 +26,17 @@ Groot Stream is designed to simplify the operation of ETL (Extract, Transform, L
To configure a job, you'll set up Sources, Filters, a Processing Pipeline, and Sinks, and assemble several built-in functions into the Processing Pipeline. The job will then be deployed to a Flink cluster for execution; a sketch of a full job definition follows the component list below.
- **Source**: The data source of the job, which can be a Kafka topic, an IPFIX Collector, or a file.
- **Filter**: Filters data based on specified conditions.
-- **Pipelines**: The fundamental unit of data stream processing is the processor, categorized by functionality into stateless and stateful processors. Each processor can be assemble `UDFs`(User-defined functions) or `UDAFs`(User-defined aggregation functions) into a pipeline. The detail of processor is listed in [Processor](docs/processor).
- - **Pre-processing Pipeline**: Optional. Processes data before it enters the processing pipeline.
- - **Processing Pipeline**: Core data transformation pipeline.
- - **Post-processing Pipeline**: Optional. Processes data after it exits the processing pipeline.
+- **Types of Pipelines**: The fundamental unit of data stream processing is the processor, categorized by functionality into stateless and stateful processors. Each processor can assemble `UDFs` (User-defined functions) or `UDAFs` (User-defined aggregation functions) into a pipeline. There are three types of pipelines, one for each stage of data processing:
+ - **Pre-processing Pipeline**: Optional. Attached to a source to normalize events before they enter the processing pipeline.
+ - **Processing Pipeline**: The core event processing pipeline.
+ - **Post-processing Pipeline**: Optional. Attached to a sink to normalize events before they are written to the sink.
- **Sink**: The data sink of the job, which can be a Kafka topic, a ClickHouse table, or a file.
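+
+For illustration, a job definition assembling these pieces might look like the following minimal sketch. The configuration layout, keys, and connector names below are assumptions, not the documented schema; only the processor `type` and its options come from the processor docs.
+
+```yaml
+# Hypothetical job definition (layout and keys are assumed)
+sources:
+  - type: kafka                    # e.g. read events from a Kafka topic
+    topic: events_in
+filters:
+  - expression: "server_ip != '4.4.4.4'"   # assumed filter syntax
+pipelines:
+  processing:                      # the core Processing Pipeline
+    - type: com.geedgenetworks.core.processor.projection.ProjectionProcessor
+      remove_fields: [ http_request_line ]
+sinks:
+  - type: kafka                    # e.g. write results to a Kafka topic
+    topic: events_out
+```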
-## Supported Connectors & Functions
+## Supported Connectors, Processors & Functions
- [Source Connectors](docs/connector/source)
- [Sink Connectors](docs/connector/sink)
+- [Processors](docs/processor)
- [Functions](docs/processor/udf.md)
## Minimum Requirements
diff --git a/docs/processor/projection-processor.md b/docs/processor/projection-processor.md
index 0d4f4c9..65c7545 100644
--- a/docs/processor/projection-processor.md
+++ b/docs/processor/projection-processor.md
@@ -1,17 +1,17 @@
# Projection Processor
> Processing pipelines for the projection processor
## Description
-Projection processor is used to project the data from source to sink. It can be used to filter the fields, rename the fields, and add the fields.
-The projection processor is a part of the processing pipeline. It can be used in the pre-processing pipeline, processing pipeline, and post-processing pipeline.
-Each processor can assemble UDFs(User-defined functions) into a pipeline. More UDF detail can be found in [UDF](udf.md).
+The projection processor is used to project data from source to sink. It can be used to filter, remove, and transform fields.
+It is part of the processing pipeline and can be used in the pre-processing, processing, and post-processing pipelines. Each processor can assemble UDFs (User-defined functions) into a pipeline.
+Within the pipeline, events are processed by each function in order, from top to bottom. UDF usage details can be found in [UDF](udf.md).
## Options
-| name | type | required | default value |
-|----------------|---------|----------|---------------------------------------------------------------------------------------------------------------|
+| name           | type    | required | description                                                                                                      |
+|----------------|---------|----------|------------------------------------------------------------------------------------------------------------------|
| type           | String  | Yes      | The type of the processor; currently only supports `com.geedgenetworks.core.processor.projection.ProjectionProcessor` |
-| output_fields | Array | No | Array of String. The list of fields that need to be kept. Fields not in the list will be removed |
-| remove_fields | Array | No | Array of String. The list of fields that need to be removed. |
-| functions | Array | No | Array of Object. The list of functions that need to be applied to the data. |
+| output_fields | Array | No | Array of String. The list of fields that need to be kept. Fields not in the list will be removed. |
+| remove_fields | Array | No | Array of String. The list of fields that need to be removed. |
+| functions | Array | No | Array of Object. The list of functions that need to be applied to the data. |
## Usage Example
This example uses the projection processor to remove the fields `http_request_line`, `http_response_line`, and `http_response_content_type`, and uses the DROP function to filter out all events where `server_ip` is `4.4.4.4`.
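+
+A sketch of what this configuration might look like is shown below. The option names (`type`, `remove_fields`, `functions`) come from the Options table above, but the surrounding layout and the DROP function's exact declaration syntax are assumptions.
+
+```yaml
+# Hypothetical projection processor configuration
+type: com.geedgenetworks.core.processor.projection.ProjectionProcessor
+remove_fields:
+  - http_request_line
+  - http_response_line
+  - http_response_content_type
+functions:
+  - function: DROP                    # assumed declaration style
+    filter: "server_ip = '4.4.4.4'"   # drop events matching this condition
+```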
diff --git a/docs/processor/udf.md b/docs/processor/udf.md
index 7d77b07..7b69254 100644
--- a/docs/processor/udf.md
+++ b/docs/processor/udf.md
@@ -186,7 +186,7 @@ Example:
```
### Generate String Array
-Generate string array function is used to merge string fields to an array.
+The generate string array function is used to merge multiple fields into a string array. Each merged field may be a string or a string array.
```GENERATE_STRING_ARRAY(filter, lookup_fields, output_fields)```
- filter: optional
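+
+As a minimal sketch (the field names are hypothetical and the exact declaration syntax may differ), merging a string field and a string-array field into one array might look like:
+
+```yaml
+# Hypothetical usage of GENERATE_STRING_ARRAY
+- function: GENERATE_STRING_ARRAY
+  lookup_fields: [ src_ip, dst_ips ]   # src_ip: string, dst_ips: string array
+  output_fields: [ all_ips ]           # merged result: one string array
+# Given { "src_ip": "1.1.1.1", "dst_ips": ["2.2.2.2"] },
+# the output event would contain { "all_ips": ["1.1.1.1", "2.2.2.2"] }.
+```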
@@ -382,6 +382,5 @@ _`__timestamp` Internal field, from source ingestion time or current unix timest
-