author    doufenghu <[email protected]>  2024-02-05 11:46:20 +0800
committer doufenghu <[email protected]>  2024-02-05 11:46:20 +0800
commit    d4fe431ca1104b44bc620c7cb69a08a2c52842e1 (patch)
tree      f324c6d46ec63792819e55825cf6156bd23b1554
parent    ef5d5bb503a7db64be92cdf6078e5edb27ba9023 (diff)
Update docs
-rw-r--r--  README.md                                 | 11
-rw-r--r--  docs/processor/projection-processor.md    | 16
-rw-r--r--  docs/processor/udf.md                     |  3
3 files changed, 15 insertions, 15 deletions
diff --git a/README.md b/README.md
index b76e5af..92c364b 100644
--- a/README.md
+++ b/README.md
@@ -26,16 +26,17 @@ Groot Stream is designed to simplify the operation of ETL (Extract, Transform, L
To configure a job, you'll set up Sources, Filters, a Processing Pipeline, and Sinks, and assemble several built-in functions into the Processing Pipeline. The job will then be deployed to a Flink cluster for execution; a sketch of a full job definition follows the component list below.
- **Source**: The data source of the job, which can be a Kafka topic, an IPFIX Collector, or a file.
- **Filter**: Filters data based on specified conditions.
-- **Pipelines**: The fundamental unit of data stream processing is the processor, categorized by functionality into stateless and stateful processors. Each processor can be assemble `UDFs`(User-defined functions) or `UDAFs`(User-defined aggregation functions) into a pipeline. The detail of processor is listed in [Processor](docs/processor).
- - **Pre-processing Pipeline**: Optional. Processes data before it enters the processing pipeline.
- - **Processing Pipeline**: Core data transformation pipeline.
- - **Post-processing Pipeline**: Optional. Processes data after it exits the processing pipeline.
+- **Types of Pipelines**: The fundamental unit of data stream processing is the processor, categorized by functionality into stateless and stateful processors. Each processor can assemble `UDFs` (User-defined functions) or `UDAFs` (User-defined aggregation functions) into a pipeline. There are three types of pipelines, one for each stage of data processing:
+ - **Pre-processing Pipeline**: Optional. Attached to a source to normalize events before they enter the processing pipeline.
+ - **Processing Pipeline**: The core event processing pipeline.
+ - **Post-processing Pipeline**: Optional. Attached to a sink to normalize events before they are written to the sink.
- **Sink**: The data sink of the job, which can be a Kafka topic, a ClickHouse table, or a file.
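+
+For illustration, a job definition assembling these pieces might look like the following minimal sketch. The configuration layout, keys, and connector names below are assumptions, not the documented schema; only the processor `type` and its options come from the processor docs.
+
+```yaml
+# Hypothetical job definition (layout and keys are assumed)
+sources:
+  - type: kafka                    # e.g. read events from a Kafka topic
+    topic: events_in
+filters:
+  - expression: "server_ip != '4.4.4.4'"   # assumed filter syntax
+pipelines:
+  processing:                      # the core Processing Pipeline
+    - type: com.geedgenetworks.core.processor.projection.ProjectionProcessor
+      remove_fields: [ http_request_line ]
+sinks:
+  - type: kafka                    # e.g. write results to a Kafka topic
+    topic: events_out
+```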
-## Supported Connectors & Functions
+## Supported Connectors, Processors & Functions
- [Source Connectors](docs/connector/source)
- [Sink Connectors](docs/connector/sink)
+- [Processors](docs/processor)
- [Functions](docs/processor/udf.md)
## Minimum Requirements
diff --git a/docs/processor/projection-processor.md b/docs/processor/projection-processor.md
index 0d4f4c9..65c7545 100644
--- a/docs/processor/projection-processor.md
+++ b/docs/processor/projection-processor.md
@@ -1,17 +1,17 @@
# Projection Processor
> Processing pipelines for the projection processor
## Description
-Projection processor is used to project the data from source to sink. It can be used to filter the fields, rename the fields, and add the fields.
-The projection processor is a part of the processing pipeline. It can be used in the pre-processing pipeline, processing pipeline, and post-processing pipeline.
-Each processor can assemble UDFs(User-defined functions) into a pipeline. More UDF detail can be found in [UDF](udf.md).
+The projection processor is used to project data from source to sink. It can be used to filter, remove, and transform fields.
+It is part of the processing pipeline and can be used in the pre-processing, processing, and post-processing pipelines. Each processor can assemble UDFs (User-defined functions) into a pipeline.
+Within the pipeline, events are processed by each function in order, from top to bottom. UDF usage details can be found in [UDF](udf.md).
## Options
-| name | type | required | default value |
-|----------------|---------|----------|---------------------------------------------------------------------------------------------------------------|
+| name           | type    | required | description                                                                                                      |
+|----------------|---------|----------|------------------------------------------------------------------------------------------------------------------|
| type           | String  | Yes      | The type of the processor; currently only supports `com.geedgenetworks.core.processor.projection.ProjectionProcessor` |
-| output_fields | Array | No | Array of String. The list of fields that need to be kept. Fields not in the list will be removed |
-| remove_fields | Array | No | Array of String. The list of fields that need to be removed. |
-| functions | Array | No | Array of Object. The list of functions that need to be applied to the data. |
+| output_fields | Array | No | Array of String. The list of fields that need to be kept. Fields not in the list will be removed. |
+| remove_fields | Array | No | Array of String. The list of fields that need to be removed. |
+| functions | Array | No | Array of Object. The list of functions that need to be applied to the data. |
## Usage Example
This example uses the projection processor to remove the fields `http_request_line`, `http_response_line`, and `http_response_content_type`, and uses the DROP function to filter out all events where `server_ip` is `4.4.4.4`.
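+
+A sketch of what this configuration might look like is shown below. The option names (`type`, `remove_fields`, `functions`) come from the Options table above, but the surrounding layout and the DROP function's exact declaration syntax are assumptions.
+
+```yaml
+# Hypothetical projection processor configuration
+type: com.geedgenetworks.core.processor.projection.ProjectionProcessor
+remove_fields:
+  - http_request_line
+  - http_response_line
+  - http_response_content_type
+functions:
+  - function: DROP                    # assumed declaration style
+    filter: "server_ip = '4.4.4.4'"   # drop events matching this condition
+```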
diff --git a/docs/processor/udf.md b/docs/processor/udf.md
index 7d77b07..7b69254 100644
--- a/docs/processor/udf.md
+++ b/docs/processor/udf.md
@@ -186,7 +186,7 @@ Example:
```
### Generate String Array
-Generate string array function is used to merge string fields to an array.
+The generate string array function is used to merge multiple fields into a string array. Each merged field may be a string or a string array.
```GENERATE_STRING_ARRAY(filter, lookup_fields, output_fields)```
- filter: optional
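+
+As a minimal sketch (the field names are hypothetical and the exact declaration syntax may differ), merging a string field and a string-array field into one array might look like:
+
+```yaml
+# Hypothetical usage of GENERATE_STRING_ARRAY
+- function: GENERATE_STRING_ARRAY
+  lookup_fields: [ src_ip, dst_ips ]   # src_ip: string, dst_ips: string array
+  output_fields: [ all_ips ]           # merged result: one string array
+# Given { "src_ip": "1.1.1.1", "dst_ips": ["2.2.2.2"] },
+# the output event would contain { "all_ips": ["1.1.1.1", "2.2.2.2"] }.
+```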
@@ -382,6 +382,5 @@ _`__timestamp` Internal field, from source ingestion time or current unix timest
-