| field     | value                                                     |
|-----------|-----------------------------------------------------------|
| author    | doufenghu <[email protected]>, 2024-02-05 11:46:20 +0800  |
| committer | doufenghu <[email protected]>, 2024-02-05 11:46:20 +0800  |
| commit    | d4fe431ca1104b44bc620c7cb69a08a2c52842e1                  |
| tree      | f324c6d46ec63792819e55825cf6156bd23b1554                  |
| parent    | ef5d5bb503a7db64be92cdf6078e5edb27ba9023                  |
Update docs
| mode       | file                                    | changes |
|------------|-----------------------------------------|---------|
| -rw-r--r-- | README.md                               | 11      |
| -rw-r--r-- | docs/processor/projection-processor.md  | 16      |
| -rw-r--r-- | docs/processor/udf.md                   | 3       |
3 files changed, 15 insertions, 15 deletions
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -26,16 +26,17 @@ Groot Stream is designed to simplify the operation of ETL (Extract, Transform,
 To configure a job, you'll set up Sources, Filters, a Processing Pipeline, and Sinks, and assemble several built-in functions into the Processing Pipeline. The job is then deployed to a Flink cluster for execution.
 - **Source**: The data source of the job, which can be a Kafka topic, an IPFIX Collector, or a file.
 - **Filter**: Filters data based on specified conditions.
-- **Pipelines**: The fundamental unit of data stream processing is the processor, categorized by functionality into stateless and stateful processors. Each processor can assemble `UDFs` (user-defined functions) or `UDAFs` (user-defined aggregation functions) into a pipeline. The details of each processor are listed in [Processor](docs/processor).
-  - **Pre-processing Pipeline**: Optional. Processes data before it enters the processing pipeline.
-  - **Processing Pipeline**: Core data transformation pipeline.
-  - **Post-processing Pipeline**: Optional. Processes data after it exits the processing pipeline.
+- **Types of Pipelines**: The fundamental unit of data stream processing is the processor, categorized by functionality into stateless and stateful processors. Each processor can assemble `UDFs` (user-defined functions) or `UDAFs` (user-defined aggregation functions) into a pipeline. There are three types of pipelines, used at different stages of data processing:
+  - **Pre-processing Pipeline**: Optional. Attached to a source to normalize events before they enter the processing pipeline.
+  - **Processing Pipeline**: The core event processing pipeline.
+  - **Post-processing Pipeline**: Optional. Attached to a sink to normalize events before they are written to the sink.
 - **Sink**: The data sink of the job, which can be a Kafka topic, a ClickHouse table, or a file.
 
-## Supported Connectors & Functions
+## Supported Connectors, Processors & Functions
 
 - [Source Connectors](docs/connector/source)
 - [Sink Connectors](docs/connector/sink)
+- [Processor](docs/processor)
 - [Functions](docs/processor/udf.md)
 
 ## Minimum Requirements

diff --git a/docs/processor/projection-processor.md b/docs/processor/projection-processor.md
index 0d4f4c9..65c7545 100644
--- a/docs/processor/projection-processor.md
+++ b/docs/processor/projection-processor.md
@@ -1,17 +1,17 @@
 # Projection Processor
 > Processing pipelines for the projection processor
 ## Description
-The projection processor is used to project data from source to sink. It can be used to filter fields, rename fields, and add fields.
-The projection processor is part of the processing pipeline. It can be used in the pre-processing pipeline, processing pipeline, and post-processing pipeline.
-Each processor can assemble UDFs (user-defined functions) into a pipeline. More UDF details can be found in [UDF](udf.md).
+The projection processor is used to project data from source to sink. It can be used to filter, remove, and transform fields.
+It is part of the processing pipeline and can be used in the pre-processing, processing, and post-processing pipelines. Each processor can assemble UDFs (user-defined functions) into a pipeline.
+Within the pipeline, events are processed by each function in order, top to bottom. UDF usage details can be found in [UDF](udf.md).
 
 ## Options
-| name           | type    | required | default value                                                                                                   |
-|----------------|---------|----------|-----------------------------------------------------------------------------------------------------------------|
+| name           | type    | required | default value                                                                                                    |
+|----------------|---------|----------|------------------------------------------------------------------------------------------------------------------|
 | type           | String  | Yes      | The type of the processor; currently only `com.geedgenetworks.core.processor.projection.ProjectionProcessor` is supported |
-| output_fields  | Array   | No       | Array of String. The list of fields to keep. Fields not in the list will be removed                              |
-| remove_fields  | Array   | No       | Array of String. The list of fields to remove.                                                                   |
-| functions      | Array   | No       | Array of Object. The list of functions to apply to the data.                                                     |
+| output_fields  | Array   | No       | Array of String. The list of fields to keep. Fields not in the list will be removed.                             |
+| remove_fields  | Array   | No       | Array of String. The list of fields to remove.                                                                   |
+| functions      | Array   | No       | Array of Object. The list of functions to apply to the data.                                                     |
 
 ## Usage Example
 This example uses the projection processor to remove the fields `http_request_line`, `http_response_line`, and `http_response_content_type`, and uses the DROP function to filter out all events whose `server_ip` is `4.4.4.4` (see the configuration sketches after this diff).
diff --git a/docs/processor/udf.md b/docs/processor/udf.md
index 7d77b07..7b69254 100644
--- a/docs/processor/udf.md
+++ b/docs/processor/udf.md
@@ -186,7 +186,7 @@ Example:
 ```
 
 ### Generate String Array
-Generate string array function is used to merge string fields to an array.
+The generate string array function is used to merge multiple fields into a string array. Each merged field may be a string or a string array.
 
 ```GENERATE_STRING_ARRAY(filter, lookup_fields, output_fields)```
 
 - filter: optional
@@ -382,6 +382,5 @@ _`__timestamp` Internal field, from source ingestion time or current unix timest
 
-
 |
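
The README hunk above describes a job as Sources, Filters, Pipelines, and Sinks deployed to a Flink cluster, but no job file appears in this diff. Below is a minimal sketch of how such a job definition might look, assuming a YAML job file; the overall layout and every key name are illustrative assumptions, not taken from the repository:

```yaml
# Hypothetical Groot Stream job layout. Only the Source / Filter / Pipeline /
# Sink roles are documented in this diff; the YAML syntax and all key names
# below are assumptions for illustration.
sources:
  - name: http_events
    type: kafka                 # could also be an IPFIX Collector or a file
    topic: http-raw
filters:
  - condition: "event_type == 'http'"   # keep only matching events
pipelines:
  pre_processing: []            # optional; attached to a source
  processing:                   # core event processing
    - type: com.geedgenetworks.core.processor.projection.ProjectionProcessor
  post_processing: []           # optional; attached to a sink
sinks:
  - name: http_store
    type: clickhouse            # could also be a Kafka topic or a file
    table: http_events
```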
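The projection-processor Usage Example is summarized in prose only. Here is a hedged reconstruction of the processor block it describes, under the same YAML assumption; `type`, `remove_fields`, and `functions` come from the options table, while the DROP function's parameter names are guesses:

```yaml
# Sketch of the projection processor from the Usage Example: drops three HTTP
# fields and discards events whose server_ip is 4.4.4.4. Field names come
# from the example text; the DROP parameter syntax is an assumption.
type: com.geedgenetworks.core.processor.projection.ProjectionProcessor
remove_fields:
  - http_request_line
  - http_response_line
  - http_response_content_type
functions:
  - function: DROP
    filter: "server_ip == '4.4.4.4'"    # matching events are discarded
```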
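The udf.md hunk documents the signature `GENERATE_STRING_ARRAY(filter, lookup_fields, output_fields)`. A sketch of how the function might be assembled into a processor's function list, with invented field names and the same assumed YAML syntax:

```yaml
# GENERATE_STRING_ARRAY merges several fields, each a string or a string
# array, into one string array. The three parameters match the documented
# signature; the concrete field names are illustrative only.
functions:
  - function: GENERATE_STRING_ARRAY
    filter: ""                          # optional; empty applies to all events
    lookup_fields: [src_ip, dst_ip]     # inputs: each a string or string array
    output_fields: [ip_list]            # output: merged string array field
```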
