| field | value | date |
|---|---|---|
| author | doufenghu <[email protected]> | 2024-09-10 20:05:06 +0800 |
| committer | doufenghu <[email protected]> | 2024-09-10 20:05:06 +0800 |
| commit | 4bb87b62cd7d3dd12bd19e643aaffda53e35e57a | |
| tree | 90eb508ca3cd69e9a48531237c9e713e704a8a1c /docs/processor | |
| parent | af6b8ab5e619be383b0597a2a8aaa47341d05f2f | |
[Feature][docs] Add split operator description.
Diffstat (limited to 'docs/processor')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | docs/processor/aggregate-processor.md | 23 |
| -rw-r--r-- | docs/processor/split-processor.md | 49 |
2 files changed, 61 insertions, 11 deletions
diff --git a/docs/processor/aggregate-processor.md b/docs/processor/aggregate-processor.md
index 5ab0ae0..afc26f6 100644
--- a/docs/processor/aggregate-processor.md
+++ b/docs/processor/aggregate-processor.md
@@ -10,17 +10,18 @@ Within the pipeline, events are processed by each Function in order, top‑>down
 ## Options
 Note:Default will output internal fields `__window_start_timestamp` and `__window_end_timestamp` if not set output_fields.
 
-| name | type | required | default value |
-|--------------------------|--------|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| type | String | Yes | The type of the processor, now only support `com.geedgenetworks.core.processor.aggregate.AggregateProcessor` |
-| output_fields | Array | No | Array of String. The list of fields that need to be kept. Fields not in the list will be removed. |
-| remove_fields | Array | No | Array of String. The list of fields that need to be removed. |
-| group_by_fields | Array | yes | Array of String. The list of fields that need to be grouped. |
-| window_type | String | yes | The type of window, now only support `tumbling_processing_time`, `tumbling_event_time`, `sliding_processing_time`, `sliding_event_time`. if window_type is `tumbling/sliding_event_time,` you need to set watermark. |
-| window_size | Long | yes | The duration of the window in seconds. |
-| window_slide | Long | yes | The duration of the window slide in seconds. |
-| window_timestamp_field | String | No | Set the output timestamp field name, with the unit in seconds. It is mapped to the internal field __window_start_timestamp. |
-| functions | Array | No | Array of Object. The list of functions that need to be applied to the data. |
+| name | type | required | default value |
+|------------------------|-----------|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| type | String | Yes | The type of the processor, now only support `com.geedgenetworks.core.processor.aggregate.AggregateProcessor` |
+| output_fields | Array | No | Array of String. The list of fields that need to be kept. Fields not in the list will be removed. |
+| remove_fields | Array | No | Array of String. The list of fields that need to be removed. |
+| group_by_fields | Array | yes | Array of String. The list of fields that need to be grouped. |
+| window_type | String | yes | The type of window, now only support `tumbling_processing_time`, `tumbling_event_time`, `sliding_processing_time`, `sliding_event_time`. if window_type is `tumbling/sliding_event_time,` you need to set watermark. |
+| window_size | Long | yes | The duration of the window in seconds. |
+| window_slide | Long | yes | The duration of the window slide in seconds. |
+| window_timestamp_field | String | No | Set the output timestamp field name, with the unit in seconds. It is mapped to the internal field __window_start_timestamp. |
+| mini_batch | Boolean | No | Specifies whether to enable local aggregate optimization. The default value is false. This can significantly reduce the state overhead and get a better throughput. |
+| functions | Array | No | Array of Object. The list of functions that need to be applied to the data. |
 
 ## Usage Example
 
diff --git a/docs/processor/split-processor.md b/docs/processor/split-processor.md
new file mode 100644
index 0000000..e1a1163
--- /dev/null
+++ b/docs/processor/split-processor.md
@@ -0,0 +1,49 @@
+# Split Processor
+
+> Split the output of a data processing pipeline into multiple streams based on certain conditions.
+
+## Description
+
+Using the flink side Outputs send data from a stream to multiple downstream consumers. This is useful when you want to separate or filter certain elements of a stream without disrupting the main processing flow. For example, side outputs can be used for error handling, conditional routing, or extracting specific subsets of the data.
+
+## Options
+
+| name | type | required | default value |
+|-------------------|--------|----------|--------------------------------------------------------------------------------------------|
+| type | String | Yes | The type of the processor, now only support ` com.geedgenetworks.core.split.SplitOperator` |
+| rules | Array | Yes | Array of Object. Defining rules for labeling Side Output Tag |
+| [rule.]tag | String | Yes | The tag name of the side output |
+| [rule.]expression | String | Yes | The expression to evaluate the event. |
+
+## Usage Example
+
+This example uses a split processor to split the data into two streams based on the value of the `decoded_as` field.
+
+```yaml
+splits:
+  decoded_as_split:
+    type: split
+    rules:
+      - tag: http_tag
+        expression: event.decoded_as == 'HTTP'
+      - tag: dns_tag
+        expression: event.decoded_as == 'DNS'
+
+
+topology:
+  - name: inline_source
+    downstream: [decoded_as_split]
+  - name: decoded_as_split
+    tags: [http_tag, dns_tag]
+    downstream: [ projection_processor, aggregate_processor]
+  - name: projection_processor
+    downstream: [ print_sink ]
+  - name: aggregate_processor
+    downstream: [ print_sink ]
+  - name: print_sink
+    downstream: []
+```
+
+
+
+
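
Reviewer's sketch (not part of the commit): the `rules` option added in `split-processor.md` pairs a `tag` with an `expression` that is evaluated against each event. The Python below is a minimal illustration of that routing idea only. It does not use the project's operator or Flink, the names `RULES` and `split_events` are hypothetical, and the expressions are modeled as plain Python predicates rather than the project's expression language. It also assumes that an event matching several rules is emitted under each matching tag and that an event matching none reaches no tagged output; the commit does not state either behavior.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Event = Dict[str, Any]
Rule = Tuple[str, Callable[[Event], bool]]  # (tag, predicate)

# Hypothetical stand-ins for the two rules in the usage example.
RULES: List[Rule] = [
    ("http_tag", lambda event: event.get("decoded_as") == "HTTP"),
    ("dns_tag", lambda event: event.get("decoded_as") == "DNS"),
]


def split_events(events: Iterable[Event], rules: List[Rule]) -> Dict[str, List[Event]]:
    """Group events by the tag of every rule whose predicate they satisfy."""
    outputs: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        for tag, predicate in rules:
            # Assumption: rules are independent, so one event may get several tags.
            if predicate(event):
                outputs[tag].append(event)
    return dict(outputs)


if __name__ == "__main__":
    sample = [
        {"id": 1, "decoded_as": "HTTP"},
        {"id": 2, "decoded_as": "DNS"},
        {"id": 3, "decoded_as": "SSH"},  # matches no rule
    ]
    for tag, tagged in split_events(sample, RULES).items():
        print(tag, [e["id"] for e in tagged])
```

In the topology from the usage example, the `http_tag` group would correspond to the stream consumed by `projection_processor` and the `dns_tag` group to the one consumed by `aggregate_processor`, assuming `tags` and `downstream` are matched by position.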
