docs/processor/split-processor.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49

# Split Processor

>  Split the output of a data processing pipeline into multiple streams based on certain conditions.

## Description

Using the flink side Outputs send data from a stream to multiple downstream consumers. This is useful when you want to separate or filter certain elements of a stream without disrupting the main processing flow. For example, side outputs can be used for error handling, conditional routing, or extracting specific subsets of the data.

## Options

| name              | type   | required | default value                                                                              |
|-------------------|--------|----------|--------------------------------------------------------------------------------------------|
| type              | String | Yes      | The type of the processor, now only support ` com.geedgenetworks.core.split.SplitOperator` |
| rules             | Array  | Yes      | Array of Object. Defining rules for labeling Side Output Tag                               |
| [rule.]tag        | String | Yes      | The tag name of the side output                                                            |
| [rule.]expression | String | Yes      | The expression to evaluate the event.                                                      |

## Usage Example

This example uses a split processor to split the data into two streams based on the value of the `decoded_as` field.

```yaml
splits:
  decoded_as_split:
    type: split
    rules:
      - tag: http_tag
        expression: event.decoded_as == 'HTTP'
      - tag: dns_tag
        expression: event.decoded_as == 'DNS'


topology:
  - name: inline_source
    downstream: [decoded_as_split]
  - name: decoded_as_split
    tags: [http_tag, dns_tag]
    downstream: [ projection_processor, aggregate_processor]
  - name: projection_processor
    downstream: [ print_sink ]
  - name: aggregate_processor
    downstream: [ print_sink ]
  - name: print_sink
    downstream: []
```