| author | doufenghu <[email protected]> | 2024-10-18 16:41:28 +0800 |
|---|---|---|
| committer | doufenghu <[email protected]> | 2024-10-18 16:41:28 +0800 |
| commit | fd54e003f5e852ad6735e400d3feca024dc5e5f3 (patch) | |
| tree | 433e69c4f324939e323842745a241731f887d8eb /docs/develop-guide.md | |
| parent | 1c8baf9c355db3df000278a5e1d9860c5baf4635 (diff) | |
[Improve][docs] Add event model description.
Diffstat (limited to 'docs/develop-guide.md')
| -rw-r--r-- | docs/develop-guide.md | 22 |
1 files changed, 22 insertions, 0 deletions
diff --git a/docs/develop-guide.md b/docs/develop-guide.md
index 2742cee..75e8803 100644
--- a/docs/develop-guide.md
+++ b/docs/develop-guide.md
@@ -15,6 +15,28 @@
 | groot-docs | Docs module of groot-stream, which is responsible for providing documents. |
 | groot-release | Release module of groot-stream, which is responsible for providing release scripts. |
 
+## Event Model
+Groot Stream bases all stream processing on data records commonly known as events. An event is a collection of key-value pairs (fields), as follows:
+
+```json
+{
+  "__timestamp": "<Timestamp in UNIX epoch format (milliseconds)>",
+  "__input_id": "<ID/Name of the source that delivered the event>",
+  "__window_start_timestamp": "<Timestamp in UNIX epoch format (milliseconds)>",
+  "__window_end_timestamp": "<Timestamp in UNIX epoch format (milliseconds)>",
+  "key1": "<value1>",
+  "key2": "<value2>",
+  "keyN": "<valueN>"
+}
+```
+Groot Stream adds internal fields during pipeline processing. A few notes about internal fields:
+- Internal fields start with a double underscore `__`.
+- Each source can add one or more internal fields to each event. For example, the Kafka source adds both a `__timestamp` and an `__input_id` field.
+- Treat internal fields as read-only. Modifying them can have unintended consequences for your data flows.
+- Internal fields exist only for the duration of the event processing pipeline. They are not documented under sources or sinks.
+- If you do not configure a timestamp for extraction, the pipeline assigns the current time (in UNIX epoch format) to the `__timestamp` field.
+- If you have multiple sources, you can determine which source an event came from by looking at the `__input_id` field. For example, the Kafka source adds the topic name to the `__input_id` field.
+
 ## How to write a high quality Git commit message
 
 > [purpose] [module name] [sub-module name] Description (JIRA Issue ID)
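The internal-field conventions described in the added section can be illustrated with a short Python sketch. This is not Groot Stream code: events are modeled as plain dictionaries, the helper names `is_internal`, `user_fields`, and `source_of` are hypothetical, and the sample values (including the topic name `example-topic`) are made up for illustration.

```python
from typing import Any, Dict, Optional

# Per the event model, internal fields start with a double underscore.
INTERNAL_PREFIX = "__"


def is_internal(field_name: str) -> bool:
    """Return True if the field name follows the internal-field convention."""
    return field_name.startswith(INTERNAL_PREFIX)


def user_fields(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event without internal fields.

    The original event is left untouched, in keeping with the advice to
    treat internal fields as read-only.
    """
    return {k: v for k, v in event.items() if not is_internal(k)}


def source_of(event: Dict[str, Any]) -> Optional[str]:
    """Return the value of `__input_id`, or None if the field is absent."""
    return event.get("__input_id")


# Illustrative event (assumed values, not taken from a real pipeline).
event = {
    "__timestamp": 1729240888000,   # UNIX epoch, milliseconds
    "__input_id": "example-topic",  # e.g. a Kafka topic name
    "key1": "value1",
    "key2": "value2",
}
```

Calling `user_fields(event)` on the sample keeps only `key1` and `key2`, and `source_of(event)` returns the source identifier, which is how events from multiple sources could be told apart.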
