author doufenghu <[email protected]> 2024-06-15 23:50:43 +0800
committer doufenghu <[email protected]> 2024-06-15 23:50:43 +0800
commit 691f7172a5ce463ca565b744d6c68f173427a6ca
tree 4c585ae27f13c1a6cb82c80d3bd7b2398733bb49 /docs/connector
parent 80769f631cfdd66ae5b5f1824a00d12fa2e5e43a
[Improve][docs] Add mock source connector documents.
Diffstat (limited to 'docs/connector')
-rw-r--r-- docs/connector/connector.md | 192
-rw-r--r-- docs/connector/formats/protobuf.md | 1
-rw-r--r-- docs/connector/source/file.md | 1
-rw-r--r-- docs/connector/source/mock.md | 18
4 files changed, 202 insertions, 10 deletions
diff --git a/docs/connector/connector.md b/docs/connector/connector.md
index 7031bed..71e416c 100644
--- a/docs/connector/connector.md
+++ b/docs/connector/connector.md
@@ -51,6 +51,7 @@ schema:
### Local File
To retrieve the schema from a local file using its absolute path.
+> Ensure that the file path is accessible to all nodes in your Flink cluster.
```yaml
schema:
# by array
@@ -66,9 +67,198 @@ schema:
fields:
url: "https://localhost:8080/schema.json"
```
+## Mock Data Type
+The mock data type is used to define the template of the mock data.
+
+| Mock Type | Parameter | Result Type | Default | Description |
+|-----------------------------------------|-------------|-----------------------|---------------------|-------------------------------------------------------------------------------------------------------------------------|
+| **[Number](#Number)** | - | **int/bigint/double** | - | **Randomly generate a number.** |
+| - | start | number | 0 | The minimum value (inclusive). |
+| - | end | number | int32.max | The maximum value (exclusive). |
+| - | options | array of number | (none) | The optional values. If set, the random value will be selected from the options and `start` and `end` will be ignored. |
+| - | random | boolean | true | Default is random mode. If set to false, the value will be generated in order. |
+| **[Sequence](#Sequence)** | - | **bigint** | - | **Generate a sequence number based on a specific step value.** |
+| - | start | bigint | 0 | The first number in the sequence (inclusive). |
+| - | step | bigint | 1 | The number to add to each subsequent value. |
+| **[UniqueSequence](#UniqueSequence)** | - | **bigint** | - | **Generate a globally unique sequence number.** |
+| - | start | bigint | 0 | The first number in the sequence (inclusive). |
+| **[String](#String)** | - | **string** | - | **Randomly generate a string.** |
+| - | regex | string | [a-zA-Z]{0,5} | The regular expression. |
+| - | options | array of string | (none) | The optional values. If set, the random value will be selected from the options and `regex` will be ignored. |
+| - | random | boolean | true | Default is random mode. If set to false, the option values will be generated in order. |
+| **[Timestamp](#Timestamp)** | - | **bigint** | - | **Generate a unix timestamp in milliseconds or seconds.** |
+| - | unit | string | second | The unit of the timestamp. Valid values are `second` and `millis`. |
+| **[FormatTimestamp](#FormatTimestamp)** | - | **string** | - | **Generate a formatted timestamp.** |
+| - | format | string | yyyy-MM-dd HH:mm:ss | The format to output. |
+| - | utc | boolean | false | Default is local time. If set to true, the time will be converted to UTC time. |
+| **[IPv4](#IPv4)** | - | **string** | - | **Randomly generate an IPv4 address.** |
+| - | start | string | 0.0.0.0 | The minimum value of the IPv4 address (inclusive). |
+| - | end | string | 255.255.255.255 | The maximum value of the IPv4 address (inclusive). |
+| **[Expression](#Expression)** | - | **string** | - | **Use [Datafaker](https://www.datafaker.net/documentation/expressions/) expressions to generate fake data.** |
+| - | expression | string | (none) | The Datafaker expression, written as `#{expression}`. |
+| **[Object](#Object)** | - | **struct/object** | - | **Generate an object data structure. It is used to define the nested structure of the mock data.** |
+| - | fields | array of object | (none) | The fields of the object. |
+| **[Union](#Union)** | - | - | - | **Generate a union data structure with multiple mock data type fields.** |
+| - | unionFields | array of object | (none) | The candidate objects of the union. |
+| - | - fields | - array of object | (none) | The fields of the generated object. |
+| - | - weight | - int | 0 | The weight of the generated object. |
+| - | random | boolean | true | Default is random mode. If set to false, the union objects will be generated in order. |
+
+### Common Parameters
+
+Mock data types support some common parameters.
+
+| Parameter | Type | Default | Description |
+|---------------------|---------|---------|----------------------------------------------------------------------------------------|
+| [nullRate](#String) | double | 1 | Null value rate. The value range is [0, 1]. If set to 0.1, the null value rate is 10%. |
+| [array](#String) | boolean | false | Array flag. If set to true, the value will be generated as an array. |
+| arrayLenMin | int | 0 | The minimum length of the array (inclusive). Requires `array` to be set to true. |
+| arrayLenMax | int | 5 | The maximum length of the array (inclusive). Requires `array` to be set to true. |
+
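The existing examples apply the common parameters only to the `String` type. As a minimal sketch, assuming the same parameters behave identically on other mock types as the table suggests, the descriptor below applies `array` to a `Number` field. The field name `array_int` is hypothetical.

```json
{"name":"array_int","type":"Number","start":0,"end":100,"array":true,"arrayLenMin":1,"arrayLenMax":4}
```

Under that assumption, each generated value would be an array of 1 to 4 integers, each in the range [0, 100).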
+
+### Number
+- Randomly generate an integer between 0 (inclusive) and 10000 (exclusive).
+```json
+{"name":"int_random","type":"Number","start":0,"end":10000}
+```
+- Generate an integer between 0 and 10000, with values generated in order.
+```json
+{"name":"int_inc","type":"Number","start":0,"end":10000,"random":false}
+```
+- Randomly pick an integer from 20, 22, 25, 30.
+```json
+{"name":"int_options","type":"Number","options":[20,22,25,30]}
+```
+- Randomly generate a double between 0 and 10000.
+```json
+{"name":"double_random","type":"Number","start":0.0,"end":10000.0}
+```
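The Union example in this document combines `options` with `"random": false` on a `Number` field. A standalone sketch of that combination follows; it assumes the values are then emitted in the listed order, and the field name `int_options_ordered` is hypothetical.

```json
{"name":"int_options_ordered","type":"Number","options":[20,22,25,30],"random":false}
```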
+### Sequence
+
+- Generate a sequence number starting from 0 and incrementing by 2.
+```json
+{"name":"bigint_sequence","type":"Sequence","start":0,"step":2}
+```
+### UniqueSequence
+
+- Generate a globally unique sequence number starting from 0.
+```json
+{"name":"id","type":"UniqueSequence","start":0}
+```
+
+### String
+
+- Randomly generate a string with a length between 5 and 10, with a null value rate of 10%.
+```json
+{"name":"str_regex","type":"String","regex":"[a-z]{5,10}","nullRate":0.1}
+```
+- Randomly generate a string from "a", "b", "c", "d".
+```json
+{"name":"str_options","type":"String","options":["a","b","c","d"]}
+```
+- Randomly generate an array of strings. The length of the array is between 1 and 3.
+```json
+{"name":"array_str","type":"String","regex":"[a-z]{5,10}","array":true,"arrayLenMin":1,"arrayLenMax":3}
+```
+
+### Timestamp
+
+- Generate the current Unix timestamp in milliseconds.
+```json
+{"name":"timestamp_ms","type":"Timestamp","unit":"millis"}
+```
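The table above lists `second` as the other valid `unit` value and as the default. A minimal sketch of a seconds-based timestamp field, with the hypothetical field name `timestamp_s`:

```json
{"name":"timestamp_s","type":"Timestamp","unit":"second"}
```

Since `second` is the documented default, omitting `unit` should presumably give the same result.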
+### FormatTimestamp
+
+- Generate a formatted timestamp string using format `yyyy-MM-dd HH:mm:ss`.
+```json
+{"name":"timestamp_str","type":"FormatTimestamp","format":"yyyy-MM-dd HH:mm:ss"}
+```
+- Generate a formatted timestamp string using format `yyyy-MM-dd HH:mm:ss.SSS`.
+```json
+{"name":"timestamp_str","type":"FormatTimestamp","format":"yyyy-MM-dd HH:mm:ss.SSS"}
+```
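The `utc` parameter from the table is not covered by the examples above. A minimal sketch, assuming it is set alongside `format` in the same descriptor; the field name `timestamp_utc` is hypothetical.

```json
{"name":"timestamp_utc","type":"FormatTimestamp","format":"yyyy-MM-dd HH:mm:ss","utc":true}
```

According to the table, the default (`false`) formats the local time, so this variant is the one to use when the output should be in UTC.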
+
+### IPv4
+- Generate an IPv4 address between 192.168.20.1 and 192.168.20.255.
+```json
+{"name":"ip","type":"IpV4","start":"192.168.20.1","end":"192.168.20.255"}
+```
+
+### Expression
+
+- Generate a fake email address.
+```json
+{"name":"emailAddress","type":"Expression","expression":"#{internet.emailAddress}"}
+```
+- Generate a fake domain name.
+```json
+{"name":"domain","type":"Expression","expression":"#{internet.domainName}"}
+```
+- Generate a fake IPv6 address.
+```json
+{"name":"ipv6","type":"Expression","expression":"#{internet.ipV6Address}"}
+```
+- Generate a fake phone number.
+```json
+{"name":"phoneNumber","type":"Expression","expression":"#{phoneNumber.phoneNumber}"}
+```
+### Object
+
+- Generate an object data structure.
+```json
+{"name":"object","type":"Object","fields":[{"name":"str","type":"String","regex":"[a-z]{5,10}","nullRate":0.1},{"name":"cate","type":"String","options":["a","b","c"]}]}
+```
+Output:
+```json
+{"object": {"str":"abcde","cate":"a"}}
+```
+
+### Union
+- Generate a union of mock data type fields. Each generated record contains `object_id` and `item_id`. When `object_id` is 10, `item_id` is taken in order from 1, 2, 3, 4, 5. When `object_id` is 20, `item_id` is taken in order from 6, 7. With weights 5 and 2, the first object accounts for 5/7 of the records and the second for 2/7.
+```json
+{
+ "name": "unionFields",
+ "type": "Union",
+ "random": false,
+ "unionFields": [
+ {
+ "weight": 5,
+ "fields": [
+ {
+ "name": "object_id",
+ "type": "Number",
+ "options": [10]
+ },
+ {
+ "name": "item_id",
+ "type": "Number",
+ "options": [1, 2, 3, 4, 5],
+ "random": false
+ }
+ ]
+ },
+ {
+ "weight": 2,
+ "fields": [
+ {
+ "name": "object_id",
+ "type": "Number",
+ "options": [20]
+ },
+ {
+ "name": "item_id",
+ "type": "Number",
+ "options": [6, 7],
+ "random": false
+ }
+ ]
+ }
+ ]
+}
+```
# Sink Connector
-Sink Connector contains some common core features, and each sink connector supports them to varying degrees.
+The Sink Connector contains some common core features, and each sink connector supports these features to varying degrees.
## Common Sink Options
diff --git a/docs/connector/formats/protobuf.md b/docs/connector/formats/protobuf.md
index e55f6f1..18f86c8 100644
--- a/docs/connector/formats/protobuf.md
+++ b/docs/connector/formats/protobuf.md
@@ -9,6 +9,7 @@ It is very popular in Streaming Data Pipeline. Now support protobuf format in so
| Format Protobuf | Universal | [Download](http://192.168.40.153:8099/service/local/repositories/platform-release/content/com/geedgenetworks/format-protobuf/) |
## Format Options
+> Ensure that the file path is accessible to all nodes in your Flink cluster.
| Name | Type | Required | Default | Description |
|-------------------------------|----------|----------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
diff --git a/docs/connector/source/file.md b/docs/connector/source/file.md
index f92ab84..edb4aab 100644
--- a/docs/connector/source/file.md
+++ b/docs/connector/source/file.md
@@ -23,6 +23,7 @@ File source custom properties.
## Example
This example read data of file test source and print to console.
+> Ensure that the file path is accessible to all nodes in your Flink cluster.
```yaml
sources:
diff --git a/docs/connector/source/mock.md b/docs/connector/source/mock.md
index 42894c5..dfd10d9 100644
--- a/docs/connector/source/mock.md
+++ b/docs/connector/source/mock.md
@@ -4,22 +4,22 @@
## Description
-Mock source connector is used to generate data. It is useful for testing.
+The mock source connector randomly generates rows according to a user-defined schema. It helps you test the functionality of your system without relying on real data.
## Source Options
-File source custom properties.
+Mock source custom properties.
-| Name | Type | Required | Default | Description |
-|---------------------|---------|----------|---------|------------------------------------------------------------------------------------------------|
-| mock.desc.file.path | String | Yes | (none) | mock schema file path. |
-| rows.per.second | Integer | No | 1000 | Rows per second to control the emit rate. |
-| number.of.rows | Long | No | -1 | Total number of rows to emit. By default, the source is unbounded. |
-| millis.per.row | Long | No | 0 | Millis per row to control the emit rate. If greater than 0, rows.per.second is not effective. |
+| Name | Type | Required | Default | Description |
+|---------------------|---------|----------|---------|-----------------------------------------------------------------------------------------------------|
+| mock.desc.file.path | String | Yes | (none) | The path of the mock data structure file. |
+| rows.per.second | Integer | No | 1000 | The number of rows the connector generates per second. |
+| number.of.rows | Long | No | -1 | The total number of rows to generate. By default, the source is unbounded. |
+| millis.per.row | Long | No | 0 | The interval in milliseconds between rows. If greater than 0, `rows.per.second` is ignored. |
## Example
-This example mock source and print to console.
+This example randomly generates data for the schema defined in `mock_example.json` and outputs it to the console. For details on how to declare mock data types, see [Mock Data Type](../connector.md#mock-data-type).
```yaml
sources: