| author | doufenghu <[email protected]> | 2024-06-15 23:50:43 +0800 |
|---|---|---|
| committer | doufenghu <[email protected]> | 2024-06-15 23:50:43 +0800 |
| commit | 691f7172a5ce463ca565b744d6c68f173427a6ca (patch) | |
| tree | 4c585ae27f13c1a6cb82c80d3bd7b2398733bb49 /docs/connector | |
| parent | 80769f631cfdd66ae5b5f1824a00d12fa2e5e43a (diff) | |
[Improve][docs] Add mock source connector documents.
Diffstat (limited to 'docs/connector')
| -rw-r--r-- | docs/connector/connector.md | 192 |
| -rw-r--r-- | docs/connector/formats/protobuf.md | 1 |
| -rw-r--r-- | docs/connector/source/file.md | 1 |
| -rw-r--r-- | docs/connector/source/mock.md | 18 |
4 files changed, 202 insertions, 10 deletions
diff --git a/docs/connector/connector.md b/docs/connector/connector.md
index 7031bed..71e416c 100644
--- a/docs/connector/connector.md
+++ b/docs/connector/connector.md
@@ -51,6 +51,7 @@ schema:
 ### Local File
 To retrieve the schema from a local file using its absolute path.
+> Ensures that the file path is accessible to all nodes in your Flink cluster.
 ```yaml
 schema:
   # by array
@@ -66,9 +67,198 @@ schema:
   fields:
     url: "https://localhost:8080/schema.json"
 ```
+## Mock Data Type
+The mock data type is used to define the template of the mock data.
+
+| Mock Type | Parameter | Result Type | Default | Description |
+|-----------------------------------------|-------------|-----------------------|---------------------|-------------------------------------------------------------------------------------------------------------------------|
+| **[Number](#Number)** | - | **int/bigint/double** | - | **Randomly generate a number.** |
+| - | start | number | 0 | The minimum value (include). |
+| - | end | number | int32.max | The maximum value (exclusive). |
+| - | options | array of number | (none) | The optional values. If set, the random value will be selected from the options and `start` and `end` will be ignored. |
+| - | random | boolean | true | Default is random mode. If set to false, the value will be generated in order. |
+| **[Sequence](#Sequence)** | - | **bigint** | - | **Generate a sequence number based on a specific step value .** |
+| - | start | bigint | 0 | The first number in the sequence (include). |
+| - | step | bigint | 1 | The number to add to each subsequent value. |
+| **[UniqueSequence](#UniqueSequence)** | - | **bigint** | - | **Generate a global unique sequence number.** |
+| - | start | bigint | 0 | The first number in the sequence (include). |
+| **[String](#String)** | - | string | - | **Randomly generate a string.** |
+| - | regex | string | [a-zA-Z]{0,5} | The regular expression. |
+| - | options | array of string | (none) | The optional values. If set, the random value will be selected from the options and `regex` will be ignored. |
+| - | random | boolean | true | Default is random mode. If set to false, the options value will be generated in order. |
+| **[Timestamp](#Timestamp)** | - | **bigint** | - | **Generate a unix timestamp in milliseconds or seconds.** |
+| - | unit | string | second | The unit of the timestamp. The optional values are `second`, `millis`. |
+| **[FormatTimestamp](#FormatTimestamp)** | - | **string** | - | **Generate a formatted timestamp.** |
+| - | format | string | yyyy-MM-dd HH:mm:ss | The format to output. |
+| - | utc | boolean | false | Default is local time. If set to true, the time will be converted to UTC time. |
+| **[IPv4](#IPv4)** | - | **string** | - | **Randomly generate a IPv4 address.** |
+| - | start | string | 0.0.0.0 | The minimum value of the IPv4 address(include). |
+| - | end | string | 255.255.255.255 | The maximum value of the IPv4 address(include). |
+| **[Expression](#Expression)** | - | string | - | **Use library [Datafaker](https://www.datafaker.net/documentation/expressions/) expressions to generate fake data.** |
+| - | expression | string | (none) | The datafaker expression used #{expression}. |
+| **[Object](#Object)** | - | **struct/object** | - | **Generate a object data structure. It used to define the nested structure of the mock data.** |
+| - | fields | array of object | (none) | The fields of the object. |
+| **[Union](#Union)** | - | - | - | **Generate a union data structure with multiple mock data type fields.** |
+| - | unionFields | array of object | (none) | The fields of the object. |
+| - | - fields | - array of object | (none) | |
+| - | - weight | - int | 0 | The weight of the generated object. |
+| | random | boolean | true | Default is random mode. If set to false, the options value will be generated in order. |
+
+### Common Parameters
+
+Mock data type supports some common parameters.
+
+| Parameter | Type | Default | Description |
+|---------------------|---------|---------|----------------------------------------------------------------------------------------|
+| [nullRate](#String) | double | 1 | Null value rate. The value range is [0, 1]. If set to 0.1, the null value rate is 10%. |
+| [array](#String) | boolean | false | Array flag. If set to true, the value will be generated as an array. |
+| arrayLenMin | int | 0 | The minimum length of the array(include). `array` flag must be set to true. |
+| arrayLenMax | int | 5 | The maximum length of the array(include). `array` flag must be set to true. |
+
+
+### Number
+- Randomly generate a integer number between 0 and 10000.
+```json
+{"name":"int_random","type":"Number","start":0,"end":10000}
+```
+- Generate a integer number between 0 and 10000, and the value will be generated in order.
+```json
+{"name":"int_inc","type":"Number","start":0,"end":10000,"random":false}
+```
+- Randomly generate a integer number from 20, 22, 25, 30.
+```json
+{"name":"int_options","type":"Number","options":[20,22,25,30]}
+```
+- randomly generate a double number between 0 and 10000.
+```json
+{"name":"double_random","type":"Number","start":0.0,"end":10000.0}
+```
+### Sequence
+
+- Generate a sequence number starting from 0 and incrementing by 2.
+```json
+{"name":"bigint_sequence","type":"Sequence","start":0,"step":2}
+```
+### UniqueSequence
+
+- Generate a global unique sequence number starting from 0.
+```json
+{"name":"id","type":"UniqueSequence","start":0}
+```
+
+### String
+
+- Randomly generate s string with a length between 0 and 5. And set null value rate is 10%.
+```json
+{"name":"str_regex","type":"String","regex":"[a-z]{5,10}","nullRate":0.1}
+```
+- Randomly generate a string from "a", "b", "c", "d".
+```json
+{"name":"str_options","type":"String","options":["a","b","c","d"]}
+```
+- Randomly generate a array of string. The length of the array is between 1 and 3.
+```json
+{"name":"array_str","type":"String","regex":"[a-z]{5,10}","array":true,"arrayLenMin":1,"arrayLenMax":3}
+```
+
+### Timestamp
+
+- Generate a current Unix timestamp in milliseconds.
+```json
+{"name":"timestamp_ms","type":"Timestamp","unit":"millis"}
+```
+### FormatTimestamp
+
+- Generate a formatted timestamp string using format `yyyy-MM-dd HH:mm:ss`.
+```json
+{"name":"timestamp_str","type":"FormatTimestamp","format":"yyyy-MM-dd HH:mm:ss"}
+```
+- Generate a formatted timestamp string using format `yyyy-MM-dd HH:mm:ss.SSS`.
+```json
+{"name":"timestamp_str","type":"FormatTimestamp","format":"yyyy-MM-dd HH:mm:ss.SSS"}
+```
+
+### IPv4
+- Generate a IPv4 address between 192.168.20.1 and 192.168.20.255.
+```json
+{"name":"ip","type":"IpV4","start":"192.168.20.1","end":"192.168.20.255"}
+```
+
+### Expression
+
+- Generate a fake email address.
+```json
+{"name":"emailAddress","type":"Expression","expression":"#{internet.emailAddress}"}
+```
+- Generate a fake domain name.
+```json
+{"name":"domain","type":"Expression","expression":"#{internet.domainName}"}
+```
+- Generate a fake IPv6 address.
+```json
+{"name":"ipv6","type":"Expression","expression":"#{internet.ipV6Address}"}
+```
+- Generate a fake phone number.
+```json
+{"name":"phoneNumber","type":"Expression","expression":"#{phoneNumber.phoneNumber}"}
+```
+### Object
+
+- Generate a object data structure.
+```json
+{"name":"object","type":"Object","fields":[{"name":"str","type":"String","regex":"[a-z]{5,10}","nullRate":0.1},{"name":"cate","type":"String","options":["a","b","c"]}]}
+```
+output:
+```json
+{"object": {"str":"abcde","cate":"a"}}
+```
+
+### Union
+- Generate a union mock data type fields. Generate object_id and item_id fields. When object_id is 10, item_id is randomly generated from 1, 2, 3, 4, 5. When object_id is 20, item_id is randomly generated from 6, 7. The first object generates 5/7 of the total, and the second object generates 2/7 of the total.
+```json
+{
+  "name": "unionFields",
+  "type": "Union",
+  "random": false,
+  "unionFields": [
+    {
+      "weight": 5,
+      "fields": [
+        {
+          "name": "object_id",
+          "type": "Number",
+          "options": [10]
+        },
+        {
+          "name": "item_id",
+          "type": "Number",
+          "options": [1, 2, 3, 4, 5],
+          "random": false
+        }
+      ]
+    },
+    {
+      "weight": 2,
+      "fields": [
+        {
+          "name": "object_id",
+          "type": "Number",
+          "options": [20]
+        },
+        {
+          "name": "item_id",
+          "type": "Number",
+          "options": [6, 7],
+          "random": false
+        }
+      ]
+    }
+  ]
+}
+```
 # Sink Connector
-Sink Connector contains some common core features, and each sink connector supports them to varying degrees.
+The Sink Connector contains some common core features, and each sink connector supports these features to varying degrees.
 ## Common Sink Options
diff --git a/docs/connector/formats/protobuf.md b/docs/connector/formats/protobuf.md
index e55f6f1..18f86c8 100644
--- a/docs/connector/formats/protobuf.md
+++ b/docs/connector/formats/protobuf.md
@@ -9,6 +9,7 @@ It is very popular in Streaming Data Pipeline. Now support protobuf format in so
 | Format Protobuf | Universal | [Download](http://192.168.40.153:8099/service/local/repositories/platform-release/content/com/geedgenetworks/format-protobuf/) |
 ## Format Options
+> Ensures that the file path is accessible to all nodes in your Flink cluster.
 | Name | Type | Required | Default | Description |
 |-------------------------------|----------|----------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
diff --git a/docs/connector/source/file.md b/docs/connector/source/file.md
index f92ab84..edb4aab 100644
--- a/docs/connector/source/file.md
+++ b/docs/connector/source/file.md
@@ -23,6 +23,7 @@ File source custom properties.
 ## Example
 This example read data of file test source and print to console.
+> Ensures that the file path is accessible to all nodes in your Flink cluster.
 ```yaml
 sources:
diff --git a/docs/connector/source/mock.md b/docs/connector/source/mock.md
index 42894c5..dfd10d9 100644
--- a/docs/connector/source/mock.md
+++ b/docs/connector/source/mock.md
@@ -4,22 +4,22 @@
 ## Description
-Mock source connector is used to generate data. It is useful for testing.
+Mock source connector used to randomly generate the number of rows according to the user-defined schema. This connector helps you test the functionality of your system without relying on real data.
 ## Source Options
-File source custom properties.
+Mock source custom properties.
-| Name | Type | Required | Default | Description |
-|---------------------|---------|----------|---------|------------------------------------------------------------------------------------------------|
-| mock.desc.file.path | String | Yes | (none) | mock schema file path. |
-| rows.per.second | Integer | No | 1000 | Rows per second to control the emit rate. |
-| number.of.rows | Long | No | -1 | Total number of rows to emit. By default, the source is unbounded. |
-| millis.per.row | Long | No | 0 | Millis per row to control the emit rate. If greater than 0, rows.per.second is not effective. |
+| Name | Type | Required | Default | Description |
+|---------------------|---------|----------|---------|-----------------------------------------------------------------------------------------------------|
+| mock.desc.file.path | String | Yes | (none) | The path of the mock data structure file. |
+| rows.per.second | Integer | No | 1000 | The number of rows per second that connector generated. |
+| number.of.rows | Long | No | -1 | The total number of rows data generated. By default, the source is unbounded. |
+| millis.per.row | Long | No | 0 | The interval(mills) between each row. If greater than 0, then `rows.per.second` will be ignored. |
 ## Example
-This example mock source and print to console.
+This example randomly generates data of a specified schema `mock_example.json` and output to console. More details how to declare mock data type, click [here](../connector.md#mock-data-type).
 ```yaml
 sources:
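
Reviewer note on the Union example added to `connector.md` in this commit: the text states that weights 5 and 2 produce 5/7 and 2/7 of the rows. The Python sketch below only illustrates that arithmetic with a weighted random pick. It is not the connector's implementation; `UNION_FIELDS`, `generate_rows` and `branch_shares` are hypothetical names, and the sketch ignores the `"random": false` setting used in the documented example.

```python
import random
from collections import Counter

# Hypothetical, simplified mirror of the two unionFields entries in the
# documented example (assumption: only weight, object_id and item_id matter).
UNION_FIELDS = [
    {"weight": 5, "object_id": 10, "item_ids": [1, 2, 3, 4, 5]},
    {"weight": 2, "object_id": 20, "item_ids": [6, 7]},
]


def generate_rows(n, seed=0):
    """Pick a union branch per row with probability weight / sum(weights)."""
    rng = random.Random(seed)
    weights = [branch["weight"] for branch in UNION_FIELDS]
    rows = []
    for _ in range(n):
        branch = rng.choices(UNION_FIELDS, weights=weights, k=1)[0]
        rows.append({
            "object_id": branch["object_id"],
            "item_id": rng.choice(branch["item_ids"]),
        })
    return rows


def branch_shares(rows):
    """Return the observed fraction of rows per object_id."""
    counts = Counter(row["object_id"] for row in rows)
    return {object_id: count / len(rows) for object_id, count in counts.items()}


if __name__ == "__main__":
    # With many rows the shares should be close to 5/7 and 2/7.
    print(branch_shares(generate_rows(70_000)))
```

How the connector orders rows when `random` is `false` is not stated in the diff, so the sketch only checks the long-run proportions.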
