# File

> File source connector

## Description

The File source connector generates data from a text file (a local file or an HDFS file). It is useful for testing.

## Source Options

Custom properties of the File source. A short sketch of the rate-control options follows the table.

|           Name            |  Type   | Required | Default |                                            Description                                            |
|---------------------------|---------|----------|---------|---------------------------------------------------------------------------------------------------|
| path                      | String  | Yes      | (none)  | File path; supports a local path or an HDFS path. Example: `./logs/logs.json`, `hdfs://ns1/test/logs.json`. |
| format                    | String  | Yes      | (none)  | Data format. Supported values are `json` and `csv`.                                               |
| [format].config           | Map     | No       | (none)  | Data format properties, keyed by the format name. Please refer to [Format Options](../formats) for details. |
| rows.per.second           | Integer | No       | 1000    | Rows emitted per second, to control the emit rate.                                                |
| number.of.rows            | Long    | No       | -1      | Total number of rows to emit. By default, the source is unbounded.                                |
| millis.per.row            | Long    | No       | 0       | Milliseconds to wait per row, to control the emit rate. If greater than 0, `rows.per.second` takes no effect. |
| read.local.file.in.client | Boolean | No       | true    | Whether to read a local file on the client side.                                                  |
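
For instance, the following sketch (using the same configuration shape as the example below) emits a bounded stream of 100 rows at one row every 500 ms; because `millis.per.row` is greater than 0, `rows.per.second` is ignored:

```yaml
sources:
  file_source:
    type: file
    properties:
      path: './logs.json'
      format: json
      # Emit exactly 100 rows, then finish (bounded source).
      number.of.rows: 100
      # Wait 500 ms between rows; overrides rows.per.second when > 0.
      millis.per.row: 500
```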

## Example

This example reads data from a file source and prints it to the console.

> Ensure that the file path is accessible to all nodes in your Flink cluster.

```yaml
sources:
  file_source:
    type: file
    properties:
      # path: 'hdfs://ns1/test/logs.json'
      path: './logs.json'
      rows.per.second: 2
      format: json

sinks:
  print_sink:
    type: print
    properties:
      format: json

application:
  env:
    name: example-file-to-print
    parallelism: 2
    pipeline:
      object-reuse: true
  topology:
    - name: file_source
      downstream: [ print_sink ]
    - name: print_sink
      downstream: [ ]
```
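
If you need format-specific properties, the `[format].config` row in the options table suggests they are nested under a `<format>.config` map. The sketch below uses the `csv` format; the `field-delimiter` key is a hypothetical placeholder, so check [Format Options](../formats) for the actual property names:

```yaml
sources:
  csv_file_source:
    type: file
    properties:
      path: './logs.csv'
      format: csv
      # Format properties live under "<format>.config" per the options table.
      # "field-delimiter" is a hypothetical placeholder; see Format Options
      # for the real keys supported by the csv format.
      csv.config:
        field-delimiter: ','
```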