Flat File

A Flat File Source reads and parses structured files in multiple formats (CSV, positional, XML, JSON) from various acquisition channels (FTP, HTTP, S3, Azure Blob Storage).

Each row or logical unit becomes an independent message that is processed asynchronously, which keeps processing scalable as files grow.

Acquisition channels

Channel	Description
FTP / FTPS / SFTP	Download files from an FTP, FTPS, or SFTP server. Authentication via user/password or SSH key; paths, filename patterns, and retention rules are supported.
HTTP (TBD)	Download from HTTP/HTTPS with optional authentication (API key, Basic, Bearer).
Amazon S3 (TBD)	Connect to a bucket and filter by prefix or filename.
Azure Blob Storage (TBD)	Access a container; auth via connection string or service principal.
Azure Blob Storage RFE	Extended variant for multi/coordination processing (in development).

Reading protocols

Flowlyze supports two reading protocols:

Coordinated

Flowlyze uses a working directory with read/write permissions:

1. Copy the file from the source directory to a temporary processing directory
2. Read and parse the contents
3. Move the file to a completion directory on finish

Advantages:

Visibility on pending files
Error inspection
Full traceability

Directory cleanup and artifact handling are the user’s responsibility.
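
The coordinated steps above can be illustrated with a minimal Python sketch. It is not Flowlyze's implementation: the function name `process_coordinated`, the directory arguments, and the line-by-line "parsing" are assumptions made only for illustration.

```python
import shutil
from pathlib import Path


def process_coordinated(source_file: Path, processing_dir: Path, completed_dir: Path) -> list[str]:
    """Copy, parse, then archive one file; returns the parsed lines."""
    processing_dir.mkdir(parents=True, exist_ok=True)
    completed_dir.mkdir(parents=True, exist_ok=True)

    # 1. Copy the file from the source directory to the processing directory.
    working_copy = processing_dir / source_file.name
    shutil.copy2(source_file, working_copy)

    # 2. Read and parse the contents (here: one entry per non-empty line).
    lines = [line for line in working_copy.read_text(encoding="utf-8").splitlines() if line]

    # 3. Move the working copy to the completion directory once parsing succeeds.
    shutil.move(str(working_copy), str(completed_dir / working_copy.name))
    return lines
```

In this sketch, a failure during step 2 leaves the working copy in the processing directory, which is what makes pending files visible and errors inspectable. The original in the source directory is never deleted, consistent with cleanup being the user's responsibility.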

Simple (TBD)

Flowlyze downloads and processes the file directly, without local copies.
On errors, Flowlyze emits a flow error message with details but does not keep a copy of the original file.

Supported formats

Format	Status	Description
CSV	Implemented	Delimited files with advanced parsing (header, quoting, culture, custom delimiters).
Positional	TBD	Fixed-width files with explicit column positions.
XML	TBD	XML with XPath selectors, node mapping.
JSON	TBD	Complex JSON with JSONPath entity selection.

CSV configuration

Parameter	Description
Line Delimiter	Usually `\n` or `\r\n`.
Column Delimiter	Default `,`; e.g., `;`.
Quote Character	Default `"`.
Culture	Number/date culture (e.g., `it-IT`, `en-US`).
Has Header	First row contains column names (`true`/`false`).
Grouping Column	Column used to group multiple rows into a single message.
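
To make the effect of these parameters concrete, here is a rough sketch using Python's standard `csv` module. The function names `parse_csv` and `parse_decimal`, the positional `column_N` names, and the two hard-coded cultures are assumptions of this example, not Flowlyze behavior. Python's reader accepts both `\n` and `\r\n` line endings, so the line delimiter is not a separate argument here.

```python
import csv
import io


def parse_csv(text, column_delimiter=",", quote_char='"', has_header=True):
    """Return one dict per data row, keyed by column name."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=column_delimiter,
        quotechar=quote_char,
    )
    rows = list(reader)
    if not rows:
        return []
    if has_header:
        header, data = rows[0], rows[1:]
    else:
        # Without a header, fall back to positional names (an assumption of this sketch).
        header = [f"column_{i}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(header, row)) for row in data]


def parse_decimal(value, culture="en-US"):
    """Parse a number using a tiny, hard-coded subset of culture rules."""
    if culture == "it-IT":
        # Italian formatting: '.' groups thousands, ',' is the decimal separator.
        value = value.replace(".", "").replace(",", ".")
    else:
        # en-US formatting: ',' groups thousands, '.' is the decimal separator.
        value = value.replace(",", "")
    return float(value)
```

For example, with `column_delimiter=";"` and culture `it-IT`, a quoted cell such as `"1.234,50"` is read as one field and then parsed as the number 1234.5.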

Row Grouping

Flowlyze can group multiple source rows into one logical message using a grouping column.
This is useful for hierarchical relations or multiple versions of the same record in a file.

During parsing, Flowlyze evaluates the grouping column (e.g., product_id, record_id).
Rows with the same value are aggregated into one JSON object.
Each group yields a single message containing common fields and a nested array with grouped rows.

Example 1 – Variants grouped by main product

CSV with product variants (size, color, price) associated with a main product identified by product_id.

Source file

product_id	variant_id	color	size	price
1001	1	red	M	29.90
1001	2	blue	L	31.50
1002	3	black	S	28.00

Grouping column: product_id

Aggregated output

{
  "product_id": 1001,
  "variants": [
    { "variant_id": 1, "color": "red", "size": "M", "price": 29.90 },
    { "variant_id": 2, "color": "blue", "size": "L", "price": 31.50 }
  ]
}
{
  "product_id": 1002,
  "variants": [
    { "variant_id": 3, "color": "black", "size": "S", "price": 28.00 }
  ]
}
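
The grouping in Example 1 can be approximated with a short Python sketch. The function `group_rows` and its `array_name` parameter are hypothetical; how Flowlyze names the nested array and which fields it treats as common are not specified here, so this sketch keeps only the grouping column at the top level.

```python
def group_rows(rows, grouping_column, array_name):
    """Aggregate rows sharing the same grouping value into one message each."""
    groups = {}  # dicts preserve insertion order, so groups follow file order
    for row in rows:
        key = row[grouping_column]
        message = groups.setdefault(key, {grouping_column: key, array_name: []})
        # Everything except the grouping column goes into the nested array.
        message[array_name].append({k: v for k, v in row.items() if k != grouping_column})
    return list(groups.values())


rows = [
    {"product_id": 1001, "variant_id": 1, "color": "red", "size": "M", "price": 29.90},
    {"product_id": 1001, "variant_id": 2, "color": "blue", "size": "L", "price": 31.50},
    {"product_id": 1002, "variant_id": 3, "color": "black", "size": "S", "price": 28.00},
]
messages = group_rows(rows, grouping_column="product_id", array_name="variants")
```

Here `messages` holds two entries, one per distinct `product_id`, each with its rows nested under `variants`.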

Example 2 – Grouping multiple saves (journaling tables)

A journaling/audit table records multiple versions of the same record. The flow can consolidate them into one logical message.

Source file

record_id	update_time	field	old_value	new_value
501	2025-10-01 10:30:00	status	draft	pending
501	2025-10-02 11:45:00	status	pending	approved
501	2025-10-03 09:20:00	note	null	"OK"

Grouping column: record_id

Aggregated output

{
  "record_id": 501,
  "data": [
    {
      "update_time": "2025-10-01T10:30:00Z",
      "field": "status",
      "old_value": "draft",
      "new_value": "pending"
    },
    {
      "update_time": "2025-10-02T11:45:00Z",
      "field": "status",
      "old_value": "pending",
      "new_value": "approved"
    },
    {
      "update_time": "2025-10-03T09:20:00Z",
      "field": "note",
      "old_value": null,
      "new_value": "OK"
    }
  ]
}

In this example, Flowlyze emits one message per record_id, containing the complete change history in chronological order. This consolidates versions into a coherent representation for downstream systems (e.g., data lake, CRM, or auditing service).

JSON Configuration

For the JSON format, Flowlyze supports reading structured JSON files, both simple and complex, with the ability to select a specific portion of the document using JSONPath.

The JSON configuration is intentionally minimal: the parsing behavior mainly depends on the structure of the selected node (object, array, or single value).

Configuration Parameters

Parameter	Description
Json Path	JSONPath expression that identifies the node in the JSON document from which to read data. If not specified, left empty, or set to `"$"`, the document root is used.

Parsing Behavior

Once the target node is determined (root or node selected via JSONPath), Flowlyze generates one or more messages based on the node type:

Node Type	Result
JSON Object	A single message is generated containing the entire object.
JSON Array	One message per array element is generated.
Single Value (string, number, boolean, etc.)	A single message is generated containing the value.

This behavior is identical whether using the document root or a JSONPath.
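
The node-type rules in the table can be expressed as a small Python sketch. The function name `messages_from_node` is an assumption for illustration only; it mirrors the table rather than any actual Flowlyze code.

```python
import json


def messages_from_node(node):
    """Apply the node-type rules: arrays fan out, everything else is one message."""
    if isinstance(node, list):
        return list(node)  # JSON array: one message per element
    return [node]  # JSON object or single value: a single message


object_messages = messages_from_node(json.loads('{"id": 1, "name": "Example"}'))
array_messages = messages_from_node(json.loads('[{"id": 1}, {"id": 2}]'))
```

In this sketch `object_messages` contains one message and `array_messages` contains two. An empty array would yield no messages, which is a consequence of the sketch and not a documented guarantee.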

Usage Without Json Path

If Json Path is not configured or is set to "$":

The entire JSON file is read.
The document root is used as the input node.
Message generation depends on the root type:
- Object → 1 message
- Array → N messages (one per element)
- Single value → 1 message

Usage With Json Path

When Json Path is specified:

The file is parsed as JSON.
The JSONPath expression is applied to locate a specific node (for example, a nested array or object).
If the path does not match any node, a configuration error is raised.
The selected node becomes the input for message generation, following the same rules described above.
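
The selection step can be sketched in Python as follows. This is a deliberately reduced illustration: it resolves only simple child paths such as `$.data.items`, not the full JSONPath syntax, and the names `select_node` and `ConfigurationError` are hypothetical.

```python
import json


class ConfigurationError(Exception):
    """Hypothetical error type for a JSONPath that matches nothing."""


def select_node(document, json_path=None):
    """Resolve a simple '$.a.b' path; an unset, empty, or '$' path returns the root."""
    if not json_path or json_path == "$":
        return document
    if not json_path.startswith("$."):
        raise ConfigurationError(f"unsupported path in this sketch: {json_path}")
    node = document
    for key in json_path[2:].split("."):
        # Each path segment must name a key of the current object.
        if not isinstance(node, dict) or key not in node:
            raise ConfigurationError(f"JSONPath {json_path} does not match any node")
        node = node[key]
    return node


document = json.loads('{"data": {"items": [{"id": 1}, {"id": 2}]}}')
items = select_node(document, "$.data.items")
```

Calling `select_node(document, "$.missing.path")` raises `ConfigurationError`, mirroring the rule that a path matching no node is a configuration error.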

Configuration Examples

JSON Object as Root

{ "id": 1, "name": "Example" }

Configuration:

Json Path: (not set)

Result:

A single message is generated containing the complete JSON object.

JSON Array as Input

[
  { "id": 1 },
  { "id": 2 }
]

Configuration:

Json Path: (not set)

Result:

Two messages are generated, one for each array element.

Selection via Json Path

{
  "data": {
    "items": [
      { "id": 1 },
      { "id": 2 }
    ]
  }
}

Configuration:

Json Path: $.data.items

Result:

Two messages are generated, one for each element of the items array.

Invalid Json Path

Configuration:

Json Path: $.missing.path

Result:

Configuration error: the JSONPath does not match any node in the document.
