
Flat File

A Flat File Source reads and parses structured files in multiple formats (CSV, positional, XML, JSON) from various acquisition channels (FTP, HTTP, S3, Azure Blob Storage).

Each row or logical unit becomes an independent message processed asynchronously for maximum scalability and flexibility.

Acquisition channels

Channel | Description
FTP / FTPS / SFTP | Securely download files from an FTP server. Authentication via user/password or SSH key; paths, filename patterns, and retention rules are supported.
HTTP (TBD) | Download from HTTP/HTTPS with optional authentication (API key, Basic, Bearer).
Amazon S3 (TBD) | Connect to a bucket and filter by prefix or filename.
Azure Blob Storage (TBD) | Access a container; authentication via connection string or service principal.
Azure Blob Storage RFE | Extended variant for multi/coordination processing (in development).

Reading protocols

Flowlyze supports two reading protocols:

Coordinated

Flowlyze uses a working directory on which it has read/write permissions:

  1. Copy the file from the source directory to a temporary processing directory.
  2. Read and parse its contents.
  3. When processing finishes, move the file to a completion directory.
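The three steps can be sketched in Python against local directories. This only illustrates the protocol and is not Flowlyze's implementation: the function name, the processing/ and done/ subdirectory names, and the parse callback are hypothetical, and a real acquisition channel such as SFTP would replace the local shutil calls. In this sketch, a file whose parsing fails stays in the processing directory, which is what makes error inspection possible.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def process_coordinated(
    source_file: Path,
    work_dir: Path,
    parse: Callable[[str], List[str]],
) -> List[str]:
    """Copy -> read/parse -> move, mirroring the coordinated protocol."""
    processing_dir = work_dir / "processing"  # hypothetical temporary processing directory
    done_dir = work_dir / "done"              # hypothetical completion directory
    processing_dir.mkdir(parents=True, exist_ok=True)
    done_dir.mkdir(parents=True, exist_ok=True)

    # 1. Copy the file from the source directory to the processing directory.
    working_copy = Path(shutil.copy2(source_file, processing_dir / source_file.name))

    # 2. Read and parse the contents. If parse() raises, the copy is left in
    #    processing/ so the failed file remains visible for inspection.
    messages = parse(working_copy.read_text(encoding="utf-8"))

    # 3. Move the file to the completion directory once parsing succeeded.
    shutil.move(str(working_copy), str(done_dir / working_copy.name))
    return messages


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "in").mkdir()
        (root / "in" / "orders.csv").write_text("a\nb\n", encoding="utf-8")
        print(process_coordinated(root / "in" / "orders.csv", root / "work", str.splitlines))
        # ['a', 'b']
        print(sorted(p.name for p in (root / "work" / "done").iterdir()))
        # ['orders.csv']
```

Note that the original file stays in the source directory; as stated for this protocol, cleaning up directories is left to the user.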

Advantages:

  • Visibility into pending files
  • Error inspection
  • Full traceability

Directory cleanup and artifact handling are the user’s responsibility.

Simple (TBD)

Flowlyze downloads and processes the file directly, without local copies.
On errors, Flowlyze emits a flow error message with details but does not keep a copy of the original file.

Supported formats

Format | Status | Description
CSV | Implemented | Delimited files with advanced parsing (header, quoting, culture, custom delimiters).
Positional | TBD | Fixed-width files with explicit column positions.
XML | TBD | XML with XPath selectors and node mapping.
JSON | TBD | Complex JSON with JSONPath entity selection.

CSV configuration

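Parameter | Description
Line Delimiter | Line terminator, usually \n (LF) or \r\n (CRLF).
Column Delimiter | Field separator. Default: "," (comma); other characters, e.g. ";" (semicolon), can be used.
Quote Character | Character used to enclose quoted fields. Default: " (double quote).
Culture | Culture used to parse numbers and dates (e.g., it-IT, en-US).
Has Header | Whether the first row contains column names (true/false).
Grouping Column | Column used to group multiple rows into a single message.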
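To show how these parameters fit together, here is a minimal Python sketch built on the standard csv module. It is not Flowlyze's parser: the function and parameter names, the numeric_columns argument, the positional column names used when there is no header, and the two-entry culture table (numbers only, no dates) are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical, simplified culture table: (thousands separator, decimal separator).
# Only numbers are handled here; real culture support would also cover dates.
CULTURE_SEPARATORS = {"en-US": (",", "."), "it-IT": (".", ",")}


def parse_number(text: str, culture: str) -> float:
    thousands, decimal = CULTURE_SEPARATORS[culture]
    return float(text.replace(thousands, "").replace(decimal, "."))


def parse_csv(
    content: str,
    column_delimiter: str = ",",
    quote_character: str = '"',
    has_header: bool = True,
    culture: str = "en-US",
    numeric_columns: Optional[List[str]] = None,
) -> List[Dict[str, object]]:
    # csv.reader accepts both "\n" and "\r\n" line endings, so the Line Delimiter
    # setting needs no extra handling in this sketch.
    reader = csv.reader(io.StringIO(content, newline=""),
                        delimiter=column_delimiter, quotechar=quote_character)
    rows = list(reader)
    if not rows:
        return []
    if has_header:
        header, data = rows[0], rows[1:]
    else:
        # Hypothetical naming convention when the file has no header row.
        header, data = [f"column{i + 1}" for i in range(len(rows[0]))], rows
    records: List[Dict[str, object]] = []
    for row in data:
        record: Dict[str, object] = dict(zip(header, row))
        for name in numeric_columns or []:
            record[name] = parse_number(str(record[name]), culture)
        records.append(record)
    return records


sample = 'sku;price\r\nA1;"1.234,50"\r\nB2;29,90\r\n'
print(parse_csv(sample, column_delimiter=";", culture="it-IT", numeric_columns=["price"]))
# [{'sku': 'A1', 'price': 1234.5}, {'sku': 'B2', 'price': 29.9}]
```

With it-IT, "." is treated as the thousands separator and "," as the decimal separator, which is why the same digits would parse differently under en-US.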

Row Grouping

Flowlyze can group multiple source rows into one logical message using a grouping column.
This is useful for hierarchical relationships (e.g., a product and its variants) or for multiple versions of the same record within one file.

During parsing, Flowlyze evaluates the grouping column (e.g., product_id, record_id).
Rows with the same value are aggregated into one JSON object.
Each group yields a single message containing common fields and a nested array with grouped rows.
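The grouping step can be sketched in Python as follows. This is an illustration under assumptions, not Flowlyze's code: only the grouping column is hoisted as the common field, the name of the nested array is a hypothetical parameter (the examples on this page use "variants" and "data"), and rows with the same key do not need to be adjacent in the file, which this page does not specify for Flowlyze.

```python
from typing import Any, Dict, Iterable, List


def group_rows(
    rows: Iterable[Dict[str, Any]],
    grouping_column: str,
    array_name: str = "items",  # hypothetical default name for the nested array
) -> List[Dict[str, Any]]:
    """Aggregate rows that share a grouping-column value into one message each."""
    groups: Dict[Any, Dict[str, Any]] = {}  # dicts keep insertion order (Python 3.7+)
    for row in rows:
        key = row[grouping_column]
        if key not in groups:
            # The grouping column becomes the common field of the message.
            groups[key] = {grouping_column: key, array_name: []}
        # All remaining columns of the row go into the nested array.
        groups[key][array_name].append(
            {name: value for name, value in row.items() if name != grouping_column}
        )
    return list(groups.values())


rows = [
    {"product_id": 1001, "variant_id": 1, "color": "red", "size": "M", "price": 29.90},
    {"product_id": 1001, "variant_id": 2, "color": "blue", "size": "L", "price": 31.50},
    {"product_id": 1002, "variant_id": 3, "color": "black", "size": "S", "price": 28.00},
]
for message in group_rows(rows, "product_id", array_name="variants"):
    print(message)
```

Groups are emitted in the order their key first appears, and rows inside each group keep their file order.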

Example 1 – Variants grouped by main product

A CSV file with product variants (color, size, price), each associated with a main product identified by product_id.

Source file

product_id | variant_id | color | size | price
1001 | 1 | red | M | 29.90
1001 | 2 | blue | L | 31.50
1002 | 3 | black | S | 28.00

Grouping column: product_id

Aggregated output (two messages, one per product_id)

{
  "product_id": 1001,
  "variants": [
    { "variant_id": 1, "color": "red", "size": "M", "price": 29.90 },
    { "variant_id": 2, "color": "blue", "size": "L", "price": 31.50 }
  ]
}

{
  "product_id": 1002,
  "variants": [
    { "variant_id": 3, "color": "black", "size": "S", "price": 28.00 }
  ]
}

Example 2 – Grouping multiple saves (journaling tables)

A journaling/audit table records multiple versions of the same record. The flow can consolidate them into one logical message.

Source file

record_id | update_time | field | old_value | new_value
501 | 2025-10-01 10:30:00 | status | draft | pending
501 | 2025-10-02 11:45:00 | status | pending | approved
501 | 2025-10-03 09:20:00 | note | null | "OK"

Grouping column: record_id

Aggregated output

{
  "record_id": 501,
  "data": [
    {
      "update_time": "2025-10-01T10:30:00Z",
      "field": "status",
      "old_value": "draft",
      "new_value": "pending"
    },
    {
      "update_time": "2025-10-02T11:45:00Z",
      "field": "status",
      "old_value": "pending",
      "new_value": "approved"
    },
    {
      "update_time": "2025-10-03T09:20:00Z",
      "field": "note",
      "old_value": null,
      "new_value": "OK"
    }
  ]
}

In this example, Flowlyze emits one message per record_id, containing the complete change history in chronological order. This consolidates versions into a coherent representation for downstream systems (e.g., a data lake, a CRM, or an auditing service).
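The consolidation in Example 2 can be approximated with the Python sketch below. It goes beyond what this page specifies: it assumes update_time values are UTC (which is how the Z suffix in the output is reproduced), treats the literal text null as a missing value, assumes record_id is numeric, and sorts each history by timestamp explicitly. Flowlyze may instead rely on the rows already being in file order. All function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def to_value(raw: str) -> Optional[str]:
    # Assumption: the literal text null marks a missing value. Quotes around
    # values such as "OK" are assumed to be removed already by the CSV parser.
    return None if raw == "null" else raw


def normalize_change(row: Dict[str, str]) -> Dict[str, Any]:
    # Assumption: update_time is UTC, formatted as "YYYY-MM-DD HH:MM:SS".
    ts = datetime.strptime(row["update_time"], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return {
        "update_time": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),  # ISO 8601 with "Z" suffix
        "field": row["field"],
        "old_value": to_value(row["old_value"]),
        "new_value": to_value(row["new_value"]),
    }


def consolidate_history(rows: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    history: Dict[str, List[Dict[str, Any]]] = {}
    for row in rows:
        history.setdefault(row["record_id"], []).append(normalize_change(row))
    # All timestamps share one format and one offset, so sorting the strings
    # is equivalent to sorting chronologically.
    return [
        {"record_id": int(record_id), "data": sorted(changes, key=lambda c: c["update_time"])}
        for record_id, changes in history.items()
    ]
```

Feeding it the three rows of the source file (in any order) yields a single message shaped like the aggregated output above.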

JSON Configuration

For the JSON format, Flowlyze supports reading structured JSON files, both simple and complex, and can select a specific portion of the document using JSONPath.

The JSON configuration is intentionally minimal: the parsing behavior mainly depends on the structure of the selected node (object, array, or single value).

Configuration Parameters

Parameter | Description
Json Path | JSONPath expression that identifies the node in the JSON document from which to read data. If not specified, left empty, or set to "$", the document root is used.

Parsing Behavior

Once the target node is determined (root or node selected via JSONPath), Flowlyze generates one or more messages based on the node type:

Node Type | Result
JSON Object | A single message is generated containing the entire object.
JSON Array | One message is generated per array element.
Single Value (string, number, boolean, etc.) | A single message is generated containing the value.

The same rules apply whether the input node is the document root or a node selected via JSONPath.
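The node-type rules can be expressed in a few lines of Python. This is an illustrative sketch with a hypothetical function name, operating on a document already parsed with the standard json module. With this sketch an empty array produces no messages; this page does not say how Flowlyze handles that case.

```python
import json
from typing import Any, List


def messages_from_node(node: Any) -> List[Any]:
    """Apply the node-type rules to an already-selected JSON node."""
    if isinstance(node, list):
        return list(node)  # JSON array: one message per element
    return [node]          # JSON object or single value: exactly one message


print(messages_from_node(json.loads('{"id": 1, "name": "Example"}')))  # [{'id': 1, 'name': 'Example'}]
print(messages_from_node(json.loads('[{"id": 1}, {"id": 2}]')))        # [{'id': 1}, {'id': 2}]
print(messages_from_node(json.loads('42')))                            # [42]
```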

Usage Without Json Path

If Json Path is not configured or is set to "$":

  • The entire JSON file is read.
  • The document root is used as the input node.
  • Message generation depends on the root type:
    • Object → 1 message
    • Array → N messages (one per element)
    • Single value → 1 message

Usage With Json Path

When Json Path is specified:

  • The file is parsed as JSON.
  • The JSONPath expression is applied to locate a specific node (for example, a nested array or object).
  • If the path does not match any node, a configuration error is raised.
  • The selected node becomes the input for message generation, following the same rules described above.
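Path selection and the configuration error can be sketched as follows. The resolver deliberately supports only a tiny subset of JSONPath ("$" and dotted member names such as $.data.items; no wildcards, filters, or array indices), and the function names and the JsonPathConfigurationError class are hypothetical. A real implementation would rely on a complete JSONPath engine.

```python
import json
from typing import Any, List, Optional


class JsonPathConfigurationError(ValueError):
    """Hypothetical error raised when the Json Path matches no node."""


def select_node(document: Any, json_path: Optional[str]) -> Any:
    """Resolve "$" or a dotted member path such as "$.data.items"."""
    if not json_path or json_path == "$":
        return document  # empty, missing, or "$": use the document root
    if not json_path.startswith("$."):
        raise JsonPathConfigurationError(f"path not supported by this sketch: {json_path}")
    node = document
    for name in json_path[2:].split("."):
        if not isinstance(node, dict) or name not in node:
            raise JsonPathConfigurationError(f"JSONPath {json_path} does not match any node")
        node = node[name]
    return node


def read_json_messages(text: str, json_path: Optional[str] = None) -> List[Any]:
    node = select_node(json.loads(text), json_path)
    # Same node-type rules as for the root: array -> N messages, otherwise 1.
    return list(node) if isinstance(node, list) else [node]


doc = '{"data": {"items": [{"id": 1}, {"id": 2}]}}'
print(read_json_messages(doc, "$.data.items"))  # [{'id': 1}, {'id': 2}]
```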

Configuration Examples

JSON Object as Root
{ "id": 1, "name": "Example" }

Configuration:

  • Json Path: (not set)

Result:

  • A single message is generated containing the complete JSON object.

JSON Array as Input
[
  { "id": 1 },
  { "id": 2 }
]

Configuration:

  • Json Path: (not set)

Result:

  • Two messages are generated, one for each array element.

Selection via Json Path
{
  "data": {
    "items": [
      { "id": 1 },
      { "id": 2 }
    ]
  }
}

Configuration:

  • Json Path: $.data.items

Result:

  • Two messages are generated, one for each element of the items array.

Invalid Json Path

Configuration:

  • Json Path: $.missing.path

Result:

  • Configuration error: the JSONPath does not match any node in the document.