Supported file formats

BEEM's data lake ingestion supports several file formats for source data. The four formats below are recommended for most use cases; other formats can be supported on request.

Compatibility matrix

Format	Full-load	Direct (full-load)	Incremental	Add/Remove column
CSV	Yes	Yes	Yes	Yes (manual)
JSON	Yes	Yes	Yes	Yes
Parquet	Yes	Yes	Yes	Yes
Avro	Yes	—	Yes	Yes
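If onboarding scripts need to check a source against the matrix before a load is configured, the table can be encoded as a small lookup. The sketch below is illustrative only. The mode identifiers (`full_load`, `direct_full_load`, and so on), the `Support` enum and the `support()` helper are hypothetical names, not a BEEM API. It also assumes that the dash in the Avro row means "not supported".

```python
from enum import Enum


class Support(Enum):
    YES = "yes"
    MANUAL = "manual"  # supported, but requires a manual process
    NO = "no"          # assumed meaning of the dash shown in the matrix


# Column order of the compatibility matrix (hypothetical identifiers).
MODES = ("full_load", "direct_full_load", "incremental", "add_remove_column")

MATRIX = {
    "csv": (Support.YES, Support.YES, Support.YES, Support.MANUAL),
    "json": (Support.YES, Support.YES, Support.YES, Support.YES),
    "parquet": (Support.YES, Support.YES, Support.YES, Support.YES),
    "avro": (Support.YES, Support.NO, Support.YES, Support.YES),
}


def support(fmt: str, mode: str) -> Support:
    """Look up how one of the recommended formats supports a given load mode."""
    key = fmt.lower()
    if key not in MATRIX:
        raise ValueError(f"{fmt!r} is not one of the recommended formats")
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return MATRIX[key][MODES.index(mode)]


print(support("Avro", "direct_full_load"))
print(support("CSV", "add_remove_column"))
```

Keeping `MANUAL` separate from `YES` lets a script flag the CSV add/remove-column case for follow-up without rejecting it outright.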

Format details

CSV

Standard delimited text. Two variants are supported: values wrapped in quotes, and values without quotes. Adding or removing a column is supported but requires a manual process.
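The two variants are not interchangeable when a file is parsed, which is why the variant matters. The minimal sketch below uses Python's standard `csv` module to read both variants. The sample data and the `read_rows()` helper are hypothetical, and the comma delimiter and double-quote character are assumptions rather than documented BEEM settings.

```python
import csv
import io

# Hypothetical sample files, one per variant.
QUOTED = '"id","name","city"\n"1","Ada","London"\n"2","Grace, R.","New York"\n'
UNQUOTED = "id,name,city\n1,Ada,London\n2,Grace,New York\n"


def read_rows(text: str, quoted: bool) -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header line.

    quoted=True:  values wrapped in double quotes are unwrapped, so a comma
                  inside a quoted value stays part of that value.
    quoted=False: quote characters get no special handling; every comma splits.
    """
    quoting = csv.QUOTE_MINIMAL if quoted else csv.QUOTE_NONE
    return list(csv.DictReader(io.StringIO(text), quoting=quoting))


print(read_rows(QUOTED, quoted=True)[1])
print(read_rows(UNQUOTED, quoted=False)[1])
```

Reading the quoted sample with `quoted=False` would keep the quote characters in every value and split `Grace, R.` into two fields, so the parser has to be told which variant a source uses.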

JSON

JSON documents containing a single outer array of records, e.g. [ {...}, {...} ].
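A producer may want to confirm that a file has this shape before delivering it. The sketch below shows one way to run that check with Python's standard `json` module. `load_records()` is a hypothetical pre-flight helper, not BEEM's own validator, and it assumes each record must be a JSON object.

```python
import json


def load_records(text: str) -> list[dict]:
    """Parse a file body that should be one outer JSON array of record objects."""
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on malformed input
    if not isinstance(data, list):
        raise ValueError(f"expected an outer array, got {type(data).__name__}")
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            raise ValueError(f"element {i} is {type(record).__name__}, not an object")
    return data


print(load_records('[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]'))

# One object per line does not match the single-outer-array layout.
try:
    load_records('{"id": 1}\n{"id": 2}')
except ValueError as err:
    print("rejected:", err)
```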

Parquet

Columnar binary format. Ingestion maps columns by position, so renaming a column at the source is transparent.
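The reason a rename is transparent under position-based mapping can be shown with a toy model in plain Python. This is not a Parquet reader and not BEEM's implementation. `TARGET_SCHEMA` and `map_by_position()` are hypothetical names, and each row is assumed to arrive as a tuple in file column order.

```python
# Hypothetical target schema in the data lake.
TARGET_SCHEMA = ["customer_id", "full_name", "country"]


def map_by_position(source_columns: list[str], row: tuple, target_schema: list[str]) -> dict:
    """Assign the i-th source value to the i-th target column.

    Source column names are only counted, never compared, which is why a rename
    at the source does not change the result.
    """
    if len(source_columns) != len(target_schema) or len(row) != len(target_schema):
        raise ValueError(
            f"expected {len(target_schema)} columns, "
            f"got {len(source_columns)} names and {len(row)} values"
        )
    return dict(zip(target_schema, row))


before = map_by_position(["customer_id", "full_name", "country"], (7, "Ada", "UK"), TARGET_SCHEMA)
after = map_by_position(["cust_id", "name", "country"], (7, "Ada", "UK"), TARGET_SCHEMA)
print(before == after)  # True
```

The same model shows the trade-off. If the source reorders its columns instead of renaming them, values would land under the wrong target columns without any error being raised.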

Avro

Binary row-based format with a self-describing schema embedded in the file header. Avro is name-based: column names in the file header must match the target schema. The ingestion automatically sanitizes special characters in column names (e.g. $, /, spaces) for downstream compatibility.
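The sketch below models name-based matching combined with sanitization in plain Python. The exact sanitization rule is not documented here. The rule used in the sketch, which replaces every character outside `[A-Za-z0-9_]` with `_`, is an assumption, and `sanitize()` and `map_by_name()` are hypothetical helpers.

```python
import re

# Assumed sanitization rule: every character outside [A-Za-z0-9_] becomes "_".
_NOT_ALLOWED = re.compile(r"[^A-Za-z0-9_]")


def sanitize(name: str) -> str:
    return _NOT_ALLOWED.sub("_", name)


def map_by_name(record: dict, target_schema: list[str]) -> dict:
    """Match record fields to target columns by sanitized name; field order is irrelevant."""
    cleaned = {sanitize(key): value for key, value in record.items()}
    missing = [col for col in target_schema if col not in cleaned]
    if missing:
        raise KeyError(f"target columns not found in file header: {missing}")
    return {col: cleaned[col] for col in target_schema}


# Field order differs from the target schema, and the names contain $, / and a space.
row = {"unit price$": 9.5, "order/id": 42}
print(map_by_name(row, ["order_id", "unit_price_"]))
```

Because matching is by name, reordering fields at the source is harmless, but renaming one raises `KeyError` until the target schema is updated. Under the assumed rule, two names such as `a b` and `a/b` would both become `a_b`, so a real implementation also needs a policy for collisions.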

Notes