Supported file formats
BEEM's data lake ingestion supports several file formats for source data. The four formats below are recommended for most use cases; additional formats can be supported on request.
Compatibility matrix
| Format | Full-load | Direct (full-load) | Incremental | Add/Remove column |
|---|---|---|---|---|
| CSV | Yes | Yes | Yes | Yes (manual) |
| JSON | Yes | Yes | Yes | Yes |
| Parquet | Yes | Yes | Yes | Yes |
| Avro | Yes | — | Yes | Yes |
Format details
CSV
Standard delimited text. Two variants are supported, depending on whether values are wrapped in quotes. Adding or removing a column is supported but requires a manual process.
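To illustrate the two variants, the sketch below writes the same records once with every value quoted and once with quotes only where a value requires them, using Python's standard csv module. The column names, the sample rows, and the comma delimiter are assumptions made for the example, not BEEM requirements.

```python
import csv
import io

# Hypothetical sample records; the column names are illustrative only.
rows = [["id", "name"], [1, "Alice"], [2, "Bob, Jr."]]

def to_csv(rows, quoting):
    """Serialize rows to CSV text using the given quoting mode."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=quoting, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

quoted = to_csv(rows, csv.QUOTE_ALL)        # every value wrapped in quotes
unquoted = to_csv(rows, csv.QUOTE_MINIMAL)  # quotes only where a value needs them
```

Note that even the unquoted variant quotes a value that contains the delimiter, which is why the source system and the ingestion configuration need to agree on which variant is in use.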
JSON
JSON documents containing one outer array of records (e.g. [ {...}, {...} ]).
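As a rough pre-flight check, a source file can be tested for this shape before it is handed to ingestion. The sketch below uses only Python's standard json module; the function names and sample fields are hypothetical and are not part of BEEM.

```python
import json

# Hypothetical records; the field names are illustrative only.
records = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

def to_outer_array(records):
    """Serialize records as a single JSON array, i.e. [ {...}, {...} ]."""
    return json.dumps(list(records))

def is_outer_array_of_records(text):
    """Return True if text parses to one array whose items are all objects."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(doc, list) and all(isinstance(r, dict) for r in doc)

payload = to_outer_array(records)
```

With this check, a single top-level object or a newline-delimited stream of objects is reported as not matching the expected shape.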
Parquet
Columnar binary format. Ingestion is position-based: columns are matched by position rather than by name, so column renames at the source are transparent.
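The effect of position-based matching can be pictured with a small sketch in plain Python. It is a conceptual illustration only, not BEEM's implementation; the schema and the function name are hypothetical, and add/remove column handling is left out.

```python
# Hypothetical target schema; the names are illustrative only.
target_columns = ["customer_id", "full_name", "signup_date"]

def map_by_position(source_columns, target_columns):
    """Pair each source column with the target column at the same index."""
    return dict(zip(source_columns, target_columns))

# The second source column was renamed from "full_name" to "name".
# It still maps to the same target column because only position is used.
mapping = map_by_position(["customer_id", "name", "signup_date"], target_columns)
```

Under this assumption, a rename at the source changes nothing downstream, whereas reordering columns at the source would change which target column each value lands in.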
Avro
Binary row-based format with a self-describing schema embedded in the file header. Ingestion is name-based: column names in the file header must match the target schema. Special characters in column names (e.g. $, /, spaces) are sanitized automatically for downstream compatibility.
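The exact sanitization rule is not specified here. As a minimal sketch, assuming every character outside letters, digits, and underscores is replaced with an underscore, the behaviour could look like the following. The function name and the replacement character are assumptions, and BEEM's actual rule may differ.

```python
import re

def sanitize_column_name(name):
    """Replace each character outside [A-Za-z0-9_] with an underscore.

    Assumed rule for illustration only; the real ingestion may use another.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", name)

# Hypothetical column names containing the characters mentioned above.
examples = ["price$", "order/id", "first name"]
sanitized = [sanitize_column_name(n) for n in examples]
```

When defining the target schema, it is worth confirming the sanitized form of each column name, since matching is done by name.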
