Supported file formats
File formats supported by BEEM data lake ingestion.
BEEM's data lake ingestion supports several file formats for source data. The four formats below are recommended for most use cases; additional formats can be supported on request.
Compatibility matrix
| Format | Full-load | Direct (full-load) | Incremental | Add/Remove column |
|---|---|---|---|---|
| CSV | Yes | Yes | Yes | Yes (manual) |
| JSON | Yes | Yes | Yes | Yes |
| Parquet | Yes | Yes | Yes | Yes |
| Avro | Yes | — | Yes | Yes |
Format details
CSV
Standard delimited text. Two variants are supported: one with values wrapped in quotes and one without. Adding or removing a column is supported but requires a manual process.
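To make the two variants concrete, here is a minimal sketch using Python's standard `csv` module. The sample rows, the comma delimiter, and the helper names `render` and `parse` are illustrative assumptions, not part of BEEM's documented interface.

```python
import csv
import io

# Hypothetical sample data: a header row followed by one record.
ROWS = [["id", "name", "city"], [1, "Ada", "Montreal"]]


def render(rows, quoting):
    """Serialize rows to CSV text using the given quoting mode."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def parse(text):
    """Read CSV text back into a list of rows (all values come back as strings)."""
    return list(csv.reader(io.StringIO(text)))


# Variant 1: every value wrapped in quotes.
quoted = render(ROWS, csv.QUOTE_ALL)

# Variant 2: values left bare (QUOTE_MINIMAL only quotes a value when it must).
unquoted = render(ROWS, csv.QUOTE_MINIMAL)

print(quoted)
print(unquoted)
print(parse(quoted) == parse(unquoted))
```

Both variants carry the same records; only the on-disk quoting differs, which is why a source should declare which variant it uses.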
JSON
Each file is a JSON document consisting of a single outer array of records (e.g. [ {...}, {...} ]).
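A quick pre-flight check can confirm that a file has this shape before it is handed to ingestion. The sketch below uses only Python's standard `json` module. The function name `load_records` is hypothetical, and treating each record as a JSON object is an assumption drawn from the `[ {...}, {...} ]` example.

```python
import json


def load_records(text):
    """Return the records if text is one outer JSON array of objects."""
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad input
    if not isinstance(data, list):
        raise ValueError("expected one outer JSON array of records")
    for index, record in enumerate(data):
        # Assumption: each record is a JSON object, as in the example above.
        if not isinstance(record, dict):
            raise ValueError(f"element {index} is not a JSON object")
    return data


records = load_records('[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]')
print(len(records))
```

Newline-delimited JSON (one object per line with no enclosing array) fails this check, because `json.loads` rejects the extra data after the first object.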
Parquet
Columnar binary format. Columns are matched by position, so column renames at the source are transparent.
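The effect of position-based matching can be illustrated without reading any Parquet file. The sketch below is plain Python. The target schema, the sample extracts, and the function `map_by_position` are hypothetical and only model the idea; they do not reflect BEEM's actual implementation.

```python
# Hypothetical target schema, in column order.
TARGET_COLUMNS = ["customer_id", "full_name"]


def map_by_position(target_columns, row):
    """Assign each value to the target column at the same index."""
    if len(row) != len(target_columns):
        raise ValueError("column count differs from the target schema")
    return dict(zip(target_columns, row))


# Two hypothetical extracts of the same table; the source renamed its
# columns between them but kept the column order.
before_rename = {"columns": ["customer_id", "full_name"], "row": (1, "Ada")}
after_rename = {"columns": ["cust_id", "name"], "row": (1, "Ada")}

# The source column names are never consulted, only the value positions.
record_before = map_by_position(TARGET_COLUMNS, before_rename["row"])
record_after = map_by_position(TARGET_COLUMNS, after_rename["row"])
print(record_before == record_after)
```

Under this model a rename has no effect, but a reordered column would silently shift values into the wrong target column, so column order at the source needs to stay stable.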
Avro
Binary row-based format with a self-describing schema embedded in the file header. Avro ingestion is name-based: column names in the file header must match the target schema. The ingestion automatically sanitizes special characters in column names (e.g. $, /, spaces) for downstream compatibility.
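The exact sanitization rule is not specified here, so the following is only a sketch of what such a step might look like. It assumes every character outside letters, digits, and underscore is replaced by an underscore; the function name `sanitize_column_name` and that replacement rule are hypothetical.

```python
import re

# Assumed rule (not confirmed by this page): anything outside
# letters, digits, and underscore becomes an underscore.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_]")


def sanitize_column_name(name):
    """Replace each special character in a column name with an underscore."""
    return _DISALLOWED.sub("_", name)


for raw in ["price$usd", "in/out", "first name", "order_id"]:
    print(raw, "->", sanitize_column_name(raw))
```

Because matching is name-based, it helps to know the sanitized form of each source column when defining the target schema; the names that ingestion actually produces should be confirmed against a real load.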
