Why Reflog?

Reflog captures immutable operational events and continuously materializes them into Parquet projections.

You get both event history and current state from one ingest path. And because outputs are Parquet, your data is not trapped in Reflog; it plugs directly into DuckDB, Spark, Polars, DataFusion, warehouses, and lakehouse tooling.

Reflog is built for teams that need reliable event logging plus analytics-ready data, without taking on Kafka/Flink-level complexity on day one.

1. Problem statement

Event data is usually split across logs, OLTP tables, and ad-hoc pipelines.

Teams are forced to choose between:

  • Easy ingest but hard analytics later
  • Powerful analytics but heavy infrastructure upfront

Reflog closes that gap.

2. What Reflog gives you

  • Append-only ingest for full auditability
  • Dual projections
    • _events: complete event history
    • _current: latest entity state
  • Parquet-native output for interoperability and query performance
  • Operational safety through checkpoints, segmented processing, and crash recovery patterns
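The two projections are two views of the same log: _events is the log itself, and _current is what you get by replaying it in order. A minimal sketch of that fold, using a hypothetical event shape (Reflog's actual schema may differ):

```python
# Hypothetical event records: entity_id, a monotonically increasing
# sequence number, an operation, and a payload.
events = [
    {"entity_id": "u1", "op": "create", "seq": 1, "data": {"name": "Ada"}},
    {"entity_id": "u1", "op": "update", "seq": 2, "data": {"name": "Ada L."}},
    {"entity_id": "u2", "op": "create", "seq": 3, "data": {"name": "Alan"}},
    {"entity_id": "u2", "op": "delete", "seq": 4, "data": None},
]

def materialize_current(events):
    """Replay events in sequence order: last write wins, deletes drop rows."""
    current = {}
    for ev in sorted(events, key=lambda e: e["seq"]):
        if ev["op"] == "delete":
            current.pop(ev["entity_id"], None)
        else:
            current[ev["entity_id"]] = ev["data"]
    return current

print(materialize_current(events))
# {'u1': {'name': 'Ada L.'}}
```

Because _current is derived rather than independently written, the two projections can never disagree about history.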

3. Architecture at a glance

  1. gRPC ingest accepts entity operation events
  2. Events are written to append-only segments
  3. A background processor reads closed segments
  4. Reflog writes Parquet projections (_events, _current) with compaction and partitioning

This keeps ingest simple while still producing trusted, analytics-friendly outputs.
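Steps 2–4 hinge on one invariant: the processor only ever reads segments that are closed, and it records a checkpoint so a crash or restart neither skips nor reprocesses a segment. A sketch of that pattern (names are illustrative, not Reflog's API):

```python
def process_closed_segments(segments, checkpoint):
    """segments: list of (segment_id, closed, events), ordered by id.
    Returns (events to materialize, new checkpoint)."""
    processed = []
    for seg_id, closed, events in sorted(segments):
        if seg_id <= checkpoint:
            continue            # already materialized before a restart
        if not closed:
            break               # never read a segment still accepting writes
        processed.extend(events)
        checkpoint = seg_id     # advance only after the segment is fully handled
    return processed, checkpoint

segments = [
    (1, True, ["e1", "e2"]),
    (2, True, ["e3"]),
    (3, False, ["e4"]),  # still open: skipped until the writer closes it
]
out, cp = process_closed_segments(segments, checkpoint=1)
# out == ["e3"], cp == 2
```

If the process dies before persisting the new checkpoint, the worst case is re-reading an already-closed segment, which a deterministic projection absorbs safely.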

4. Why Parquet matters

  • Columnar format: efficient analytical scans on large datasets
  • Predicate pushdown and partition pruning: faster queries, lower compute cost
  • Open standard: no lock-in, easy downstream integration
  • Flexible workflows: works for ad hoc analysis and scheduled pipelines
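Partition pruning is worth making concrete. With a hive-style layout (`date=YYYY-MM-DD` directories, a common convention and an assumption here, not a documented Reflog default), engines like DuckDB or Spark skip whole files before opening them; the sketch below shows the pruning logic itself:

```python
files = [
    "events/date=2024-01-01/part-0.parquet",
    "events/date=2024-01-02/part-0.parquet",
    "events/date=2024-01-03/part-0.parquet",
]

def prune(files, date_from):
    """Keep only files whose partition value satisfies the predicate;
    everything else is eliminated without a single byte read."""
    def partition_date(path):
        # pull "2024-01-02" out of ".../date=2024-01-02/..."
        return next(p.split("=", 1)[1] for p in path.split("/")
                    if p.startswith("date="))
    return [f for f in files if partition_date(f) >= date_from]

print(prune(files, "2024-01-02"))
# ['events/date=2024-01-02/part-0.parquet',
#  'events/date=2024-01-03/part-0.parquet']
```

Predicate pushdown works the same way one level deeper, using per-column min/max statistics inside each Parquet file to skip row groups.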

5. Use cases

  • Audit trails and compliance: immutable record of creates/updates/deletes
  • Product analytics: lifecycle and behavioral events ready for BI and SQL
  • Data lake ingestion edge: operational events landed as query-ready Parquet
  • Entity snapshots: _current projection for latest-truth tables
  • Backfill and replay: deterministic projection rebuilds from the log
  • AI feature pipelines: combine history and latest state as model inputs
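The backfill-and-replay bullet rests on one property: projections are a pure function of the log. A minimal sketch (hypothetical event shape) showing that a rebuild is deterministic regardless of arrival order, so a damaged projection can simply be deleted and rematerialized:

```python
log = [("u1", 1, "a"), ("u2", 2, "b"), ("u1", 3, "c")]  # (entity, seq, value)

def rebuild(log):
    """Rebuild latest-state purely from the log, ordered by sequence number."""
    state = {}
    for entity, seq, value in sorted(log, key=lambda e: e[1]):
        state[entity] = value
    return state

first = rebuild(log)
second = rebuild(list(reversed(log)))   # arrival order doesn't matter
assert first == second == {"u1": "c", "u2": "b"}
```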

6. When Reflog is a fit (and when it is not)

Great fit

  • Single-team to mid-scale event pipelines
  • Internal platforms that need fast iteration
  • Teams that want one ingest path for operations + analytics

Not ideal (yet)

  • Ultra-high-throughput, global multi-region streaming
  • Complex exactly-once guarantees across many independent sinks
