Data Architecture Layers
Source → ingestion → storage → transformation → serving → consumption.
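
As a rough illustration of how the stages above hand data to one another, here is a minimal in-memory sketch. The records, the function names (`ingest`, `store`, `transform`, `serve`), and the dict used as "storage" are all assumptions made for the example; a real stack would put a separate tool behind each stage.

```python
import json
from collections import defaultdict

# Hypothetical raw events, standing in for a source system's export.
RAW_LINES = [
    '{"order_id": 1, "customer": "a", "amount": "10.50"}',
    '{"order_id": 2, "customer": "b", "amount": "4.00"}',
    '{"order_id": 3, "customer": "a", "amount": "2.25"}',
]

def ingest(lines):
    """Ingestion: parse raw records without changing their content."""
    return [json.loads(line) for line in lines]

def store(records, storage):
    """Storage: persist raw records (here, an in-memory dict keyed by id)."""
    for record in records:
        storage[record["order_id"]] = record
    return storage

def transform(storage):
    """Transformation: cast types and aggregate revenue per customer."""
    totals = defaultdict(float)
    for record in storage.values():
        totals[record["customer"]] += float(record["amount"])
    return dict(totals)

def serve(model, customer):
    """Serving: expose a modeled value to a consumer such as a dashboard."""
    return model.get(customer, 0.0)

storage = store(ingest(RAW_LINES), {})
revenue_by_customer = transform(storage)
print(serve(revenue_by_customer, "a"))
```

Keeping the raw records untouched in storage and doing all casting in `transform` mirrors the common load-first, transform-later ordering, so a model can be rebuilt from raw data if the logic changes.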
Modern Data Stack
| Layer | Purpose | Tools |
|---|---|---|
| Sources | OLTP DBs, SaaS APIs, events, files | Postgres, Stripe, Salesforce, S3 |
| Ingestion | Load raw data into warehouse / lake | Fivetran, Airbyte, Debezium, custom |
| Storage | Durable storage (object stores; warehouses) | S3, ADLS, GCS; Snowflake, BigQuery, Redshift |
| Transform | SQL / Python that turns raw data into models | dbt, SQLMesh, Spark, Flink |
| Orchestrate | Schedule + dependency mgmt | Airflow, Dagster, Prefect, Argo |
| Quality | Validation, freshness, anomaly detection | Great Expectations, dbt tests, Soda, Monte Carlo |
| Catalog / lineage | Discoverability + impact analysis | DataHub, OpenLineage, Atlan, Unity Catalog |
| Serving | Reverse ETL, BI, ML feature serving | Hightouch, Looker, Tableau, Feast |
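
To make the "schedule + dependency mgmt" job of the orchestration layer concrete, the sketch below derives a valid run order from a task graph using Python's standard-library `graphlib`. The task names and the `execution_order` helper are hypothetical; orchestrators such as Airflow or Dagster add scheduling, retries, and state on top of this basic idea.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task graph: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "transform_models": {"ingest_orders", "ingest_customers"},
    "quality_checks": {"transform_models"},
    "publish_dashboard": {"quality_checks"},
}

def execution_order(dependencies):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())

order = execution_order(DEPENDENCIES)
print(order)
```

The two ingestion tasks have no dependency on each other, so their relative order is not guaranteed; only the constraint that every task runs after its dependencies is.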
Architectural Patterns
- Lambda — batch layer + speed layer, with a serving layer that merges their results. Two code paths to maintain.
- Kappa — a single streaming pipeline; when the logic changes, reprocess by replaying the log.
- Lakehouse — open table format on object storage (Iceberg / Delta / Hudi). One layer for batch + stream + ML.
- Data Mesh — domain-owned data products + federated governance.
- Medallion (Bronze/Silver/Gold) — raw → cleaned → business-ready zoning convention.
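
The Kappa idea of reprocessing by replaying the log can be sketched in a few lines. This is a toy model under stated assumptions: the log is a Python list rather than a durable stream such as a Kafka topic, and `EVENT_LOG`, `process_v1`, `process_v2`, and `replay` are hypothetical names invented for the example.

```python
# Append-only event log: hypothetical (user, amount) purchase events.
EVENT_LOG = [("alice", 5), ("bob", 3), ("alice", 7), ("bob", 1)]

def process_v1(state, event):
    """Original logic: total amount spent per user."""
    user, amount = event
    state[user] = state.get(user, 0) + amount
    return state

def process_v2(state, event):
    """Changed logic: largest single purchase per user."""
    user, amount = event
    state[user] = max(state.get(user, 0), amount)
    return state

def replay(log, process):
    """Rebuild a view from scratch by replaying the full log in order."""
    state = {}
    for event in log:
        state = process(state, event)
    return state

view_v1 = replay(EVENT_LOG, process_v1)
view_v2 = replay(EVENT_LOG, process_v2)
print(view_v1, view_v2)
```

Because the same `replay` loop serves both live processing and backfills, there is only one code path to maintain, which is the main contrast with Lambda's separate batch and speed layers.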