Database Fundamentals
Core properties, theorems, and classification — the foundation every database discussion builds upon.
ACID Properties
ACID is the gold standard for transactional databases. Every relational database guarantees these four properties for each transaction.
Atomicity
A transaction is an indivisible unit — it either completes entirely or has no effect at all. If any part of a transaction fails, the entire transaction is rolled back. The database uses the Write-Ahead Log (WAL) to achieve this: before modifying any data page, the change is first recorded in the WAL. On crash, the WAL is replayed to undo incomplete transactions.
Consistency
A transaction moves the database from one valid state to another. All constraints (PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL) must be satisfied before and after the transaction. This is enforced by the database engine at commit time.
Isolation
Concurrent transactions execute as if they were serial. In practice, databases offer multiple isolation levels (from Read Uncommitted to Serializable) trading off consistency for performance. Isolation is implemented through locks (pessimistic) or MVCC (optimistic).
Durability
Once a transaction commits, its changes persist even through power failures or crashes. Achieved by flushing the WAL to disk before acknowledging the commit. The data pages can be written lazily via background checkpointing.
-- Example: ACID transaction in PostgreSQL BEGIN; UPDATE accounts SET balance = balance - 500 WHERE id = 1; UPDATE accounts SET balance = balance + 500 WHERE id = 2; -- If either UPDATE fails, both are rolled back (Atomicity) -- Constraints checked at commit (Consistency) -- Other transactions see old OR new balances, never partial (Isolation) -- Once committed, changes survive crashes (Durability) COMMIT;
BASE Properties
BASE is the alternative consistency model favored by many distributed NoSQL databases. It trades strong consistency for availability and partition tolerance.
- Basically Available — The system guarantees availability as defined by the CAP theorem. Every request receives a response (not an error), though the response may contain stale data.
- Soft state — The state of the system may change over time, even without new input, due to eventual consistency mechanisms like anti-entropy and read repair propagating updates.
- Eventually consistent — Given enough time without new updates, all replicas will converge to the same value. The convergence window depends on the system — it could be milliseconds or seconds.
ACID vs BASE Comparison
| Property | ACID | BASE |
|---|---|---|
| Consistency model | Strong consistency | Eventual consistency |
| Availability | May sacrifice during partitions | Prioritizes availability |
| Focus | Correctness | Performance & availability |
| Scaling approach | Vertical (scale up) | Horizontal (scale out) |
| Transactions | Full ACID transactions | Relaxed, eventual |
| Use cases | Banking, inventory, booking | Social feeds, analytics, IoT |
| Examples | PostgreSQL, MySQL, Oracle | Cassandra, DynamoDB, Riak |
CAP Theorem
Proposed by Eric Brewer in 2000 and proven by Gilbert and Lynch in 2002, the CAP theorem states that a distributed data store can provide at most two of three guarantees simultaneously:
- Consistency (C) — Every read receives the most recent write or an error. All nodes see the same data at the same time.
- Availability (A) — Every request receives a non-error response, without the guarantee that it contains the most recent write.
- Partition Tolerance (P) — The system continues to operate despite network partitions (messages being dropped or delayed between nodes).
In practice, network partitions are inevitable in distributed systems, so the real choice is between CP (consistency + partition tolerance) and AP (availability + partition tolerance). A "CA" system only exists in a non-distributed context (single node).
PACELC Extension
Daniel Abadi proposed PACELC (2012) to address CAP's limitation — it says nothing about system behavior when there is no partition. PACELC adds the tradeoff during normal operation:
If Partition (P), choose Availability (A) or Consistency (C); Else (E), choose Latency (L) or Consistency (C).
| Database | During Partition (PAC) | Else (ELC) | Classification |
|---|---|---|---|
| PostgreSQL | PC (consistency) | EC (consistency) | PC/EC |
| Cassandra | PA (availability) | EL (latency) | PA/EL |
| MongoDB | PC (consistency) | EC (consistency) | PC/EC |
| DynamoDB | PA (availability) | EL (latency) | PA/EL |
| Cosmos DB | PA (availability) | EL (latency) | PA/EL (tunable) |
| Spanner | PC (consistency) | EC (consistency) | PC/EC |
Database Types
Relational (SQL)
Stores data in tables with rows and columns, enforces schemas, supports JOIN operations, and provides ACID transactions. Best for structured data with complex relationships. Examples: PostgreSQL, MySQL, Oracle, SQL Server.
Document
Stores semi-structured data as JSON/BSON documents. Each document can have a different structure, enabling schema flexibility. Best for content management, user profiles, catalogs. Examples: MongoDB, CouchDB, Firestore.
Key-Value
Simplest NoSQL model — stores data as key-value pairs. Extremely fast for point lookups. Best for caching, session management, feature flags. Examples: Redis, Memcached, DynamoDB, etcd.
Wide-Column (Column-Family)
Stores data in column families with flexible columns per row. Optimized for write-heavy workloads and time-series data. Best for IoT, logging, analytics. Examples: Cassandra, HBase, ScyllaDB.
Graph
Stores data as nodes and relationships (edges) with properties on both. Uses index-free adjacency for O(1) traversal. Best for social networks, fraud detection, recommendations. Examples: Neo4j, Amazon Neptune, TigerGraph.
Time-Series
Optimized for time-stamped data with high ingestion rates, automatic compression, and downsampling. Best for monitoring, metrics, IoT sensor data. Examples: TimescaleDB, InfluxDB, Prometheus.
Vector
Stores high-dimensional vectors for similarity search using algorithms like HNSW or IVFFlat. Best for AI/ML embeddings, semantic search, recommendation engines. Examples: Pinecone, Milvus, pgvector, Weaviate.