Roadmap · 22 modules · First principles

Backend Engineering — From First Principles

End-to-end backend curriculum: how a request travels the wire, hits HTTP, gets parsed, validated, routed, authorized, processed by business logic, queried against DB/cache/search, observed, and shipped. Each module: concepts → patterns → code → gotchas → interview lens.

MODULE 01 — FOUNDATIONS

Request Flow End-to-End

Browser → DNS → TCP → TLS → HTTP → LB → server → response. Every hop matters.

The full path

browser
  │
  ├─ DNS lookup (recursive resolver → root → TLD → authoritative)
  │
  ├─ TCP 3-way handshake (SYN → SYN-ACK → ACK)          [1 RTT]
  ├─ TLS 1.3 handshake (ClientHello → cert → finished)  [1 RTT]
  │
  ├─ HTTP request bytes ──► public internet ──► ISP ──► transit ──► cloud edge
  │                                                                     │
  │                                                                     ▼
  │                                                      CDN / Cloudflare / AWS edge
  │                                                                     │
  │                                                                     ▼
  │                                                           Load Balancer (L7)
  │                                                                     │
  │                                                                     ▼
  │                                                           Application server
  │                                                           (routing → middleware
  │                                                            → controller → service
  │                                                            → DB / cache / queue)
  │                                                                     │
  ◄────────────────── HTTP response (status, headers, body) ◄───────────┘

Hops & what each does

Hop | Layer | Job
DNS | App | Resolve api.example.com → IP. Cached at OS, browser, resolver.
Firewall / NAT | L3/L4 | SNAT private → public IP. Drops disallowed traffic.
CDN edge | L7 | Serve static/cached. Terminate TLS close to user.
WAF | L7 | OWASP rules: block SQLi/XSS patterns, geo blocks.
Load balancer | L4 / L7 | Spread across servers. Health-check pools. Sticky or stateless.
API gateway | L7 | Auth, rate limit, transform, route to upstream services.
App server | L7 | Run your code: middleware chain → handler → response.

The response: structure

HTTP/2 200
content-type: application/json; charset=utf-8
content-length: 47
cache-control: private, max-age=60
x-request-id: 7f3a-1c2b
date: Mon, 11 May 2026 09:14:22 GMT

{"id": 42, "name": "alice", "email": "a@x.com"}

MODULE 02 — PROTOCOL

HTTP Protocol

Message structure, headers, methods, CORS, status codes, caching, versions, TLS.

Raw message format

# request
POST /api/users HTTP/1.1
Host: api.example.com
Content-Type: application/json
Authorization: Bearer eyJ...
Content-Length: 40

{"email":"a@x.com","password":"hunter2"}

# response
HTTP/1.1 201 Created
Content-Type: application/json
Location: /api/users/42

{"id":42,"email":"a@x.com"}
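
The framing rules above (request line, header lines, a blank line, then exactly Content-Length bytes of body) can be checked with a small parser. This is a minimal sketch, not a production HTTP parser: the function name parse_http_request is made up for illustration, and it ignores chunked encoding and repeated headers.

```python
def parse_http_request(raw: bytes) -> dict:
    """Split a raw HTTP/1.1 request into request line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")        # blank line ends the header block
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    declared = int(headers.get("content-length", "0"))
    if declared != len(body):
        raise ValueError(f"Content-Length {declared} != body length {len(body)}")
    return {"method": method, "target": target, "version": version,
            "headers": headers, "body": body}


body = b'{"email":"a@x.com","password":"hunter2"}'
raw = (
    b"POST /api/users HTTP/1.1\r\n"
    b"Host: api.example.com\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: " + str(len(body)).encode() + b"\r\n"
    b"\r\n" + body
)
request = parse_http_request(raw)
```

Computing Content-Length from the body instead of typing it by hand avoids the classic mismatch where the server waits for bytes that never arrive.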

Header families

Family | Examples | Purpose
Request | Host, User-Agent, Accept, Authorization | Describe sender + intent
Representation | Content-Type, Content-Encoding, Content-Length, ETag | Describe body bytes
General | Date, Connection, Cache-Control, Via | Apply in both directions
Security | Strict-Transport-Security, X-Frame-Options, Content-Security-Policy, X-Content-Type-Options | Browser hardening

Methods & semantics

Method | Safe | Idempotent | Body | Use
GET | ✓ | ✓ | ✗ | Read resource
HEAD | ✓ | ✓ | ✗ | Headers only (existence/size check)
OPTIONS | ✓ | ✓ | ✗ | CORS pre-flight, capability discovery
POST | ✗ | ✗ | ✓ | Create / non-idempotent action
PUT | ✗ | ✓ | ✓ | Full replace at known URI
PATCH | ✗ | ✗* | ✓ | Partial update (*idempotent w/ JSON Merge Patch)
DELETE | ✗ | ✓ | ✗ | Remove resource

CORS — Cross-Origin Resource Sharing

Browser enforces same-origin policy. Server opts other origins in via headers.

Simple request

Pre-flight

# browser sends first:
OPTIONS /api/users HTTP/1.1
Origin: https://app.x.com
Access-Control-Request-Method: PUT
Access-Control-Request-Headers: authorization, content-type

# server replies:
HTTP/1.1 204 No Content
Access-Control-Allow-Origin: https://app.x.com
Access-Control-Allow-Methods: GET, POST, PUT, DELETE
Access-Control-Allow-Headers: authorization, content-type
Access-Control-Allow-Credentials: true
Access-Control-Max-Age: 86400
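
The server side of that exchange can be sketched as a pure function that decides which headers to send back. The allow-lists and the function name preflight_headers are assumptions for illustration; a real framework's CORS middleware would normally own this logic.

```python
ALLOWED_ORIGINS = {"https://app.x.com"}
ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE"}
ALLOWED_HEADERS = {"authorization", "content-type"}


def preflight_headers(origin, method, requested_headers):
    """Return CORS response headers, or None when the pre-flight is refused."""
    wanted = {h.strip().lower() for h in requested_headers.split(",") if h.strip()}
    if origin not in ALLOWED_ORIGINS or method not in ALLOWED_METHODS:
        return None
    if not wanted <= ALLOWED_HEADERS:      # every requested header must be allowed
        return None
    return {
        # echo the exact origin: "*" is not permitted together with credentials
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(sorted(ALLOWED_METHODS)),
        "Access-Control-Allow-Headers": ", ".join(sorted(wanted)),
        "Access-Control-Allow-Credentials": "true",
        "Access-Control-Max-Age": "86400",
        "Vary": "Origin",                  # caches must key on Origin
    }


granted = preflight_headers("https://app.x.com", "PUT", "authorization, content-type")
refused = preflight_headers("https://evil.example", "PUT", "authorization")
```

When the function returns None the server simply omits the CORS headers, and the browser blocks the real request.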

Status codes — ones that matter

Range | Code · meaning
2xx | 200 OK · 201 Created · 202 Accepted (async) · 204 No Content · 206 Partial Content (range)
3xx | 301 Moved Permanently · 302 Found · 304 Not Modified (ETag hit) · 307/308 preserve method on redirect
4xx | 400 Bad Request · 401 Unauthorized (= unauthenticated) · 403 Forbidden · 404 Not Found · 409 Conflict · 422 Unprocessable Content · 429 Too Many Requests
5xx | 500 Internal Server Error · 502 Bad Gateway · 503 Service Unavailable · 504 Gateway Timeout

Caching: ETag vs max-age

# first response carries ETag
HTTP/1.1 200 OK
ETag: "v1-7f3a"
Cache-Control: max-age=60, must-revalidate

# subsequent request — conditional
GET /api/users/42 HTTP/1.1
If-None-Match: "v1-7f3a"

# server unchanged → no body
HTTP/1.1 304 Not Modified
ETag: "v1-7f3a"
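
A handler-side sketch of the same conditional flow. It assumes the ETag is derived from a hash of the response body (one common choice; the exchange above uses a version-style tag instead), and the helper names are hypothetical.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # strong validator: changes whenever the bytes change
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a GET with optional If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60, must-revalidate"}
    candidates = [tag.strip() for tag in (if_none_match or "").split(",")]
    if etag in candidates or "*" in candidates:
        return 304, headers, b""           # client copy is still valid: no body
    return 200, headers, body


first = conditional_get(b'{"id":42,"name":"alice"}')
repeat = conditional_get(b'{"id":42,"name":"alice"}', first[1]["ETag"])
changed = conditional_get(b'{"id":42,"name":"alicia"}', first[1]["ETag"])
```

The 304 path still costs a round trip, but it saves serializing and sending the body; max-age is what avoids the round trip entirely.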

HTTP versions

Version | Transport | Multiplexing | Head-of-line blocking | Header compression
HTTP/1.1 | TCP, text-based | One request at a time per connection (pipelining unreliable in practice) | App-layer | None
HTTP/2 | TCP, binary frames | Streams over 1 conn | TCP-level still blocks | HPACK
HTTP/3 | QUIC over UDP | Independent streams | None across streams (loss stalls only its own stream) | QPACK

Content negotiation & compression

Accept: application/json;q=0.9, application/xml;q=0.5
Accept-Encoding: gzip, br, zstd
Accept-Language: en-US, en;q=0.8

# server picks best match, replies:
Content-Type: application/json
Content-Encoding: br
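
How "server picks best match" might work for the Accept header, as a simplified sketch: it honors q-values and wildcards but ignores media-type parameters and specificity ordering. Both function names are illustrative.

```python
def parse_accept(header: str):
    """Return (media_type, q) pairs sorted by descending q (ties keep header order)."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        if not pieces[0]:
            continue
        q = 1.0                                  # default weight when q is absent
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        prefs.append((pieces[0], q))
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def negotiate(header: str, supported: list):
    """Pick the first supported type matching the client's highest preference."""
    for media_type, q in parse_accept(header):
        if q == 0:                               # q=0 means "not acceptable"
            continue
        for candidate in supported:
            if media_type in (candidate, "*/*") or (
                media_type.endswith("/*") and candidate.startswith(media_type[:-1])
            ):
                return candidate
    return None                                  # caller would answer 406


choice = negotiate("application/json;q=0.9, application/xml;q=0.5",
                   ["application/xml", "application/json"])
```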

TLS / HTTPS

MODULE 03 — DISPATCH

Routing

URL → handler. Method-aware. Versioned. Grouped. Fast.

Route components

GET /api/v1/users/:userId/posts?status=published&limit=20
     │   │   │     │           │
     │   │   │     │           └─ query params (filters, paging)
     │   │   │     └─ path param (resource id)
     │   │   └─ resource (collection)
     │   └─ version
     └─ namespace

Route types

Type | Example | Notes
Static | /health | O(1) hash lookup possible.
Dynamic | /users/:id | Param capture. Most frameworks use radix/trie.
Nested / hierarchical | /orgs/:org/teams/:team/members | Authorization often cascades.
Catchall / wildcard | /files/*path | Greedy; matched at lowest priority.
Regex | /users/{id:\d+} | Type-narrowed. Powerful, slower.
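
A toy router covering the static, dynamic, and catch-all rows. It is a sketch under simplifying assumptions: dynamic routes are scanned linearly rather than stored in a radix tree, regex routes are omitted, and the Router class is invented for this example.

```python
class Router:
    def __init__(self):
        self.static = {}    # (method, path) -> handler, O(1) lookup
        self.dynamic = []   # (method, segments, handler), scanned in order

    def add(self, method, pattern, handler):
        segments = pattern.strip("/").split("/")
        if any(s.startswith((":", "*")) for s in segments):
            self.dynamic.append((method, segments, handler))
            # stable sort keeps catch-all routes at lowest priority
            self.dynamic.sort(key=lambda route: any(s.startswith("*") for s in route[1]))
        else:
            self.static[(method, pattern)] = handler

    def match(self, method, path):
        if (method, path) in self.static:
            return self.static[(method, path)], {}
        parts = path.strip("/").split("/")
        for route_method, segments, handler in self.dynamic:
            if route_method != method:
                continue
            params = self._capture(segments, parts)
            if params is not None:
                return handler, params
        return None, {}

    @staticmethod
    def _capture(segments, parts):
        params = {}
        for i, seg in enumerate(segments):
            if seg.startswith("*"):              # catch-all swallows the rest
                params[seg[1:]] = "/".join(parts[i:])
                return params
            if i >= len(parts):
                return None
            if seg.startswith(":"):
                params[seg[1:]] = parts[i]       # capture path parameter
            elif seg != parts[i]:
                return None
        return params if len(parts) == len(segments) else None


router = Router()
router.add("GET", "/health", "health_handler")
router.add("GET", "/files/*path", "files_handler")
router.add("GET", "/users/:id", "user_handler")
```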

API versioning strategies

Strategy | Example | Pros / Cons
URI | /v1/users | + Visible, cache-friendly. − Many URLs.
Header | API-Version: 2 | + Clean URL. − Hidden in tooling.
Query | ?v=2 | + Trivial. − Breaks caching on shared keys.
Media type | Accept: application/vnd.x.v2+json | + RESTful. − Hardest to test.

Deprecation pattern

HTTP/1.1 200 OK
Deprecation: true
Sunset: Fri, 01 Jan 2027 00:00:00 GMT
Link: <https://api.x.com/v2/users>; rel="successor-version"
Warning: 299 - "v1 deprecated; migrate to v2 by 2027-01-01"

Route grouping

# pseudo-framework
group("/api/v1", middleware=[logger, requestId]) {
  group("/auth", middleware=[rateLimit(5, "1m")]) {
    POST("/login",   loginHandler)
    POST("/refresh", refreshHandler)
  }
  group("/admin", middleware=[requireAuth, requireRole("admin")]) {
    GET("/users",         listUsers)
    DELETE("/users/:id",  deleteUser)
  }
}

MODULE 04 — DATA ON THE WIRE

Serialization & Deserialization

Native ↔ wire bytes. Pick format by audience + perf budget.

Text vs binary

Property | JSON | XML | Protobuf | MessagePack | Avro
Readable | ✓ | ✓ | ✗ | ✗ | ✗
Schema | optional | XSD | required | none | required
Size | baseline | 1.5–2× | 0.2–0.5× | 0.5× | 0.3×
Parse speed | baseline | slow | 10–20× faster | 10× |
Use | web APIs | legacy/SOAP | gRPC, internal | cache, RPC | Kafka, big data

JSON deep-dive

{
  "string": "hello",
  "int":    42,
  "float":  3.14,
  "bool":   true,
  "null":   null,
  "array":  [1, 2, 3],
  "nested": { "k": "v" },
  "date":   "2026-05-11T09:14:22Z"   // ISO-8601 with offset
}

Native mapping

JSON | Python | Go | JS/TS
object | dict | struct / map[string]any | object
array | list | []T | Array
number | int/float | float64 (or typed) | number
null | None | nil / zero / pointer | null

Edge cases

Schema validation (JSON Schema)

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["email", "age"],
  "additionalProperties": false,
  "properties": {
    "email": { "type": "string", "format": "email", "maxLength": 254 },
    "age":   { "type": "integer", "minimum": 0, "maximum": 150 },
    "tags":  { "type": "array", "items": { "type": "string" }, "maxItems": 20 }
  }
}

MODULE 05 — TRUST

Authentication, Authorization, Security

Who you are, what you can do, how attackers try to get past it.

Authentication mechanisms

Mechanism | State | How | Use
Basic auth | stateless | Authorization: Basic base64(user:pass) | Internal/dev. HTTPS mandatory.
API key | stateless | Long random string per client | Server-to-server, partner APIs.
Session cookie | stateful (server) | Random ID → server lookup | Classic web apps.
JWT (bearer) | stateless | Signed claims in token | SPAs, mobile, microservices.
OAuth 2.0 | delegated | Authorization code → access token | Third-party app access.
OIDC | OAuth + identity | OAuth + id_token JWT | SSO / "Login with Google".
MFA | +factor | TOTP, WebAuthn, SMS (weak) | High-value accounts.

JWT anatomy

header.payload.signature

# header
{ "alg": "RS256", "typ": "JWT", "kid": "key-2026-q2" }

# payload (claims)
{
  "sub": "user_42",
  "iss": "https://auth.x.com",
  "aud": "api.x.com",
  "exp": 1715420000,
  "iat": 1715416400,
  "scope": "read:users write:posts"
}

# signature = sign(base64url(header) + "." + base64url(payload), private_key)
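
To make the three-part structure concrete, here is a standard-library sketch that signs and verifies a token. Note the swap: it uses HS256 (a shared HMAC secret) rather than the RS256 shown in the header above, because asymmetric signing needs a third-party crypto library. The function names are illustrative; production code would typically use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    # JWT uses URL-safe base64 with the "=" padding stripped
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


def verify_jwt(token: str, secret: bytes, audience: str, now=None) -> dict:
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":     # pin the algorithm; never trust the token's choice
        raise ValueError("unexpected alg")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    if claims.get("aud") != audience:
        raise ValueError("wrong audience")
    return claims


demo_secret = b"demo-secret-not-for-production"
demo_claims = {"sub": "user_42", "aud": "api.x.com", "iat": 1000, "exp": 2000}
token = sign_jwt(demo_claims, demo_secret)
```

Signature, expiry, and audience are all checked before any claim is trusted; skipping any one of them is a common source of JWT bugs.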

JWT pitfalls

Password storage

# NEVER: plaintext, MD5, SHA-1, SHA-256 plain
# CORRECT: slow KDF with per-user salt
hash = argon2id(password, salt, m=64MB, t=3, p=1)
# alternatives: bcrypt (cost ≥ 12), scrypt, PBKDF2 (≥ 600k iter)
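
A runnable sketch of the salted slow-KDF pattern. It uses scrypt, one of the alternatives listed above, because Python's standard library ships it while argon2id needs a third-party package. The cost parameters here are illustrative assumptions, not a recommendation; tune them to your hardware.

```python
import hashlib
import hmac
import os

# illustrative cost settings (about 16 MiB of memory per hash)
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}


def hash_password(password: str):
    salt = os.urandom(16)                       # unique random salt per user
    digest = hashlib.scrypt(password.encode(), salt=salt, dklen=32, **SCRYPT_PARAMS)
    return salt, digest                         # store both alongside the user row


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.scrypt(password.encode(), salt=salt, dklen=32, **SCRYPT_PARAMS)
    return hmac.compare_digest(digest, expected)   # constant-time comparison


salt, stored = hash_password("hunter2")
```

Because the salt is random, hashing the same password twice yields different digests, which is what defeats precomputed lookup tables.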

Authorization models

Model | Decision input | Example
RBAC (role-based) | (user, role) → perms | admin, editor, viewer.
ABAC (attribute-based) | (user.attrs, resource.attrs, env) → allow? | "engineer in same dept can read".
ReBAC (relationship-based) | graph: user → owns → doc | Google Docs sharing. Zanzibar.

OWASP-style attacks & defenses

Attack | Mechanism | Defense
SQL injection | Untrusted input concatenated into SQL | Parameterized queries / prepared statements. Never string-interpolate.
NoSQL injection | Object-shaped input replaces operators | Schema validate. Reject objects where strings expected.
XSS | Untrusted HTML rendered | Context-aware escaping. CSP header. HttpOnly cookies.
CSRF | Browser auto-sends cookies | SameSite=Lax/Strict, CSRF tokens, double-submit, Origin check.
MITM | Network attacker reads traffic | TLS everywhere, HSTS, cert pinning for mobile.
Insecure deserialization | Native binary parsers on untrusted input | JSON only for untrusted; signed payloads for internal.
SSRF | Server fetches attacker URL | Allow-list URLs. Block link-local + metadata IPs (169.254.169.254).
IDOR | Predictable IDs without authz check | Check ownership server-side every request. UUIDs help defense-in-depth.

Secure design principles

Attack prevention practices

MODULE 06 — INPUT HYGIENE

Validation, Transformation, Normalization

Fail fast on bad input. Normalize before processing. Sanitize before storing.

Three validation types

Type | What it checks | Examples
Type | Right shape | String not array; integer not string.
Syntactic | Right format | Email regex, UUID, ISO date, phone.
Semantic | Right meaning | Age 0–150; endDate > startDate; SKU exists in catalog.

Client vs server

Transform & normalize

import re
from slugify import slugify                # assumed: the third-party python-slugify package

email   = email.strip().lower()
phone   = re.sub(r'\D', '', phone)        # digits only
country = country.upper()                  # "us" → "US"
name    = ' '.join(name.split())           # collapse spaces
slug    = slugify(title)                   # "Hello World!" → "hello-world"

Sanitization (escape, don't trust)

# HTML
clean_html = bleach.clean(user_html, tags=['p','b','i'], strip=True)
# SQL — never string-format
cur.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# Shell — avoid; if must, use shlex.quote

Complex rules

Error aggregation

HTTP/1.1 422 Unprocessable Entity
Content-Type: application/problem+json

{
  "type": "https://x.com/errors/validation",
  "title": "Validation failed",
  "status": 422,
  "errors": [
    { "field": "email", "code": "invalid_format", "message": "not a valid email" },
    { "field": "age",   "code": "out_of_range",   "message": "must be 0–150" }
  ]
}
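
One way to produce that aggregated response: run every check, collect failures, and return them together instead of stopping at the first one. A minimal sketch; the field rules mirror the examples in this module, and validate_signup and problem_response are hypothetical names.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")   # deliberately loose syntactic check


def validate_signup(payload: dict) -> list:
    """Return a list of error dicts; an empty list means the payload is valid."""
    errors = []
    email, age = payload.get("email"), payload.get("age")

    if not isinstance(email, str):                       # type check
        errors.append({"field": "email", "code": "invalid_type",
                       "message": "must be a string"})
    elif not EMAIL_RE.match(email.strip()):              # syntactic check
        errors.append({"field": "email", "code": "invalid_format",
                       "message": "not a valid email"})

    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(age, bool) or not isinstance(age, int):
        errors.append({"field": "age", "code": "invalid_type",
                       "message": "must be an integer"})
    elif not 0 <= age <= 150:                            # semantic check
        errors.append({"field": "age", "code": "out_of_range",
                       "message": "must be 0-150"})
    return errors


def problem_response(errors: list):
    body = {"type": "https://x.com/errors/validation", "title": "Validation failed",
            "status": 422, "errors": errors}
    return 422, body


status, problem = problem_response(validate_signup({"email": "nope", "age": 200}))
```
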
MODULE 07 — PIPELINE

Middleware

Cross-cutting logic in chain. Order matters more than content.

What middleware does

Canonical ordering

request
  ──► recovery (panic/exception catcher)
  ──► requestId / traceId
  ──► access log start
  ──► CORS
  ──► security headers (HSTS, X-Content-Type-Options, CSP)
  ──► body parser (json, urlencoded, multipart)
  ──► compression negotiation
  ──► rate limiter
  ──► authentication
  ──► authorization
  ──► validation
  ──► route
  ──► handler
response
  ◄── log finish (status, duration)
  ◄── error handler (if thrown)
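
Why order matters is easiest to see in a chain where each middleware wraps the next. A minimal sketch using plain dicts for request and response; the middleware names echo the ordering above, but compose and the rest are invented for illustration.

```python
def compose(handler, *middlewares):
    """Wrap handler so the first middleware listed runs outermost."""
    for middleware in reversed(middlewares):
        handler = middleware(handler)
    return handler


def recovery(next_handler):
    def wrapped(request):
        try:
            return next_handler(request)
        except Exception:
            # outermost layer: any crash below becomes a clean 500
            return {"status": 500, "body": "internal error"}
    return wrapped


def request_id(next_handler):
    def wrapped(request):
        request.setdefault("request_id", "req-1")   # placeholder id for the sketch
        response = next_handler(request)
        response["x-request-id"] = request["request_id"]
        return response
    return wrapped


def authentication(next_handler):
    def wrapped(request):
        if request.get("token") != "secret-token":
            return {"status": 401, "body": "unauthenticated"}   # short-circuit
        return next_handler(request)
    return wrapped


def handler(request):
    if request.get("path") == "/boom":
        raise RuntimeError("bug in handler")
    return {"status": 200, "body": "ok"}


app = compose(handler, recovery, request_id, authentication)
```

Because request_id sits outside authentication, even a rejected request gets a traceable id; if the order were flipped, 401 responses would carry none.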

Common middlewares

Type | Examples
Security | helmet (sets headers), CSRF, CORS
Parsing | JSON, urlencoded, multipart (file upload)
Auth | JWT verify, session lookup, API-key check
Rate limit | token bucket per IP/user/route
Logging | access log, request-id propagation
Compression | gzip/br based on Accept-Encoding
Error | centralized handler: maps exceptions to status codes
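
The token bucket named in the rate-limit row can be sketched in a few lines. This version is single-process and in-memory; the class name and the injectable clock are assumptions made so the example is testable, and a real deployment would usually keep one bucket per IP/user/route in a shared store.

```python
import time


class TokenBucket:
    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # sustained rate
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # refill lazily on each call, never above capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                          # caller would answer 429


fake_now = [0.0]                              # controllable clock for the demo
bucket = TokenBucket(capacity=2, refill_per_sec=1.0, clock=lambda: fake_now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_now[0] = 1.0                             # one second later: one token refilled
after_wait = [bucket.allow(), bucket.allow()]
```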

Keep middleware lightweight

MODULE 08 — STATE

Request Context

Per-request scratch space that flows with the call — without leaking across requests.

What lives in context

Patterns

Language | Pattern
Go | context.Context as first arg. context.WithValue, ctx.Done(), ctx.Deadline().
Node | AsyncLocalStorage (avoids passing through every layer).
Python | contextvars.ContextVar (async-safe).
Java | ThreadLocal (blocking) / Context with reactive frameworks.
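
The Python row in action: two concurrent "requests" each set the same ContextVar, and neither sees the other's value. A minimal sketch; the handler and repository function names are hypothetical.

```python
import asyncio
import contextvars

request_id_var = contextvars.ContextVar("request_id", default=None)


async def repository_call():
    await asyncio.sleep(0)            # yield so the two requests interleave
    return request_id_var.get()       # deep layer reads context without extra parameters


async def handle_request(request_id):
    request_id_var.set(request_id)    # visible only inside this task's context
    return await repository_call()


async def main():
    # gather wraps each coroutine in a task, and each task gets its own context copy
    return await asyncio.gather(handle_request("req-a"), handle_request("req-b"))


results = asyncio.run(main())
```

The isolation comes from asyncio copying the current context when it creates a task, so a set() inside one request never leaks into another or back to the caller.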

Request ID propagation

// inbound middleware
const reqId = req.headers['x-request-id'] ?? randomUUID()
res.setHeader('x-request-id', reqId)
ctx.set('requestId', reqId)
const log = logger.child({ reqId })   // child() returns a new logger; keep and use it

// outbound HTTP calls
fetch(url, { headers: { 'x-request-id': ctx.get('requestId') }})

Timeouts & cancellation

// Go
ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
defer cancel()
row := db.QueryRowContext(ctx, "SELECT ...")    // aborts on timeout

// Node
const ctrl = new AbortController()
const timer = setTimeout(() => ctrl.abort(), 2000)
try {
  await fetch(url, { signal: ctrl.signal })
} finally {
  clearTimeout(timer)   // don't leave the timer armed once the call settles
}

MODULE 09 — STRUCTURE

MVC, Controllers, REST APIs

Separation of concerns inside request path.

Layered responsibility

Layer | Owns | Doesn't touch
Handler / Controller | Parse req, validate, call service, shape response | SQL, business rules
Service (business logic) | Use cases: placeOrder, cancelSubscription | HTTP, DB driver specifics
Repository / DAO | Persistence, queries, ORM calls | Business rules, HTTP
Model | Entity definition, invariants | I/O

CRUD ↔ HTTP mapping

POST   /users           → create
GET    /users           → list (paginated)
GET    /users/:id       → read
PUT    /users/:id       → full replace
PATCH  /users/:id       → partial update
DELETE /users/:id       → remove
POST   /users/:id/reset → action (non-CRUD verb)

Standard list response

{
  "data": [ { "id": 1, "name": "alice" }, ... ],
  "meta": {
    "page":  2,
    "limit": 20,
    "total": 137,
    "hasMore": true
  },
  "links": {
    "self": "/users?page=2&limit=20",
    "next": "/users?page=3&limit=20",
    "prev": "/users?page=1&limit=20"
  }
}

Pagination styles

Style | Pros | Cons
Offset (?page=N) | Simple, jump to page | Slow on big tables; rows shift or repeat when writes land between pages
Cursor (?after=cursor) | Stable, fast, infinite-scroll friendly | No "jump to page N"
Keyset (WHERE id > ?) | Same as cursor; index-friendly | Requires a unique, ordered sort key
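
Keyset pagination in miniature, using an in-memory SQLite table so the sketch runs anywhere. The table, the list_users helper, and the "fetch one extra row" trick for detecting more pages are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 8)])


def list_users(conn, after_id=0, limit=3):
    """Return (rows, next_cursor); next_cursor is None on the last page."""
    rows = conn.execute(
        "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit + 1),        # one extra row tells us whether more exist
    ).fetchall()
    page, has_more = rows[:limit], len(rows) > limit
    next_cursor = page[-1][0] if has_more else None
    return page, next_cursor


page1, cursor1 = list_users(conn)
page2, cursor2 = list_users(conn, after_id=cursor1)
page3, cursor3 = list_users(conn, after_id=cursor2)
```

The WHERE clause seeks straight into the primary-key index, so page 1000 costs the same as page 1, unlike OFFSET, which must walk past every skipped row.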

Search / sort / filter

GET /products?q=phone&category=electronics&minPrice=100&sort=-price,name&page=2

# parsed:
{
  q:          "phone",
  filters:    { category: "electronics", price: { gte: 100 } },
  sort:       [{ field: "price", dir: "desc" }, { field: "name", dir: "asc" }],
  page:       2
}

REST principles

MODULE 10 — PERSISTENCE

Databases

Storage shape, consistency, indexing, query plans, ORMs.

Relational vs non-relational

Property | Relational (Postgres, MySQL) | Document (Mongo) | Key-value (Redis, DynamoDB) | Wide-column (Cassandra)
Schema | fixed | flexible | none | row-flexible
Joins | strong | weak (lookup/aggregate) | none | none
Txn | full ACID | per-doc, multi-doc limited | per-key | per-row
Scale | vertical + read replicas | shard by key | horizontal | horizontal
Use | transactional, complex queries | nested objects, agile schema | cache, hot keys | time-series, massive write

ACID

CAP theorem

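During a network Partition, a distributed store must choose: Consistency (refuse or time out requests it cannot answer correctly) or Availability (keep answering, possibly with stale data). The trade-off only applies while a partition is in effect; partitions are rare, and the rest of the time a system can offer both.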

Indexing — rules

Query optimization workflow

EXPLAIN ANALYZE
SELECT u.id, u.name, COUNT(o.id)
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.country = 'US' AND o.created_at > now() - interval '30 days'
GROUP BY u.id, u.name;   -- group by the key so users sharing a name are not merged

-- look for:
--   Seq Scan on big table         → missing index
--   high "rows removed by filter" → predicate not pushed to index
--   Sort spilled to disk          → work_mem too low
--   Nested Loop on big rowcounts  → expected Hash/Merge join

Connection pooling

Constraints & transactions

BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = $1;
  UPDATE accounts SET balance = balance + 100 WHERE id = $2;
  INSERT INTO transfers (from_id, to_id, amount) VALUES ($1, $2, 100);
COMMIT;

-- table constraints catch invariants:
CHECK (balance >= 0)
UNIQUE (email)
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
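
The all-or-nothing behavior can be demonstrated with SQLite from the standard library. This is a sketch with an assumed two-row accounts table; the credit runs first on purpose, so the failing case shows an already-applied statement being rolled back when the CHECK constraint fires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 150), (2, 0)])
conn.commit()


def transfer(conn, from_id, to_id, amount) -> bool:
    try:
        with conn:   # commits on success, rolls back if the block raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, to_id))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, from_id))
        return True
    except sqlite3.IntegrityError:   # CHECK (balance >= 0) violated
        return False


def balances(conn) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


first_ok = transfer(conn, 1, 2, 100)      # 150 → 50, succeeds
after_first = balances(conn)
second_ok = transfer(conn, 1, 2, 100)     # would go to -50, rejected
after_second = balances(conn)
```

After the rejected transfer the credit to account 2 is gone as well: the constraint protected the invariant and the transaction protected atomicity.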

ORMs & migrations

MODULE 11 — DOMAIN

Business Logic Layer

Where rules live. Independent of HTTP and DB drivers.

Three-layer architecture

┌──────────────────────────────────┐
│ Presentation                     │  routes, controllers, DTOs,
│ (HTTP / gRPC / CLI)              │  validation, serialization
└────────────┬─────────────────────┘
             │ calls
┌────────────▼─────────────────────┐
│ Business Logic                   │  use-case services, domain models,
│ (pure, framework-agnostic)       │  rules, invariants, orchestration
└────────────┬─────────────────────┘
             │ uses ports
┌────────────▼─────────────────────┐
│ Data Access                      │  repositories, ORM, SQL, cache
│ (Postgres / Redis / S3 / 3rd p.) │  adapters, external clients
└──────────────────────────────────┘

Why split

SOLID applied

Principle | How it shows up
Single responsibility | One service = one use case. RegisterUserService.handle().
Open/closed | New auth provider = new adapter implementing AuthPort; existing code unchanged.
Liskov | Any UserRepository impl must honor contract (same shapes/errors).
Interface segregation | ReadOnlyUserRepo vs full repo for handlers that only read.
Dependency inversion | Service depends on EmailSenderPort (interface), not SendgridClient (concrete).

Error propagation pattern

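# BLL raises domain errors
class DomainError(Exception): ...
class NotFound(DomainError): ...
class Forbidden(DomainError): ...
class Conflict(DomainError): ...
class Validation(DomainError): ...

# presentation layer maps to HTTP
STATUS_BY_ERROR = {
    NotFound:   404,
    Forbidden:  403,
    Conflict:   409,
    Validation: 422,
}

def status_for(e: Exception) -> int:
    # walk the MRO so subclasses inherit their parent's status;
    # anything unmapped (including a bare DomainError) becomes 500
    for cls in type(e).__mro__:
        if cls in STATUS_BY_ERROR:
            return STATUS_BY_ERROR[cls]
    return 500
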
MODULE 12 — SPEED

Caching

Trade staleness for latency. Choose layer, strategy, eviction with intent.

Caching layers

Layer | Latency | Scope | Example
CPU L1/L2/L3 | ns | Per-core |
App in-memory | µs | Per-process | LRU map, Caffeine
Distributed | 0.5–2 ms | Cluster | Redis, Memcached
CDN edge | 1–30 ms | Region | Cloudflare, CloudFront
Browser | 0 ms | Per-client | HTTP cache

Strategies

Strategy | Read | Write | Use
Cache-aside (lazy) | app checks cache → on miss reads DB → fills cache | app writes DB → invalidates cache | Default. Most common.
Read-through | cache lib reads DB on miss | same | Cleaner code, library-dependent.
Write-through | | app writes cache → cache writes DB synchronously | Consistent cache, slower writes.
Write-behind | | app writes cache; DB written async | Fast writes; risk on crash.
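
Cache-aside, the default row above, as a self-contained sketch. The in-process TTLCache class and the dict standing in for the database are assumptions so the example runs without Redis; the read and write paths follow the table.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl, self.clock, self.store = ttl_seconds, clock, {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None or entry[1] <= self.clock():   # missing or expired
            self.store.pop(key, None)
            return None
        return entry[0]

    def set(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)

    def delete(self, key):
        self.store.pop(key, None)


db = {"user:42": {"id": 42, "name": "alice"}}   # stand-in for the real database
db_reads = []                                   # records every trip to the "database"


def get_user(cache, user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                 # hit: no database work
    db_reads.append(key)              # miss: read the source of truth
    value = db.get(key)
    if value is not None:
        cache.set(key, value)         # fill the cache for the next reader
    return value


def rename_user(cache, user_id, name):
    key = f"user:{user_id}"
    db[key] = {**db[key], "name": name}   # write the database first
    cache.delete(key)                     # then invalidate, never update in place


cache = TTLCache(ttl_seconds=60)
```

Invalidating rather than updating the cached value on write means the next read repopulates it from the database, which keeps the write path simple.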

Eviction policies

Invalidation patterns

# by key
cache.delete(f"user:{id}")

# by tag (Redis-stack, Varnish)
cache.delete_by_tag(f"user:{id}")

# fan-out (pub/sub)
pubsub.publish("cache:invalidate", {"keys": [f"user:{id}"]})

# versioned key — never need to delete
cache.set(f"user:{id}:v{version}", data)
# bump version on write → old key TTLs out naturally

Use-case recipes

MODULE 13 — ASYNC WORK

Queues, Background Jobs, Emails

Don't make user wait. Hand off to workers.

What belongs off request path

Architecture

┌─────────┐    enqueue     ┌────────┐    pull     ┌────────┐
│Producer │ ─────────────► │ Broker │ ───────────►│Consumer│
│  (API)  │                │ (Redis,│             │(worker)│
└─────────┘                │  SQS,  │◄─── ack ────└────┬───┘
                           │ Kafka) │                  │
                           └────────┘                  ▼
                                          side effects: DB, email, S3, API

Broker comparison

Property | Redis (BullMQ, Sidekiq) | RabbitMQ | SQS | Kafka
Model | list/stream | AMQP exchanges | distributed queue | partitioned log
Order | FIFO per queue | FIFO per queue | FIFO queue type | FIFO per partition
Durability | RDB/AOF | persistent queues | multi-AZ | replicated log
Replay | | | | full
Best for | web apps | complex routing | AWS-native, simple | event-sourcing, analytics

Job semantics

Chaining & concurrency

// BullMQ-style flow
import { FlowProducer } from 'bullmq'

const flow = new FlowProducer()
await flow.add({
  name: 'order-complete',
  queueName: 'orders',
  children: [
    { name: 'charge-card',     queueName: 'payments' },
    { name: 'send-receipt',    queueName: 'email'    },
    { name: 'update-warehouse',queueName: 'inventory'},
  ],
})
// parent runs only after all children succeed

Transactional email anatomy

Subject:    Your order #4582 is confirmed
Preheader:  Track shipping below • Need help? Reply to this email.
Body:
  Hi Alice, thanks for your order...
  [ Track shipment ]    ← single CTA
Footer:     Unsubscribe • Address (CAN-SPAM)

Scheduling

MODULE 14 — SEARCH

Elasticsearch

Inverted index for full-text + analytics at scale.

Internals

Use cases

Query patterns

POST /products/_search
{
  "query": {
    "bool": {
      "must":   [{ "match": { "title": "wireless headphones" } }],
      "filter": [
        { "term":  { "category": "audio" } },
        { "range": { "price": { "lte": 200 } } }
      ],
      "should": [
        { "match": { "brand": "sony" } }   // boost
      ]
    }
  },
  "aggs": {
    "by_brand": { "terms": { "field": "brand.keyword" } },
    "price_p":  { "percentiles": { "field": "price" } }
  },
  "size": 20,
  "from": 0
}

Field mapping rules

Need | Mapping
Full-text search | "type": "text" with analyzer
Exact match / sort / aggregate | "type": "keyword"
Range queries | integer, date, double
Both above (common) | text with fields.keyword multi-field
Geo | geo_point for lat/lon

Tuning

MODULE 15 — FAILURES

Error Handling

Errors are first-class output. Plan them.

Error categories

Type | When | Strategy
Syntax | Compile / parse time | Lint + CI catch.
Runtime (transient) | Network blip, DB locked | Retry with backoff + circuit breaker.
Runtime (permanent) | Bad input, missing record | Fail fast, return 4xx.
Logical / business | Insufficient funds, conflict | Domain error → 4xx with code.
System | Out of memory, disk full | Crash + restart + alert.
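
The transient-versus-permanent split maps directly onto a retry helper: retry only errors marked transient, with exponential backoff and jitter, and let everything else fail fast. A minimal sketch; TransientError and the retry signature are invented here, and the circuit breaker is left out.

```python
import random
import time


class TransientError(Exception):
    """Marks failures worth retrying (network blip, lock timeout)."""


def retry(operation, attempts=4, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise                                   # out of attempts: surface the error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))               # full jitter avoids retry stampedes


calls = []
delays = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("connection reset")
    return "ok"


result = retry(flaky, sleep=delays.append)              # record delays instead of sleeping
```

Only retry operations that are idempotent; retrying a non-idempotent write can apply it twice.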

Strategies

Custom error types

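class AppError(Exception):
    status: int = 500
    code:   str = "INTERNAL"

    def __init__(self, message: str, cause: Exception | None = None):
        super().__init__(message)
        self.message = message   # safe to show to the client
        self.cause = cause       # original exception, for logs only

class NotFound(AppError):
    status = 404
    code   = "NOT_FOUND"

class RateLimited(AppError):
    status = 429
    code   = "RATE_LIMITED"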

Global handler

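@app.errorhandler(Exception)
def handle(e):
    # note: this also receives framework HTTP errors (e.g. 404) unless they are handled separately
    request_id = g.get("request_id")
    if isinstance(e, AppError):
        log.warning("app_error", extra={"code": e.code, "rid": request_id})
        return jsonify(error=e.code, message=e.message), e.status
    # unexpected: log the stack trace, never leak details to the client
    log.error("unhandled_error", extra={"rid": request_id}, exc_info=True)
    return jsonify(error="INTERNAL", message="something went wrong"), 500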

User-facing error response

{
  "error":     "INSUFFICIENT_FUNDS",
  "message":   "Balance $20.00 below $50.00 needed.",
  "requestId": "7f3a-1c2b",
  "docsUrl":   "https://x.com/docs/errors#insufficient_funds"
}

Monitoring & alerting

MODULE 16 — CONFIG

Config Management

Separate config from code. Environment-aware. Secrets isolated.

Config types

Type | Examples | Where
Static | retry counts, page size, timeouts | YAML / JSON in repo
Environment-specific | DB URL, Redis host, log level | env vars / per-env file
Sensitive | API keys, signing secrets, DB creds | secret manager (Vault, AWS SM, GCP SM, sops)
Dynamic | feature flags, kill switches | LaunchDarkly, Unleash, Flagsmith, ConfigCat

Precedence (12-factor)

defaults (in code)
  ↓ overridden by
config file (config.yaml)
  ↓ overridden by
env vars (DATABASE_URL=...)
  ↓ overridden by
command-line flags (--port 8080)
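
That precedence chain is a layered lookup, which collections.ChainMap expresses directly: the first mapping that has a key wins. A sketch with made-up keys; load_config and KNOWN_KEYS are assumptions for illustration.

```python
from collections import ChainMap

KNOWN_KEYS = {"port", "database_url", "log_level"}


def load_config(defaults, file_values, env, cli_flags):
    """Highest precedence first: flags > env vars > config file > defaults."""
    # map DATABASE_URL → database_url and ignore unrelated environment variables
    env_values = {k.lower(): v for k, v in env.items() if k.lower() in KNOWN_KEYS}
    return ChainMap(cli_flags, env_values, file_values, defaults)


config = load_config(
    defaults={"port": 8000, "log_level": "info"},
    file_values={"log_level": "debug", "database_url": "postgres://from-file/app"},
    env={"DATABASE_URL": "postgres://from-env/app", "HOME": "/root"},
    cli_flags={"port": 8080},
)
```

Here port comes from the flag, database_url from the environment, and log_level from the file, each layer overriding only the keys it actually sets.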

.env workflow

# .env (NEVER commit)
DATABASE_URL=postgres://app:secret@localhost/app
JWT_SECRET=hunter2

# .env.example (commit this)
DATABASE_URL=
JWT_SECRET=

# loading
load_dotenv()
db_url = os.environ["DATABASE_URL"]    # crash fast on missing
log_level = os.environ.get("LOG_LEVEL", "info")

Feature flags

if flags.enabled("new-checkout", user_id=user.id):
    return new_checkout_flow(order)
return legacy_checkout_flow(order)

# rollout patterns:
#   percentage:    10% of users
#   targeting:     users in cohort "beta"
#   kill switch:   instantly disable broken feature without redeploy
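
A percentage rollout needs each user to land in the same bucket on every request. One common approach, sketched here with a hypothetical in_rollout helper, is to hash the flag name and user id into a stable bucket from 0 to 99.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """True when this user falls inside the first `percent` buckets for this flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100   # stable bucket 0..99
    return bucket < percent


users = [f"user_{i}" for i in range(1000)]
at_10 = {u for u in users if in_rollout("new-checkout", u, 10)}
at_50 = {u for u in users if in_rollout("new-checkout", u, 50)}
```

Including the flag name in the hash gives each flag an independent split, and because buckets are fixed, raising the percentage only adds users; nobody who already has the feature loses it.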

Secret rotation

MODULE 17 — OBSERVABILITY

Logging, Monitoring, Tracing

Three pillars: logs (events), metrics (aggregates), traces (causal chains).

Logs

Levels

Level | Use | Alert?
DEBUG | Dev troubleshooting | No
INFO | Lifecycle: startup, shutdown, important business events | No
WARN | Recoverable, degraded | Trend
ERROR | Failed request, exception | Yes if rate spikes
FATAL | Process can't continue | Page

Structured logging

# DON'T
log.info(f"user {uid} placed order {oid} for ${amt}")

# DO  — JSON keys are queryable
log.info("order_placed", extra={
  "user_id":   uid,
  "order_id":  oid,
  "amount":    amt,
  "request_id": rid,
  "trace_id":  tid,
})

What NOT to log

Rotation & retention

Metrics

Type | Use | Example
Counter | Monotonic count | http_requests_total{route="/users",code="200"}
Gauge | Point-in-time value | db_pool_in_use, queue_depth
Histogram | Distribution | http_duration_seconds_bucket
Summary | Pre-computed quantiles | p50 / p95 / p99 latency

RED / USE / Four Golden Signals

Tracing

trace_id: 4f3a... (one per request, propagated across services)
  ├─ span A: api-gateway      (1.2 ms)
  ├─ span B: auth-service     (3.0 ms)
  └─ span C: user-service     (15.4 ms)
       ├─ span D: postgres    (8.1 ms)
       └─ span E: redis       (0.3 ms)

MODULE 18 — SHUTDOWN

Graceful Shutdown

Stop without dropping in-flight work.

Signals

Signal | Number | Catchable | Sender
SIGTERM | 15 | ✓ | k8s/systemd normal stop
SIGINT | 2 | ✓ | Ctrl-C
SIGHUP | 1 | ✓ | Reload config (convention)
SIGKILL | 9 | ✗ | Force kill; no chance to clean up

Shutdown sequence

  1. Mark unhealthy — readiness probe returns 503. LB stops sending new traffic.
  2. Drain — keep serving in-flight requests; reject new ones (or 503).
  3. Wait grace period — typically 10–30s.
  4. Close external resources — DB pool drain, flush log buffers, close file handles, ack pending queue messages.
  5. Exit 0.

Pattern (Go)

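srv := &http.Server{Addr: ":8080", Handler: mux}
go func() {
	// ErrServerClosed is the expected result once Shutdown is called
	if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
		os.Exit(1) // failed to bind or serve
	}
}()

sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGINT)
<-sigCh // block until asked to stop

ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil { // stop accepting, finish in-flight
	// grace period expired with requests still running; continue cleanup anyway
}
db.Close()
log.Sync() // flush buffered logs (assumes a zap-style logger)
os.Exit(0) // note: os.Exit skips deferred calls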

Kubernetes specifics

MODULE 19 — PERFORMANCE

Scaling, Performance, Concurrency

Find bottleneck → fix smallest → measure → repeat.

Find bottleneck

DB optimization

App-level

Concurrency vs parallelism

Aspect | Concurrency | Parallelism
What | Multiple tasks in progress at once, interleaved (works even on one core) | Multiple tasks executing at the same instant on multiple cores
Wins on | I/O-bound (DB, HTTP, file) | CPU-bound (encoding, math)
Primitives | async/await, goroutines, threads | process pools, worker threads
Python | asyncio, aiohttp | multiprocessing (GIL blocks CPU threads)
Node | event loop (default) | worker_threads, cluster
Go | goroutines | GOMAXPROCS = #CPUs

Scaling axes

MODULE 20 — INTEGRATIONS

Advanced Integrations

Big files, real-time, push patterns.

Object storage (S3 et al.)

# pre-signed PUT
url = s3.generate_presigned_url(
  "put_object",
  Params={"Bucket": "uploads", "Key": f"users/{uid}/{uuid}.jpg",
          "ContentType": "image/jpeg"},
  ExpiresIn=900,
)
# client then: PUT url -H "Content-Type: image/jpeg" --data-binary @file

Real-time

Property | WebSockets | Server-Sent Events (SSE) | Long polling
Direction | Bidirectional | Server → client | Client polls; server holds
Transport | WS upgrade over TCP | HTTP keep-alive + text/event-stream | HTTP
Reconnect | App-level | Built-in (Last-Event-ID) | Per-poll
Use | Chat, games, collab | Notifications, dashboards | Legacy/fallback

Pub/Sub architecture

Webhooks (server-initiated)

Property | Polling | Webhook
Initiator | Consumer | Producer
Latency | Interval | Near-instant
Cost | Wasted polls | Pay per event
Reliability | Easy (idempotent reads) | Hard (retries, DLQ, signatures)

Outbound webhook checklist

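import hmac
from hashlib import sha256

# signing: cover the timestamp as well as the body, so a captured
# payload cannot be replayed under a fresh timestamp
# (secret and body are bytes; ts is a str)
sig = hmac.new(secret, f"{ts}.".encode() + body, sha256).hexdigest()
headers = {
  "X-Webhook-Signature": f"sha256={sig}",
  "X-Webhook-Timestamp": ts,
  "X-Webhook-Id": event_id,
}

# verifying: recompute from the received timestamp + body, then constant-time compare
ts = headers["X-Webhook-Timestamp"]
expected = "sha256=" + hmac.new(secret, f"{ts}.".encode() + body, sha256).hexdigest()
valid = hmac.compare_digest(expected, headers["X-Webhook-Signature"])
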
MODULE 21 — CONTRACT

OpenAPI Standards

Spec-first APIs: describe → generate clients/servers/docs/tests.

Why API-first

Evolution

Document anatomy

openapi: 3.1.0
info:
  title:   Orders API
  version: 1.4.0
servers:
  - url: https://api.x.com/v1

paths:
  /orders/{id}:
    get:
      operationId: getOrder
      parameters:
        - name: id
          in: path
          required: true
          schema: { type: string, format: uuid }
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema: { $ref: '#/components/schemas/Order' }
        '404': { $ref: '#/components/responses/NotFound' }
      security:
        - bearerAuth: []

components:
  schemas:
    Order:
      type: object
      required: [id, status, total]
      properties:
        id:     { type: string, format: uuid }
        status: { type: string, enum: [pending, paid, shipped] }
        total:  { type: number, minimum: 0 }
  responses:
    NotFound:
      description: Order not found
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT

Best practices

MODULE 22 — DELIVERY

Testing, Code Quality, DevOps

Ship safely. Verify automatically. Operate continuously.

Test types & pyramid

Type | Scope | Speed | Quantity
Unit | Pure function / class | ms | Many (base of pyramid)
Integration | Module + real DB / queue | sec | Some
Contract | Service boundary (consumer-driven) | sec | Per-pair
End-to-end | Full user flow through UI | min | Few (top of pyramid)
Load / stress | System under traffic | min–hr | Pre-release
UAT | Real users on staging | days | Per release
Security | SAST, DAST, dep scan, pentest | | Continuous + pre-release

TDD cycle

  1. Red — write failing test for desired behavior.
  2. Green — minimum code to pass.
  3. Refactor — clean up; tests still pass.

CI/CD pipeline shape

push / PR ─► lint ─► unit ─► build ─► integration ─► sec-scan ─► sign image ─► deploy to staging ─► smoke tests ─► (manual approval?) ─► deploy to prod ─► rollout watch (auto-rollback on SLO breach)

Code quality

12-Factor App

  1. Codebase: one app, one repo, many deploys.
  2. Dependencies: explicit, isolated (lockfile).
  3. Config: in environment, never in code.
  4. Backing services: treat as attached resources (DB, queue swap by URL).
  5. Build → Release → Run: strict separation.
  6. Processes: stateless, share-nothing.
  7. Port binding: app exports HTTP itself.
  8. Concurrency: scale via process model.
  9. Disposability: fast startup, graceful shutdown.
  10. Dev/prod parity: same OS, services, data shape.
  11. Logs: stream to stdout; let platform aggregate.
  12. Admin processes: one-off scripts run in same env.

DevOps stack

Layer | Tools
IaC | Terraform, Pulumi, CDK, CloudFormation
Containers | Docker / OCI, BuildKit, buildpacks
Orchestration | Kubernetes, ECS, Nomad
CI/CD | GitHub Actions, GitLab CI, ArgoCD, Flux
Secrets | Vault, AWS SM, sops, sealed-secrets
Observability | Prometheus, Grafana, Loki, Jaeger, Datadog

Deployment strategies

Strategy | How | Rollback | Risk
Recreate | Stop v1, start v2 | Stop v2, start v1 | Downtime; small apps only.
Rolling | Replace pods batch-by-batch | Roll back same way | Mixed versions during rollout.
Blue/Green | Stand up v2 fully; flip LB | Flip LB back to v1 | 2× capacity briefly.
Canary | v2 to 1%/5%/25%/100% | Halt rollout, drain canary | Need automated metrics gate.
Feature flag | Deploy dark; toggle on for cohort | Toggle off, no redeploy | Flag-debt if not cleaned up.

Horizontal vs vertical scaling