Master Notebook:
Networking Foundations

The definitive interview preparation guide covering every networking concept tested at FAANG & Big Tech companies — from raw bits on the wire to system design at scale.

9 Modules • 17 Diagrams • 50+ Interview Q&A • Complete Cheat Sheet

Module 1

OSI vs TCP/IP Models

The foundational reference models — every networking conversation starts here.

OSI 7-Layer Model: Deep Dive

Why It Exists

  • Created by ISO in 1984 as a vendor-neutral reference framework
  • Goal: allow interoperability between different vendors' networking equipment
  • It is a conceptual model, NOT an implementation — real protocols don't map 1:1
  • Mnemonic (bottom→top): Please Do Not Throw Sausage Pizza Away

Layer-by-Layer Breakdown

| Layer | Name | PDU | Responsibilities | Key Protocols | Devices |
|---|---|---|---|---|---|
| 7 | Application | Data | User-facing services, process-to-process communication | HTTP, FTP, DNS, SMTP, SSH | — |
| 6 | Presentation | Data | Encoding, encryption, compression (data translation) | SSL/TLS*, JPEG, ASCII, MPEG | — |
| 5 | Session | Data | Establish/maintain/terminate sessions, sync checkpoints | NetBIOS, RPC, PPTP | — |
| 4 | Transport | Segment / Datagram | End-to-end delivery, flow control, error recovery | TCP, UDP, SCTP, QUIC* | — |
| 3 | Network | Packet | Logical addressing, routing between networks | IP, ICMP, OSPF, BGP, ARP* | Router, L3 Switch |
| 2 | Data Link | Frame | Physical addressing (MAC), error detection, media access | Ethernet, Wi-Fi (802.11), PPP | Switch, Bridge, NIC |
| 1 | Physical | Bit | Raw bit transmission over physical medium | Ethernet physical, DSL, USB | Hub, Repeater, Cable |

* Protocol placement is debated — TLS spans L5-L6, ARP spans L2-L3, QUIC spans L4-L7.

OSI L7 Application + L6 Presentation + L5 Session → TCP/IP Application; OSI L4 Transport → Transport; OSI L3 Network → Internet; OSI L2 Data Link + L1 Physical → Network Access
Fig 1.1 — OSI 7-Layer Model mapped to TCP/IP 4-Layer Model
Interview Gotcha: "Which layer does TLS operate at?" — Trick question. TLS doesn't fit cleanly into one OSI layer. It provides Session (L5) management, Presentation (L6) encryption, and is consumed by Application (L7). In the TCP/IP model, it sits between Transport and Application. The best answer: "TLS spans L5-L6 in OSI, but operates above TCP and below HTTP in practice."

TCP/IP 4-Layer Model

Why TCP/IP Won

  • OSI was designed by committee (top-down, theoretical)
  • TCP/IP was built by engineers (bottom-up, driven by ARPANET reality)
  • TCP/IP shipped working code first → gained adoption → became the de facto standard
  • OSI layers 5/6 (Session/Presentation) proved unnecessary as separate abstractions — apps handle them

The 4 Layers

| Layer | Name | Responsibility | Key Protocols |
|---|---|---|---|
| 4 | Application | Process-to-process data exchange, user services | HTTP, DNS, FTP, SSH, TLS, SMTP |
| 3 | Transport | End-to-end communication, reliability, flow control | TCP, UDP, SCTP |
| 2 | Internet | Logical addressing, routing across networks | IP (v4/v6), ICMP, IGMP |
| 1 | Network Access | Physical transmission + framing on the local link | Ethernet, Wi-Fi, ARP, PPP |

How It Maps to the Linux Kernel

  • Network Access → NIC driver + net_device struct
  • Internet → ip_rcv() / ip_output() in net/ipv4/
  • Transport → tcp_v4_rcv() / udp_rcv()
  • Application → userspace via socket() syscall → read()/write()
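A minimal loopback sketch of that last step — the application layer handing bytes to the kernel's TCP stack through the socket API. The echo server and message here are illustrative, not part of the kernel mapping above:

```python
import socket
import threading

# Minimal loopback echo: the application hands bytes to the kernel via the
# socket API; TCP segmentation, IP routing, and framing all happen below it.
def run_echo_server(server_sock: socket.socket) -> None:
    conn, _ = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))      # echo one message back

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))              # port 0 -> OS picks an ephemeral port
server.listen(1)
threading.Thread(target=run_echo_server, args=(server,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())       # kernel performs the 3-way handshake
client.sendall(b"GET / HTTP/1.1\r\n")      # bytes enter the TCP byte stream
reply = client.recv(1024)
client.close()
server.close()
```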

Side-by-Side Comparison

AspectOSI ModelTCP/IP Model
Layers74
OriginISO (International Standards Org)DARPA / ARPANET
ApproachPrescriptive (define then build)Descriptive (build then describe)
Session/PresentationSeparate layers (L5, L6)Merged into Application
Physical + Data LinkSeparate layers (L1, L2)Merged into Network Access
Real-world usageTeaching & referenceActual Internet implementation
Protocol couplingModel-independent of protocolsTightly coupled with TCP/IP suite
StrictnessStrict layer boundariesFlexible, cross-layer optimization OK

Encapsulation & Decapsulation

How Data Flows Down the Stack (Sending)

  1. Application generates data (e.g., HTTP request body)
  2. Transport adds TCP/UDP header → creates a segment (TCP) or datagram (UDP)
  3. Network adds IP header → creates a packet
  4. Data Link adds MAC header + trailer (FCS) → creates a frame
  5. Physical converts frame to bits → sends electrical/optical/radio signals
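The five sending steps can be sketched by hand-packing the headers with stdlib struct. Checksums and the FCS are left as zero placeholders, and the addresses, ports, and MACs are invented for illustration:

```python
import struct

# Each layer prepends its header; the layer below treats everything above it
# as an opaque payload. Checksums/FCS are zero placeholders here.
payload = b"GET / HTTP/1.1\r\n\r\n"                       # L7: application data

tcp_header = struct.pack("!HHIIBBHHH",
    49152, 80,            # source port (ephemeral), destination port 80
    1000, 2000,           # sequence number, acknowledgment number
    5 << 4, 0x18,         # data offset = 5 words, flags = PSH|ACK
    65535, 0, 0)          # window, checksum (placeholder), urgent pointer
segment = tcp_header + payload                            # L4: segment

ip_header = struct.pack("!BBHHHBBH4s4s",
    (4 << 4) | 5, 0,      # version 4 + IHL 5 words; ToS
    20 + len(segment),    # total length
    0, 0,                 # identification, flags + fragment offset
    64, 6, 0,             # TTL, protocol 6 = TCP, checksum (placeholder)
    bytes([192, 168, 1, 10]), bytes([93, 184, 216, 34]))
packet = ip_header + segment                              # L3: packet

eth_header = bytes.fromhex("aabbccddeeff") + bytes.fromhex("112233445566") \
    + struct.pack("!H", 0x0800)                           # EtherType = IPv4
frame = eth_header + packet + b"\x00" * 4                 # L2: frame + FCS slot
```

Decapsulation is the mirror image: each layer slices off its own fixed-size header and passes the remainder up.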

At the Receiver (Going Up)

  • Each layer strips its header and passes the payload up
  • This is decapsulation — reverse order of encapsulation
  • Each layer only reads its own header; it treats everything above as opaque payload
💡 Interview Tip: When asked "What happens when you type a URL in the browser?", structure your answer using the TCP/IP layers top-down (DNS resolve at Application → TCP handshake at Transport → IP routing at Internet → Ethernet framing at Network Access). This shows you understand the full stack.
Module 2

Physical & Data Link Layers

Where bits meet the wire — MAC addresses, switching, ARP, and Ethernet frames.

Physical Layer (L1)

Core Responsibilities

  • Bit transmission: converting 1s and 0s into electrical, optical, or radio signals
  • Defines voltage levels, timing, data rates, cable specs, connector pinouts
  • No intelligence — just raw signal transport

Media Types

| Medium | Type | Speed | Max Distance | Use Case |
|---|---|---|---|---|
| Cat5e / Cat6 | Copper (twisted pair) | 1 Gbps / 10 Gbps | 100 m | Office LANs |
| Cat6a / Cat7 | Copper (shielded) | 10 Gbps | 100 m | Data centers |
| Single-mode fiber | Optical | 100+ Gbps | 80+ km | Long-haul, WAN |
| Multi-mode fiber | Optical | 10–100 Gbps | 300–550 m | Data center interconnects |
| 802.11ax (Wi-Fi 6) | Radio (wireless) | Up to 9.6 Gbps | ~30 m indoor | Wireless LANs |

Key Concepts

  • Bandwidth = maximum theoretical throughput (e.g., 1 Gbps link)
  • Throughput = actual achieved data rate (always ≤ bandwidth)
  • Latency = time for a bit to travel from source to destination
  • Latency components: propagation delay + transmission delay + queuing delay + processing delay
  • Collision domain = network segment where simultaneous transmissions collide (hubs share one; switches isolate per port)

Ethernet Frame Structure

Ethernet II frame: Preamble+SFD (8 B) | Dest MAC (6 B) | Src MAC (6 B) | EtherType (2 B, 0x0800 = IPv4) | Payload (46–1500 B) | FCS (4 B) — total 64–1518 bytes (min–max); MTU = 1500 bytes (payload only)
Fig 2.1 — Ethernet II Frame Structure with field sizes

Key Fields Explained

  • Preamble (7 bytes) + SFD (1 byte): synchronization pattern (alternating 1010... then 10101011) — tells NIC "frame is starting"
  • EtherType: identifies the upper-layer protocol — 0x0800 = IPv4, 0x0806 = ARP, 0x86DD = IPv6
  • FCS (Frame Check Sequence): CRC-32 checksum for error detection (not correction — corrupt frames are silently dropped)
  • MTU (Maximum Transmission Unit): max payload size = 1500 bytes for standard Ethernet; jumbo frames allow up to 9000 bytes

Switching & VLANs

How a Switch Learns & Forwards

  1. Learning: switch reads the source MAC of incoming frames and records MAC → port mapping in its MAC address table (CAM table)
  2. Flooding: if destination MAC is unknown (not in table), switch forwards frame out all ports except the source port
  3. Forwarding: if destination MAC is known, switch sends frame only to the correct port
  4. Filtering: if source and destination are on the same port, frame is dropped (no need to forward)
  5. Aging: MAC entries expire after a timeout (default ~300 seconds) to handle device moves
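The learn/flood/forward/filter/age behavior above can be modeled in a few lines. This is a toy simulation, not real switch firmware; the port numbers and MACs are illustrative:

```python
import time

class LearningSwitch:
    """Toy MAC-learning switch: learn, flood, forward, filter, age."""
    def __init__(self, ports, aging_seconds=300):
        self.ports = set(ports)
        self.aging = aging_seconds
        self.table = {}                      # MAC -> (port, last_seen)

    def receive(self, src_mac, dst_mac, in_port, now=None):
        """Return the set of ports the frame goes out of."""
        now = time.time() if now is None else now
        self.table[src_mac] = (in_port, now)             # 1. learning
        entry = self.table.get(dst_mac)
        if entry and now - entry[1] > self.aging:        # 5. aging
            del self.table[dst_mac]
            entry = None
        if entry is None:                                # 2. flooding
            return self.ports - {in_port}
        port, _ = entry
        if port == in_port:                              # 4. filtering
            return set()
        return {port}                                    # 3. forwarding

sw = LearningSwitch(ports=[1, 2, 3, 4])
flood = sw.receive("AA", "BB", in_port=1)   # BB unknown -> flood ports 2,3,4
sw.receive("BB", "AA", in_port=2)           # switch learns BB lives on port 2
fwd = sw.receive("AA", "BB", in_port=1)     # BB known -> forward to port 2 only
```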

VLANs (Virtual LANs)

  • Logically segment a single physical switch into multiple broadcast domains
  • 802.1Q tag: 4-byte header inserted between Src MAC and EtherType, containing VLAN ID (12 bits → up to 4094 VLANs)
  • Access port: carries traffic for one VLAN only (untagged)
  • Trunk port: carries traffic for multiple VLANs (tagged with 802.1Q)
  • Inter-VLAN routing: devices on different VLANs CANNOT communicate without a Layer 3 device (router or L3 switch)

Spanning Tree Protocol (STP)

  • Problem: redundant switch links create loops → broadcast storms (frames circle forever)
  • Solution: STP (IEEE 802.1D) elects a root bridge and blocks redundant paths
  • Root bridge election: switch with lowest Bridge ID (priority + MAC) wins
  • Blocked ports stay in standby; if active link fails, blocked port activates (convergence: 30-50 seconds for classic STP)
  • RSTP (802.1w): rapid convergence in ~1-2 seconds

ARP (Address Resolution Protocol)

Purpose

  • Maps IP address → MAC address on the local network
  • Required because Ethernet frames need destination MAC, but applications only know IP addresses
  • Operates at the boundary of L2 and L3

ARP Resolution Process

Host A (192.168.1.10) → broadcast ARP Request: "Who has 192.168.1.20? Tell 192.168.1.10" → Host B (192.168.1.20) → unicast ARP Reply: "192.168.1.20 is at AA:BB:CC:DD:EE:FF" → subsequent data frames use the resolved MAC
Fig 2.2 — ARP Request (broadcast) / Reply (unicast) flow

ARP Details

  • ARP Request: sent as Ethernet broadcast (FF:FF:FF:FF:FF:FF) — all hosts on the LAN receive it
  • ARP Reply: sent as unicast — only back to the requester
  • ARP Cache: each host maintains a local cache of IP→MAC mappings (check with arp -a)
  • Cache timeout: typically 15-20 minutes (Linux: configurable via gc_stale_time)
  • Gratuitous ARP: a host broadcasts its own IP→MAC mapping unsolicited — used for IP conflict detection and failover
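The cache behavior above can be modeled as a dictionary with per-entry expiry. A toy sketch — the timeout value and addresses are illustrative:

```python
class ArpCache:
    """Toy ARP cache: entries expire after a timeout and must be re-resolved."""
    def __init__(self, timeout_seconds=1200):       # ~20 minutes
        self.timeout = timeout_seconds
        self.entries = {}                           # IP -> (MAC, learned_at)

    def insert(self, ip, mac, now):
        self.entries[ip] = (mac, now)

    def lookup(self, ip, now):
        entry = self.entries.get(ip)
        if entry is None:
            return None                     # miss -> would broadcast an ARP request
        mac, learned_at = entry
        if now - learned_at > self.timeout:
            del self.entries[ip]            # stale -> drop and re-resolve
            return None
        return mac

cache = ArpCache()
cache.insert("192.168.1.20", "AA:BB:CC:DD:EE:FF", now=0)
hit = cache.lookup("192.168.1.20", now=60)       # fresh entry -> MAC returned
miss = cache.lookup("192.168.1.20", now=2000)    # expired -> None
```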

ARP Security Issues

  • ARP Spoofing/Poisoning: attacker sends fake ARP replies to associate their MAC with a victim's IP → enables Man-in-the-Middle
  • Defense: Dynamic ARP Inspection (DAI) on switches, static ARP entries for critical servers, ARP rate limiting
Interview Gotcha: "Can two devices on different VLANs communicate without a router?" — No. VLANs create separate broadcast domains. Even if the devices are on the same physical switch, traffic between VLANs must go through a Layer 3 device (router-on-a-stick or L3 switch with inter-VLAN routing).
Module 3

The Network Layer

IP addressing, subnetting, NAT, ICMP, and routing — how packets find their way across the Internet.

IPv4 Addressing

IPv4 Header Structure

IPv4 header layout (32-bit rows): Version (4b) | IHL (4b) | ToS/DSCP (8b) | Total Length (16b) ‖ Identification (16b) | Flags (3b) | Fragment Offset (13b) ‖ TTL (8b) | Protocol (8b) | Header Checksum (16b) ‖ Source IP (32b) ‖ Destination IP (32b) ‖ Options + Padding (0–40 bytes)
Fig 3.1 — IPv4 Header Structure (20 bytes minimum, up to 60 with options)

Critical Header Fields

  • Version (4 bits): always 4 for IPv4
  • IHL (Internet Header Length, 4 bits): header length in 32-bit words (min 5 = 20 bytes)
  • TTL (Time To Live, 8 bits): decremented by 1 at each router; packet dropped when TTL=0 → prevents infinite loops
  • Protocol (8 bits): identifies transport protocol — 6=TCP, 17=UDP, 1=ICMP
  • Total Length (16 bits): entire packet size in bytes → max 65,535 bytes
  • Flags: DF (Don't Fragment) = if set and packet exceeds MTU, router drops it + sends ICMP "fragmentation needed"
  • Fragment Offset: position of this fragment within the original packet (units of 8 bytes)
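These fields can be pulled out of a raw header with stdlib struct. The sample header bytes below are hand-built for illustration (10.0.0.1 → 8.8.8.8, UDP, DF set):

```python
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Parse the fixed 20-byte IPv4 header (options ignored)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl_bytes": (ver_ihl & 0x0F) * 4,         # IHL is in 32-bit words
        "total_length": total_len,
        "df": bool(flags_frag & 0x4000),           # Don't Fragment bit
        "frag_offset": (flags_frag & 0x1FFF) * 8,  # units of 8 bytes
        "ttl": ttl,
        "protocol": proto,                         # 6=TCP, 17=UDP, 1=ICMP
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

# Hand-built header: version 4, IHL 5, total length 28, DF set, TTL 64, UDP
raw = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 1, 0x4000,
                  64, 17, 0, bytes([10, 0, 0, 1]), bytes([8, 8, 8, 8]))
info = parse_ipv4_header(raw)
```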

Private Address Ranges (RFC 1918)

| Range | CIDR | Addresses | Common Use |
|---|---|---|---|
| 10.0.0.0 – 10.255.255.255 | 10.0.0.0/8 | 16,777,216 | Large enterprises, cloud VPCs |
| 172.16.0.0 – 172.31.255.255 | 172.16.0.0/12 | 1,048,576 | Medium networks |
| 192.168.0.0 – 192.168.255.255 | 192.168.0.0/16 | 65,536 | Home/small office networks |

Special Addresses

  • 127.0.0.0/8 — loopback (localhost)
  • 169.254.0.0/16 — link-local (auto-assigned when DHCP fails, aka APIPA)
  • 0.0.0.0 — "any" address (used in routing tables and listening sockets)
  • 255.255.255.255 — limited broadcast (current network only)

IPv6

Why IPv6

  • IPv4 has only 2^32 ≈ 4.3 billion addresses — already exhausted
  • IPv6 has 2^128 ≈ 340 undecillion addresses — enough for every atom on Earth's surface
  • Simplified header (40 bytes fixed, no checksum, no fragmentation at routers)
  • Built-in IPsec support, better multicast, auto-configuration (SLAAC)

IPv4 vs IPv6 Comparison

| Feature | IPv4 | IPv6 |
|---|---|---|
| Address size | 32 bits (4 bytes) | 128 bits (16 bytes) |
| Notation | Dotted decimal: 192.168.1.1 | Colon hex: 2001:0db8::1 |
| Header size | 20–60 bytes (variable) | 40 bytes (fixed) |
| Header checksum | Yes (recalculated at each hop) | No (relies on L2/L4 checksums) |
| Fragmentation | Routers can fragment | Only the source can fragment (PMTUD required) |
| NAT | Widely used (address shortage) | Not needed (enough addresses) |
| Broadcast | Yes (255.255.255.255) | No broadcast — uses multicast instead |
| Auto-config | DHCP | SLAAC + optional DHCPv6 |
| IPsec | Optional | Mandatory support (optional use) |

IPv6 Address Types

  • Global Unicast (2000::/3): routable on the Internet (like IPv4 public addresses)
  • Link-Local (fe80::/10): auto-generated, valid only on the local link (always present)
  • Unique Local (fc00::/7): private addresses (like RFC 1918)
  • Multicast (ff00::/8): replaces broadcast
  • Loopback: ::1 (equivalent to 127.0.0.1)

Subnetting & CIDR

Core Concepts

  • Subnet mask: separates the network portion from the host portion of an IP address
  • CIDR notation: /n means the first n bits are the network prefix
  • Network address: all host bits = 0 (identifies the subnet itself)
  • Broadcast address: all host bits = 1 (sends to all hosts in subnet)
  • Usable hosts = 2^(32−n) − 2 (subtract network + broadcast addresses)
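Python's stdlib ipaddress module computes all of these directly; the /24 here is just an example network:

```python
import ipaddress

net = ipaddress.ip_network("192.168.1.0/24")
network_addr = str(net.network_address)      # all host bits = 0
broadcast_addr = str(net.broadcast_address)  # all host bits = 1
usable_hosts = net.num_addresses - 2         # 2**(32-24) - 2 = 254

# Membership test: is this host inside the subnet?
inside = ipaddress.ip_address("192.168.1.77") in net
```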

Quick Reference Table

| CIDR | Subnet Mask | Addresses | Usable Hosts | Typical Use |
|---|---|---|---|---|
| /32 | 255.255.255.255 | 1 | 1 | Host route (loopback, point-to-point) |
| /31 | 255.255.255.254 | 2 | 2* | Point-to-point links (RFC 3021) |
| /30 | 255.255.255.252 | 4 | 2 | Traditional point-to-point |
| /28 | 255.255.255.240 | 16 | 14 | Small subnets |
| /24 | 255.255.255.0 | 256 | 254 | Standard LAN subnet |
| /16 | 255.255.0.0 | 65,536 | 65,534 | Large campus/VPC |
| /8 | 255.0.0.0 | 16,777,216 | 16,777,214 | Giant network (10.0.0.0/8) |

* /31 uses both addresses as host addresses per RFC 3021 — no network/broadcast waste.

Worked Example: Subnet 10.0.0.0/16 into 4 Equal Subnets

  • Original: 10.0.0.0/16 → 65,534 usable hosts
  • Need 4 subnets → borrow 2 bits from host portion → new prefix: /18
  • Each subnet: 2^14 − 2 = 16,382 usable hosts
| Subnet | Network Address | Usable Range | Broadcast |
|---|---|---|---|
| 1 | 10.0.0.0/18 | 10.0.0.1 – 10.0.63.254 | 10.0.63.255 |
| 2 | 10.0.64.0/18 | 10.0.64.1 – 10.0.127.254 | 10.0.127.255 |
| 3 | 10.0.128.0/18 | 10.0.128.1 – 10.0.191.254 | 10.0.191.255 |
| 4 | 10.0.192.0/18 | 10.0.192.1 – 10.0.255.254 | 10.0.255.255 |
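The worked example can be checked with ipaddress.subnets(), which splits a network by borrowing prefix bits:

```python
import ipaddress

parent = ipaddress.ip_network("10.0.0.0/16")
subnets = list(parent.subnets(prefixlen_diff=2))    # borrow 2 bits -> four /18s

per_subnet_hosts = subnets[0].num_addresses - 2     # 2**14 - 2 = 16382
networks = [str(s.network_address) for s in subnets]
broadcasts = [str(s.broadcast_address) for s in subnets]
```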

NAT (Network Address Translation)

Types of NAT

| Type | Mapping | Use Case |
|---|---|---|
| Static NAT | 1:1 (one private IP ↔ one public IP) | Servers that need consistent public IP |
| Dynamic NAT | N:M (pool of public IPs assigned on demand) | Medium networks with enough public IPs |
| PAT / NAT Overload | N:1 (many private IPs share one public IP, differentiated by port) | Home routers, most enterprises |

How PAT (Port Address Translation) Works

  1. Host 192.168.1.10:5000 sends packet to 8.8.8.8:53
  2. Router rewrites source to 203.0.113.1:12345 (public IP + random high port)
  3. Router stores mapping in NAT translation table: 192.168.1.10:5000 ↔ 203.0.113.1:12345
  4. Reply from 8.8.8.8:53 arrives at 203.0.113.1:12345
  5. Router looks up the mapping, rewrites destination back to 192.168.1.10:5000
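A toy PAT translation table illustrating steps 1–5. The public IP, port allocator, and addresses are illustrative; real NATs also track protocol and mapping timeouts:

```python
import itertools

class PatRouter:
    """Toy PAT/NAT overload: rewrite (private IP, port) <-> (public IP, port)."""
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.ports = itertools.count(12345)   # simplistic port allocator
        self.out_map = {}                     # (priv_ip, priv_port) -> pub_port
        self.in_map = {}                      # pub_port -> (priv_ip, priv_port)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        key = (src_ip, src_port)
        if key not in self.out_map:
            pub_port = next(self.ports)       # allocate and remember a mapping
            self.out_map[key] = pub_port
            self.in_map[pub_port] = key
        return (self.public_ip, self.out_map[key], dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_port):
        if dst_port not in self.in_map:
            return None                       # unsolicited inbound -> dropped
        priv_ip, priv_port = self.in_map[dst_port]
        return (src_ip, src_port, priv_ip, priv_port)

nat = PatRouter("203.0.113.1")
out = nat.outbound("192.168.1.10", 5000, "8.8.8.8", 53)
back = nat.inbound("8.8.8.8", 53, out[1])     # reply to the mapped port
drop = nat.inbound("9.9.9.9", 443, 40000)     # no mapping -> None
```

The `drop` case is exactly the NAT traversal problem described next: without an existing outbound mapping, inbound packets have nowhere to go.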

NAT Traversal Problems

  • Hosts behind NAT cannot receive unsolicited inbound connections
  • P2P is hard: both peers behind NAT → neither can initiate
  • STUN: discovers your public IP:port via a public server, works for most NAT types
  • TURN: relays all traffic through a public server (fallback when STUN fails)
  • ICE: framework that tries STUN first, falls back to TURN (used by WebRTC)

ICMP (Internet Control Message Protocol)

Key ICMP Message Types

| Type | Code | Name | Used By |
|---|---|---|---|
| 0 | 0 | Echo Reply | ping response |
| 3 | 0–15 | Destination Unreachable | Network/host/port unreachable |
| 3 | 4 | Fragmentation Needed + DF Set | Path MTU Discovery |
| 5 | 0–3 | Redirect | Router tells host of better route |
| 8 | 0 | Echo Request | ping request |
| 11 | 0 | Time Exceeded (TTL=0) | traceroute |

How Traceroute Uses ICMP

  1. Send packet with TTL=1 → first router decrements to 0, drops it, sends back ICMP Time Exceeded
  2. Send packet with TTL=2 → second router responds
  3. Repeat, incrementing TTL, until destination is reached (returns ICMP Port Unreachable or Echo Reply)
  4. Each hop's IP address and round-trip time is recorded

Path MTU Discovery (PMTUD)

  • Sender sets DF (Don't Fragment) flag on all packets
  • If a router encounters a smaller MTU, it drops the packet and sends ICMP Type 3, Code 4 with the next-hop MTU
  • Sender reduces packet size to fit
  • Problem: some firewalls block ICMP → PMTUD fails → "black hole" connections (TCP hangs)

Routing Protocols

Routing vs Forwarding

  • Routing = building and maintaining the routing table (control plane)
  • Forwarding = looking up the destination IP in the table and sending the packet out the correct interface (data plane)
  • Routing is slow and complex; forwarding is fast (hardware-accelerated in modern routers)

Types of Routing Protocols

| Type | Algorithm | Examples | How It Works | Scope |
|---|---|---|---|---|
| Distance Vector | Bellman-Ford | RIP, EIGRP | Share full routing table with neighbors periodically | Small networks |
| Link State | Dijkstra (SPF) | OSPF, IS-IS | Flood link-state advertisements, each router builds full topology map | Enterprise / ISP internal |
| Path Vector | Best path selection | BGP | Exchange path (AS) information, policy-based selection | Internet backbone |

BGP (Border Gateway Protocol)

  • The "routing protocol of the Internet" — connects autonomous systems (ASes)
  • eBGP: between different ASes (external) — the inter-domain routing protocol
  • iBGP: within the same AS (internal) — distributes external routes internally
  • Uses TCP port 179 for peering sessions
  • Path selection criteria (simplified): highest local preference → shortest AS path → lowest origin type → lowest MED → eBGP over iBGP → lowest router ID
  • BGP hijacking: malicious AS announces someone else's IP prefix → traffic routed through attacker
  • Defense: RPKI (Resource Public Key Infrastructure) — cryptographic validation of route origins
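The simplified path-selection order can be expressed as a single sort key. This is a teaching sketch only — real BGP has more tie-breakers, and the AS numbers and attribute values below are invented:

```python
# Simplified BGP best-path selection, following the criteria order above.
def bgp_best_path(routes):
    return min(routes, key=lambda r: (
        -r["local_pref"],          # highest local preference wins
        len(r["as_path"]),         # then shortest AS path
        r["origin"],               # lowest origin type (0=IGP < 1=EGP < 2=?)
        r["med"],                  # then lowest MED
        0 if r["ebgp"] else 1,     # prefer eBGP over iBGP
        r["router_id"],            # finally lowest router ID
    ))

routes = [
    {"local_pref": 100, "as_path": [64500, 64501], "origin": 0,
     "med": 10, "ebgp": True, "router_id": "10.0.0.2"},
    {"local_pref": 200, "as_path": [64500, 64501, 64502], "origin": 0,
     "med": 50, "ebgp": False, "router_id": "10.0.0.1"},
]
best = bgp_best_path(routes)   # local_pref 200 wins despite the longer AS path
```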

OSPF (Open Shortest Path First)

  • Link-state protocol using Dijkstra's algorithm to compute shortest path tree
  • Divides network into areas; Area 0 is the backbone (all other areas must connect to Area 0)
  • Sends LSAs (Link State Advertisements) when topology changes — convergence in seconds
  • Metric = cost (typically based on interface bandwidth: cost = reference BW / interface BW)
  • Supports ECMP (Equal-Cost Multi-Path) — load balances across equal-cost routes
Interview Gotcha: "What is the difference between routing and forwarding?" — Routing is building the map (control plane, CPU-intensive, runs BGP/OSPF). Forwarding is using the map per-packet (data plane, hardware-accelerated, nanosecond lookups). In modern SDN, these are explicitly separated: the controller does routing, switches do forwarding.
Module 4

The Transport Layer

TCP vs UDP, connection management, flow control, congestion control — the engine of reliable communication.

TCP vs UDP

| Feature | TCP | UDP |
|---|---|---|
| Connection | Connection-oriented (3-way handshake) | Connectionless (fire and forget) |
| Reliability | Guaranteed delivery (ACKs, retransmissions) | No guarantees (best effort) |
| Ordering | In-order delivery (sequence numbers) | No ordering |
| Flow control | Yes (sliding window) | No |
| Congestion control | Yes (slow start, AIMD) | No |
| Header size | 20–60 bytes | 8 bytes |
| Speed | Slower (overhead) | Faster (minimal overhead) |
| Streams | Byte stream (no message boundaries) | Message-oriented (preserves boundaries) |
| Use cases | HTTP, SSH, email, file transfer | DNS, video streaming, gaming, VoIP |

When to Use Which

  • TCP when: data must arrive complete and in-order (web pages, file downloads, APIs, database connections)
  • UDP when: speed matters more than reliability, or app handles its own reliability (real-time video, game state updates, DNS queries)
  • QUIC: UDP + reliability + encryption (HTTP/3) — best of both worlds for modern web

Ports & Sockets

Port Ranges

| Range | Name | Assignment | Examples |
|---|---|---|---|
| 0 – 1023 | Well-Known | IANA-assigned, requires root/admin | 80 (HTTP), 443 (HTTPS), 22 (SSH), 53 (DNS) |
| 1024 – 49151 | Registered | IANA-registered for applications | 3306 (MySQL), 5432 (PostgreSQL), 8080 (alt HTTP) |
| 49152 – 65535 | Ephemeral | OS-assigned dynamically for client connections | Random source port for outgoing connections |

Socket & Connection Identity

  • Socket = IP address + port number (e.g., 192.168.1.10:443)
  • Connection = uniquely identified by 5-tuple: (protocol, src IP, src port, dst IP, dst port)
  • This means a server on port 443 can handle millions of simultaneous connections — each has a unique 5-tuple
  • The 5-tuple is why PAT works: even with one public IP, different source ports distinguish connections
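Two loopback connections to one listening port show this concretely: same destination, distinct source ports, therefore distinct 5-tuples. A sketch using OS-chosen ephemeral ports:

```python
import socket

# Two clients connect to the SAME server port; the kernel tells the
# connections apart by their 5-tuples (here: differing client source ports).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))              # port 0 -> ephemeral listen port
server.listen(2)
server_addr = server.getsockname()

c1 = socket.create_connection(server_addr)
c2 = socket.create_connection(server_addr)
a1, _ = server.accept()
a2, _ = server.accept()

tuple1 = (c1.getsockname(), c1.getpeername())   # (src, dst) for client 1
tuple2 = (c2.getsockname(), c2.getpeername())   # same dst, different src port

for s in (c1, c2, a1, a2, server):
    s.close()
```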

TCP Handshake & Teardown

3-Way Handshake (Connection Establishment)

Client (SYN_SENT) sends SYN (seq=x, client picks initial sequence number) → Server (LISTEN → SYN_RCVD) replies SYN-ACK (seq=y, ack=x+1, server picks its own seq and ACKs the client's SYN) → Client (ESTABLISHED) sends ACK (ack=y+1) → Server (ESTABLISHED)
Fig 4.1 — TCP 3-Way Handshake with sequence numbers and states

Why 3-Way (Not 2-Way)?

  • Both sides must synchronize sequence numbers — each direction needs its own SYN + ACK
  • 2-way would only confirm one direction → server wouldn't know client received its sequence number
  • Prevents stale connection problem: old SYN packets from a previous connection would falsely establish connections

4-Way Teardown (Connection Termination)

  1. FIN from initiator → "I'm done sending data"
  2. ACK from receiver → "I got your FIN" (receiver can still send data = half-close)
  3. FIN from receiver → "I'm also done sending data"
  4. ACK from initiator → "Got it." Enters TIME_WAIT (lasts 2×MSL ≈ 60 seconds)

TIME_WAIT Explained

  • Duration: 2 × MSL (Maximum Segment Lifetime) — typically 60 seconds total
  • Why it exists: ensures delayed segments from the old connection are flushed before the same 5-tuple can be reused
  • Also handles the case where the final ACK is lost — allows retransmission of the last FIN
  • Problem at scale: servers closing many short-lived connections accumulate TIME_WAIT sockets → port exhaustion
  • Fix: net.ipv4.tcp_tw_reuse=1 (safe — reuses TIME_WAIT for outgoing connections with newer timestamps)

TCP Fast Open (TFO)

  • Eliminates 1 RTT on repeat connections by caching a crypto cookie on the first handshake
  • Subsequent connections send data in the SYN packet itself (along with the cookie)
  • Enable: net.ipv4.tcp_fastopen=3 (3 = both client and server)

Flow Control

Sliding Window Mechanism

Sender's view of the byte stream: [sent & ACKed] [sent, not ACKed] [sendable, in window] [not sendable] — the sliding window (rwnd) spans the unACKed + sendable region and slides right as ACKs arrive
Fig 4.2 — TCP Sliding Window: window slides right as ACKs arrive

How It Works

  • Receiver advertises its receive window (rwnd) in every ACK — tells sender "I have this much buffer space"
  • Sender cannot have more than rwnd bytes of unacknowledged data in flight
  • As ACKs arrive, window slides forward → sender can transmit more
  • Zero window: receiver's buffer is full → rwnd=0 → sender must stop and wait
  • Zero-window probe: sender periodically sends tiny probe packets to check if rwnd has reopened
  • Window scaling (RFC 7323): allows rwnd up to 1 GB (original header field maxes at 64 KB)
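A toy sender-side model of rwnd-limited sending, cumulative ACKs, and the zero-window case. The byte counts are illustrative:

```python
class SlidingWindowSender:
    """Toy flow control: at most rwnd bytes may be unACKed and in flight."""
    def __init__(self, data: bytes, rwnd: int):
        self.data = data
        self.rwnd = rwnd
        self.next_to_send = 0        # first byte not yet sent
        self.unacked_base = 0        # first byte not yet ACKed

    def sendable(self) -> int:
        in_flight = self.next_to_send - self.unacked_base
        return min(self.rwnd - in_flight, len(self.data) - self.next_to_send)

    def send(self) -> bytes:
        n = self.sendable()
        chunk = self.data[self.next_to_send:self.next_to_send + n]
        self.next_to_send += n
        return chunk

    def on_ack(self, ack_byte: int, new_rwnd: int):
        self.unacked_base = ack_byte     # cumulative ACK slides the window
        self.rwnd = new_rwnd             # receiver re-advertises buffer space

s = SlidingWindowSender(b"x" * 100, rwnd=30)
first = len(s.send())        # 30 bytes fill the window
stalled = s.sendable()       # 0 -> must wait for an ACK
s.on_ack(30, new_rwnd=30)    # window slides forward
second = len(s.send())       # 30 more bytes
s.on_ack(60, new_rwnd=0)     # zero window: receiver buffer full
zero = s.sendable()          # sender must probe until rwnd reopens
```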

Congestion Control

The Problem

  • Flow control prevents overwhelming the receiver
  • Congestion control prevents overwhelming the network (routers, links)
  • Sender maintains a congestion window (cwnd) — actual send rate = min(cwnd, rwnd)

The 4 Phases

  1. Slow Start: cwnd starts at 1 MSS, doubles every RTT (exponential growth) until threshold (ssthresh)
  2. Congestion Avoidance: cwnd increases by 1 MSS per RTT (linear growth = AIMD — Additive Increase)
  3. Fast Retransmit: 3 duplicate ACKs → assume packet lost → retransmit immediately (don't wait for timeout)
  4. Fast Recovery: after fast retransmit, set ssthresh = cwnd/2, set cwnd = ssthresh (Multiplicative Decrease) → resume congestion avoidance
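The four phases can be traced with a simplified Reno model. Real Reno temporarily inflates cwnd during fast recovery; this sketch collapses that detail:

```python
def reno_cwnd_trace(events, ssthresh=32.0):
    """Trace cwnd (in MSS) per RTT. Events: 'ack' = a clean RTT of ACKs,
    'dup3' = 3 duplicate ACKs, 'timeout' = retransmission timer fired."""
    cwnd = 1.0
    trace = []
    for event in events:
        if event == "ack":
            if cwnd < ssthresh:
                cwnd *= 2                 # slow start: exponential growth
            else:
                cwnd += 1                 # congestion avoidance: +1 MSS/RTT
        elif event == "dup3":
            ssthresh = max(cwnd / 2, 2)   # multiplicative decrease
            cwnd = ssthresh               # fast recovery (simplified)
        elif event == "timeout":
            ssthresh = max(cwnd / 2, 2)
            cwnd = 1.0                    # severe signal: restart slow start
        trace.append(cwnd)
    return trace

trace = reno_cwnd_trace(["ack"] * 6 + ["dup3", "ack", "timeout", "ack"])
# 1 -> 2, 4, 8, 16, 32 (hits ssthresh), 33 (linear), halve on dup3, ...
```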

Congestion Control Algorithms Comparison

| Algorithm | Detection | Approach | Best For |
|---|---|---|---|
| Reno | Packet loss (3 dup ACKs / timeout) | AIMD: linear increase, halve on loss | Classic; struggles on high-BDP links |
| Cubic | Packet loss | Cubic function of time since last loss; more aggressive window growth | Linux default; high-bandwidth links |
| BBR (Google) | Bandwidth & RTT estimation (model-based) | Probes for max bandwidth and min RTT; doesn't rely on loss | Long-fat pipes, lossy links (mobile, satellite) |
| Vegas | RTT changes (delay-based) | Detects congestion before packet loss via increasing RTT | Low-latency environments |
cwnd over time (Reno): exponential growth during Slow Start until ssthresh, then linear growth in Congestion Avoidance; 3 dup ACKs trigger Fast Recovery (cwnd halved), while a timeout resets cwnd to 1 MSS with a new ssthresh
Fig 4.3 — TCP Reno: Slow Start → Congestion Avoidance → Loss → Recovery

Error Recovery

Retransmission Mechanisms

  • Retransmission Timeout (RTO): if no ACK received within RTO, retransmit the segment
  • RTO calculation (Jacobson's algorithm): RTO = SRTT + 4×RTTVAR
    • SRTT = smoothed RTT (exponential moving average)
    • RTTVAR = RTT variance
  • Fast Retransmit: 3 duplicate ACKs for the same sequence number → retransmit immediately (don't wait for timeout)
  • SACK (Selective ACK): receiver reports which non-contiguous blocks it has received → sender retransmits only the missing segments
  • Without SACK: sender must retransmit everything after the gap (Go-Back-N behavior)
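Jacobson's estimator, roughly as standardized in RFC 6298. The RTT samples are invented, and the 1-second floor is disabled in the example calls to show the raw estimate:

```python
def jacobson_rto(samples, alpha=0.125, beta=0.25, min_rto=1.0):
    """Estimate RTO from RTT samples: SRTT is an exponential moving average,
    RTTVAR tracks deviation, RTO = SRTT + 4*RTTVAR (floored at min_rto)."""
    srtt = rttvar = None
    for r in samples:
        if srtt is None:
            srtt, rttvar = r, r / 2                          # first measurement
        else:
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
    return max(srtt + 4 * rttvar, min_rto)

# Steady RTT -> variance decays, RTO hugs the RTT; jitter inflates RTO.
stable = jacobson_rto([0.100] * 10, min_rto=0.0)
jittery = jacobson_rto([0.100, 0.300, 0.100, 0.300], min_rto=0.0)
```

The 4×RTTVAR term is why a jittery path gets a much more conservative timeout than a stable one with the same average RTT.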

TCP Header Flags

| Flag | Name | Purpose |
|---|---|---|
| SYN | Synchronize | Initiate connection, synchronize sequence numbers |
| ACK | Acknowledge | Acknowledgment field is valid |
| FIN | Finish | Sender is done sending data |
| RST | Reset | Abort connection immediately (error or rejection) |
| PSH | Push | Push data to application immediately (don't buffer) |
| URG | Urgent | Urgent pointer field is valid (rarely used) |
| ECE | ECN Echo | Explicit Congestion Notification received |
| CWR | Congestion Window Reduced | Sender has reduced cwnd in response to ECE |
Interview Gotcha: "Why does TIME_WAIT last 2×MSL?" — Two reasons: (1) ensures any delayed segments from the old connection expire before the same 5-tuple can be reused (prevents corruption of new connections), and (2) if the final ACK is lost, the peer will retransmit its FIN — the TIME_WAIT state keeps the socket alive to ACK that retransmission.
Module 5

The Application Layer

HTTP evolution, DNS resolution, DHCP, and the protocols developers interact with daily.

HTTP Evolution: 1.0 → 1.1 → 2 → 3

| Feature | HTTP/1.0 | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|---|
| Year | 1996 | 1997 | 2015 | 2022 |
| Transport | TCP | TCP | TCP | QUIC (UDP) |
| Connections | New TCP per request | Persistent (keep-alive) | Single multiplexed | Single multiplexed |
| Multiplexing | None | Pipelining (broken in practice) | Full multiplexing (streams) | Full multiplexing |
| Head-of-line blocking | Yes (per connection) | Yes (pipelining issue) | At TCP level (one lost packet blocks all streams) | None (independent streams) |
| Header format | Text | Text | Binary + HPACK compression | Binary + QPACK |
| Server Push | No | No | Yes (deprecated in Chrome) | Yes |
| TLS | Optional | Optional | Optional (but browsers require it) | Built-in (TLS 1.3 mandatory) |
| Connection setup | 1 RTT (TCP) + 2 RTT (TLS) | Same (reused) | Same | 1 RTT (0-RTT on repeat) |
HTTP/1.1 opens ~6 parallel connections and each request on a connection waits for its response; HTTP/2 uses a single connection and interleaves frames from all streams
Fig 5.1 — HTTP/1.1 requires multiple connections; HTTP/2 multiplexes streams on one

HTTP/2 Key Concepts

  • Binary framing layer: messages split into frames, interleaved across streams
  • Stream: logical channel within a single connection; each request-response pair = one stream
  • HPACK: header compression using static + dynamic tables — eliminates redundant header bytes
  • Stream prioritization: client can hint which resources are most important
  • Remaining problem: TCP-level head-of-line blocking — one lost TCP segment stalls ALL streams

HTTP/3 & QUIC

  • Replaces TCP with QUIC (built on UDP) → each stream has independent loss recovery
  • Lost packet in stream 1 does NOT block streams 2, 3, 4
  • TLS 1.3 is integrated into QUIC → 1-RTT handshake (vs TCP + TLS = 2-3 RTTs)
  • Connection migration: connections identified by Connection ID, not IP:port → survives network changes (Wi-Fi → cellular)

HTTP Methods & Status Codes

Methods & Properties

| Method | Purpose | Idempotent? | Safe? | Has Body? |
|---|---|---|---|---|
| GET | Retrieve a resource | Yes | Yes | No* |
| POST | Create a resource / submit data | No | No | Yes |
| PUT | Replace a resource entirely | Yes | No | Yes |
| PATCH | Partially update a resource | No* | No | Yes |
| DELETE | Delete a resource | Yes | No | Optional |
| HEAD | GET without response body (headers only) | Yes | Yes | No |
| OPTIONS | Query supported methods (CORS preflight) | Yes | Yes | No |

* GET technically allows a body (RFC 7231) but most servers/proxies ignore it. PATCH can be idempotent depending on implementation.

Status Code Families

| Range | Category | Key Codes |
|---|---|---|
| 1xx | Informational | 100 Continue, 101 Switching Protocols (WebSocket upgrade) |
| 2xx | Success | 200 OK, 201 Created, 204 No Content |
| 3xx | Redirection | 301 Moved Permanently, 302 Found, 304 Not Modified (cache) |
| 4xx | Client Error | 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 429 Too Many Requests |
| 5xx | Server Error | 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout |

Critical Headers

  • Content-Type: MIME type of body (application/json, text/html)
  • Cache-Control: caching directives (max-age=3600, no-cache, no-store)
  • ETag: entity tag for cache validation (server returns 304 if unchanged)
  • Authorization: credentials (Bearer <token>, Basic base64(user:pass))
  • Connection: keep-alive: persistent connection (HTTP/1.1 default)
  • Transfer-Encoding: chunked: streaming response without knowing total size upfront

DNS Deep Dive

DNS Resolution Flow

Client → recursive query → Recursive Resolver; Resolver → iterative queries → Root DNS (referral: .com NS) → TLD DNS (referral: example.com NS) → Authoritative (answer: 93.184.216.34); Resolver returns the final answer to the Client
Fig 5.2 — DNS Resolution: Client sends recursive query to resolver; resolver performs iterative queries

Recursive vs Iterative Queries

  • Recursive: client asks resolver → resolver does ALL the work and returns the final answer
  • Iterative: resolver asks root → root says "ask .com" → resolver asks .com → .com says "ask ns.example.com" → resolver asks ns.example.com → gets answer
  • Client→Resolver is typically recursive; Resolver→authoritative servers is iterative
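A DNS query packet is simple enough to build by hand. This sketch constructs an A-record query (the transaction ID is arbitrary; RD is the "recursion desired" bit the client sets) without sending it anywhere:

```python
import struct

def build_dns_query(name: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Build a DNS query packet by hand (qtype 1 = A record).
    Header: ID, flags (RD=1 requests recursion), QDCOUNT=1, other counts 0."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode() for label in name.split(".")
    ) + b"\x00"                                        # length-prefixed labels
    question = qname + struct.pack("!HH", qtype, 1)    # QTYPE, QCLASS = IN
    return header + question

query = build_dns_query("example.com")
# Inspect the flags we just set: the RD bit is what makes this a recursive query
txid, flags = struct.unpack("!HH", query[:4])
recursion_desired = bool(flags & 0x0100)
```

Sent over UDP to a resolver on port 53, a packet like this is exactly step 1 of the flow above; the resolver's iterative legs reuse the same wire format.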

DNS Record Types

| Type | Name | Purpose | Example |
|---|---|---|---|
| A | Address | Domain → IPv4 address | example.com → 93.184.216.34 |
| AAAA | IPv6 Address | Domain → IPv6 address | example.com → 2606:2800:220:1:... |
| CNAME | Canonical Name | Alias → another domain name | www.example.com → example.com |
| MX | Mail Exchange | Domain → mail server (with priority) | example.com → 10 mail.example.com |
| NS | Name Server | Domain → authoritative name server | example.com → ns1.example.com |
| TXT | Text | Arbitrary text (SPF, DKIM, domain verification) | "v=spf1 include:_spf.google.com ~all" |
| SOA | Start of Authority | Zone metadata (serial, refresh, retry, expire) | Primary NS, admin email, timing params |
| PTR | Pointer | IP → domain (reverse DNS) | 34.216.184.93.in-addr.arpa → example.com |
| SRV | Service | Service discovery (host, port, priority, weight) | _sip._tcp.example.com → 5060 sip.example.com |

DNS Caching Hierarchy

  1. Browser cache (Chrome: chrome://net-internals/#dns)
  2. OS cache (systemd-resolved, nscd)
  3. Recursive resolver cache (ISP's resolver, 8.8.8.8, 1.1.1.1)
  4. Authoritative server (source of truth)
  • TTL (Time To Live): how long a record can be cached (set by authoritative server)
  • Low TTL (60s) = faster propagation for changes, more queries to authoritative
  • High TTL (86400s) = fewer queries, but changes take up to 24 hours to propagate
  • DNS propagation is not "propagation" — it's cache expiry across the Internet
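Each layer of the hierarchy behaves like the minimal cache below — a sketch, with an injectable clock so TTL expiry can be demonstrated without waiting:

```python
import time

class DnsCache:
    """Minimal TTL-respecting cache, like each layer of the hierarchy keeps."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}              # name -> (ip, expires_at)

    def put(self, name, ip, ttl):
        # TTL comes from the authoritative server's record
        self._store[name] = (ip, self._clock() + ttl)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None               # miss -> ask the next layer up
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[name]     # expired: "propagation" is just this expiry
            return None
        return ip

# Demo with a fake clock: a 60s TTL record disappears after 61 "seconds"
t = [0.0]
cache = DnsCache(clock=lambda: t[0])
cache.put("example.com", "93.184.216.34", ttl=60)
hit = cache.get("example.com")
t[0] = 61.0
miss = cache.get("example.com")
```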
Interview Gotcha: "Is DNS over TCP or UDP?" — Both. DNS queries use UDP port 53 (faster, most queries fit in one packet). TCP port 53 is used for: (1) zone transfers between DNS servers (AXFR/IXFR), (2) responses larger than 512 bytes (or 4096 with EDNS0), (3) when reliability is needed. EDNS0 extended the UDP payload to 4096 bytes, reducing the need for TCP fallback.

DHCP (Dynamic Host Configuration Protocol)

DORA Process

  1. Discover: client broadcasts "I need an IP address" (UDP, src 0.0.0.0:68 → dst 255.255.255.255:67)
  2. Offer: DHCP server(s) respond with an available IP + config
  3. Request: client broadcasts acceptance of one offer (informing other servers to retract)
  4. Acknowledge: selected server confirms the lease
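The four steps can be mocked as plain dicts — field names here are illustrative, not the real DHCP/BOOTP wire format:

```python
# DORA as four message-building functions; values are hypothetical.
def discover():
    return {"op": "DISCOVER", "src": "0.0.0.0:68", "dst": "255.255.255.255:67"}

def offer(pool):
    ip = pool.pop()                   # server reserves an address from its pool
    return {"op": "OFFER", "your_ip": ip, "mask": "255.255.255.0",
            "gateway": "192.168.1.1", "dns": ["1.1.1.1"], "lease": 86400}

def request(offer_msg):
    # Broadcast so non-selected servers can retract their offers
    return {"op": "REQUEST", "requested_ip": offer_msg["your_ip"]}

def ack(request_msg):
    return {"op": "ACK", "your_ip": request_msg["requested_ip"], "lease": 86400}

pool = ["192.168.1.50"]
d = discover()                        # broadcast; every DHCP server sees it
lease = ack(request(offer(pool)))     # Offer -> Request -> Acknowledge
```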

What DHCP Provides

  • IP address + subnet mask
  • Default gateway (router)
  • DNS server addresses
  • Lease duration (how long the IP assignment is valid)
  • Optional: NTP servers, domain name, TFTP server (for PXE boot)

DHCP Relay

  • DHCP uses broadcast → doesn't cross router boundaries
  • DHCP relay agent (on the router) intercepts broadcasts and forwards them as unicast to a DHCP server on another subnet

FTP, SSH, SMTP

FTP (File Transfer Protocol)

  • Uses two connections: control (port 21) and data (port 20 or ephemeral)
  • Active mode: server connects back to client's data port → fails through NAT/firewalls
  • Passive mode: client initiates both connections → works through NAT (server sends PASV response with port)
  • Largely replaced by SFTP (SSH-based) or HTTPS for file transfer

SSH (Secure Shell)

  • Port 22, encrypted remote access and tunneling
  • Key exchange: Diffie-Hellman (ECDH) to establish shared secret
  • Authentication: password or public-key (client proves ownership of private key)
  • Tunneling: forward local port to remote (-L), reverse tunnel (-R), SOCKS proxy (-D)
  • ssh -L 3306:db.internal:3306 bastion → access remote DB via local port 3306

Email Protocols

Protocol | Port | Direction | Purpose
SMTP | 25 (server-server) / 587 (client submission with STARTTLS) | Send | Sending/relaying email between servers
POP3 | 110 / 995 (TLS) | Receive | Download emails (removes from server)
IMAP | 143 / 993 (TLS) | Receive | Sync emails (keeps on server, folders, multi-device)
Module 6

Modern Web & Security

TLS handshakes, HTTPS, CORS, WebSockets, and gRPC — the security and real-time layer of modern applications.

TLS/SSL (1.2 vs 1.3)

TLS 1.2 vs 1.3 Comparison

Feature | TLS 1.2 | TLS 1.3
Handshake RTTs | 2 RTT | 1 RTT (0-RTT on resumption)
Key exchange | RSA or ECDHE (configurable) | ECDHE only (forward secrecy mandatory)
Cipher suites | ~37 cipher suites (many weak) | 5 cipher suites (all strong)
Symmetric encryption | AES-CBC, AES-GCM, RC4, 3DES | AES-GCM, ChaCha20-Poly1305 only
RSA key exchange | Supported (no forward secrecy) | Removed
Handshake encryption | Cleartext after ServerHello | Encrypted after ServerHello
0-RTT resumption | Session tickets (1-RTT) | PSK + early data (0-RTT)
TLS 1.2: ClientHello → ServerHello, Cert, KeyExchange (RTT 1) → ClientKeyExchange, ChangeCipher, Finished → ChangeCipher, Finished (RTT 2) → Application Data
TLS 1.3: ClientHello + KeyShare → ServerHello, Cert, Finished (1 RTT) → Application Data
0-RTT resumption: client sends data in its first flight using a pre-shared key (PSK) from a prior session
Fig 6.1 — TLS 1.2 requires 2 round trips; TLS 1.3 completes in 1 (0 on resumption)
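With Python's stdlib `ssl` module, a client can insist on the TLS 1.3 behavior described above — every handshake then gets the 1-RTT flow and mandatory forward secrecy:

```python
import ssl

# Build a client context that refuses anything below TLS 1.3.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_3

# Defaults worth knowing: certificate verification and hostname
# checking are already enabled by create_default_context().
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname is True
```

Wrapping a TCP socket with this context (via `ctx.wrap_socket(sock, server_hostname=...)`) performs the handshake and the chain-of-trust verification described in the next section.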

Certificate Chain of Trust

  1. Root CA: self-signed, pre-installed in OS/browser trust store (e.g., DigiCert, Let's Encrypt ISRG Root)
  2. Intermediate CA: signed by Root CA (root stays offline for security)
  3. Leaf/Server Certificate: signed by Intermediate CA — this is what the server presents
  4. Client verifies chain: leaf → intermediate → root (found in trust store)

Key Concepts

  • Forward Secrecy: use ephemeral keys (ECDHE) so that compromising the server's long-term key doesn't decrypt past sessions
  • OCSP Stapling: server includes a signed "certificate not revoked" proof from CA → avoids client's separate OCSP query
  • Certificate Pinning: app hardcodes expected certificate/public key hash — prevents CA compromise attacks (deprecated in browsers, still used in mobile apps)
  • SNI (Server Name Indication): client includes hostname in cleartext ClientHello → allows virtual hosting on shared IP
  • ECH (Encrypted Client Hello): encrypts SNI field → hides which site you're connecting to (TLS 1.3 extension)

HTTPS & HSTS

Why HTTPS Everywhere

  • Confidentiality: encrypts data in transit (prevents Wi-Fi sniffing, ISP inspection)
  • Integrity: prevents tampering (ISP ad injection, content modification)
  • Authentication: proves the server is who it claims (via certificate)
  • Google ranks HTTPS sites higher; browsers show "Not Secure" for HTTP sites

HSTS (HTTP Strict Transport Security)

  • Server sends header: Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
  • Browser remembers: always use HTTPS for this domain (even if user types http://)
  • Prevents SSL stripping attacks: attacker downgrades connection to HTTP
  • HSTS Preload List: domain hardcoded into browsers so even the very first visit uses HTTPS
  • Caution: once preloaded, removing is extremely slow (~months); ensure HTTPS works perfectly first
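A small parser for the header shown above — a sketch; real browsers apply extra rules (for instance, ignoring HSTS delivered over plain HTTP):

```python
def parse_hsts(header: str) -> dict:
    """Parse a Strict-Transport-Security header value into a policy dict."""
    policy = {"max_age": 0, "include_subdomains": False, "preload": False}
    for directive in header.split(";"):
        directive = directive.strip().lower()
        if directive.startswith("max-age="):
            policy["max_age"] = int(directive.split("=", 1)[1])
        elif directive == "includesubdomains":
            policy["include_subdomains"] = True
        elif directive == "preload":
            policy["preload"] = True
    return policy
```

`parse_hsts("max-age=31536000; includeSubDomains; preload")` yields a one-year policy covering all subdomains, eligible for the preload list.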

Mixed Content

  • HTTPS page loading HTTP resources (scripts, images) = mixed content
  • Active mixed content (scripts, iframes): blocked by browsers
  • Passive mixed content (images, video): may load with warning

CORS (Cross-Origin Resource Sharing)

Same-Origin Policy

  • Browser restricts scripts from making requests to a different origin (scheme + host + port)
  • https://api.example.com ≠ https://www.example.com (different host)
  • http://example.com ≠ https://example.com (different scheme)
  • CORS is the mechanism to relax same-origin policy in a controlled way

Simple vs Preflight Requests

  • Simple request: GET/HEAD/POST with standard headers, no preflight needed
  • Preflight required when: custom headers, methods like PUT/DELETE/PATCH, or content-type other than form-data/text-plain/form-urlencoded
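The preflight decision can be approximated in a few lines — a simplified model of the Fetch spec's rules, not the full algorithm:

```python
# The "simple" methods and content types that skip preflight.
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded", "multipart/form-data", "text/plain",
}
# Headers allowed without preflight (simplified subset of the safelist)
SAFE_HEADERS = {"accept", "accept-language", "content-language", "content-type"}

def needs_preflight(method: str, headers: dict) -> bool:
    if method.upper() not in SIMPLE_METHODS:
        return True                       # PUT/DELETE/PATCH etc.
    for name, value in headers.items():
        if name.lower() not in SAFE_HEADERS:
            return True                   # custom header, e.g. Authorization
        if (name.lower() == "content-type"
                and value.split(";")[0] not in SIMPLE_CONTENT_TYPES):
            return True                   # e.g. application/json
    return False
```

This is why a JSON POST triggers an OPTIONS round trip while a plain form POST does not.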
Preflight: Browser → OPTIONS /api/data (Origin: https://app.example.com) → Server → 200 OK (Access-Control-Allow-Origin: https://app.example.com; Access-Control-Allow-Methods: GET, POST, PUT)
Actual: Browser → PUT /api/data → Server → 200 OK + response body
Fig 6.2 — CORS: Browser sends preflight OPTIONS before the actual PUT request

Key CORS Headers

Header | Set By | Purpose
Access-Control-Allow-Origin | Server | Which origins can access (specific origin or *)
Access-Control-Allow-Methods | Server | Allowed HTTP methods (GET, POST, PUT, etc.)
Access-Control-Allow-Headers | Server | Allowed custom headers (Authorization, Content-Type, etc.)
Access-Control-Allow-Credentials | Server | Whether to include cookies/auth (cannot use * origin with this)
Access-Control-Max-Age | Server | How long (seconds) to cache the preflight response
Origin | Browser | The requesting origin (sent automatically)

WebSockets

How WebSockets Work

  • Starts as an HTTP/1.1 request with Upgrade: websocket header
  • Server responds with 101 Switching Protocols → connection upgrades to full-duplex binary/text framing
  • Persistent connection: both sides can send messages at any time (no request-response pattern)
  • Low overhead per message: 2-14 byte frame header (vs HTTP's hundreds of bytes of headers per request)
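The upgrade handshake includes a computable token: the server hashes the client's Sec-WebSocket-Key with a GUID fixed by RFC 6455 and returns it as Sec-WebSocket-Accept. The key/accept pair below is the RFC's own example:

```python
import base64
import hashlib

# GUID fixed by RFC 6455 — same for every WebSocket server.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value the server must return."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# RFC 6455's worked example:
accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
```

This proves the server actually speaks WebSocket rather than blindly echoing headers — a plain HTTP cache could not produce the correct digest.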

WebSocket vs Alternatives

Technique | Direction | Connection | Overhead | Use Case
WebSocket | Full-duplex (both ways) | Persistent | Very low | Chat, gaming, collaborative editing
SSE (Server-Sent Events) | Server → Client only | Persistent | Low | Live feeds, notifications, dashboards
Long Polling | Server → Client | Repeated HTTP | Medium | Legacy fallback, simple notifications
Short Polling | Client → Server | Repeated HTTP | High | Simple status checks (inefficient)

gRPC

Core Concepts

  • Google's RPC framework built on HTTP/2 with Protocol Buffers (protobuf) serialization
  • Binary format: smaller payloads, faster serialization/deserialization than JSON
  • Strong typing via .proto schema files → auto-generated client/server code

4 Communication Patterns

Pattern | Client | Server | Example
Unary | 1 request | 1 response | GetUser(id) → User
Server Streaming | 1 request | Stream of responses | ListLogs(filter) → stream of LogEntry
Client Streaming | Stream of requests | 1 response | UploadChunks(stream) → UploadResult
Bidirectional | Stream | Stream | Chat(stream) ↔ stream

gRPC vs REST vs GraphQL

Aspect | REST | gRPC | GraphQL
Format | JSON (text) | Protobuf (binary) | JSON
Transport | HTTP/1.1 or 2 | HTTP/2 | HTTP
Contract | OpenAPI (optional) | .proto (mandatory) | Schema (mandatory)
Streaming | Limited (SSE, chunked) | Native (4 patterns) | Subscriptions
Browser support | Native | Requires grpc-web proxy | Native
Best for | Public APIs, CRUD | Internal microservices, low-latency | Flexible client queries
Interview Gotcha: "Does HTTPS encrypt the URL?" — Partially. The path and query string ARE encrypted (inside the TLS tunnel). But the domain name is visible in two places: (1) DNS query (plaintext unless DoH/DoT), (2) TLS ClientHello SNI field (plaintext unless ECH/ESNI). The destination IP is always visible.
Module 7

Networking for System Design

Load balancers, CDNs, Anycast, reverse proxies, and firewalls — building blocks of scalable architectures.

Load Balancers (L4 vs L7)

L4 vs L7 Comparison

Feature | L4 (Transport) | L7 (Application)
Inspects | IP + port (TCP/UDP headers) | Full HTTP: URL, headers, cookies, body
Speed | Very fast (kernel/hardware) | Slower (must parse HTTP)
TLS termination | No (pass-through) | Yes (terminates + re-encrypts or plaintext to backend)
Routing decisions | Based on IP:port only | URL path, hostname, headers, cookies
WebSocket support | Yes (transparent pass-through) | Yes (must understand upgrade)
Sticky sessions | Source IP hash | Cookie-based (more reliable)
Cost | Lower (less CPU) | Higher (more processing)
Examples | AWS NLB, HAProxy (TCP mode) | AWS ALB, Nginx, Envoy, HAProxy (HTTP mode)
Fig 7.1 — L4 distributes by IP:port; L7 routes by URL path, headers, and content

Load Balancing Algorithms

Algorithm | How | Pro | Con
Round Robin | Cycle through servers 1→2→3→1 | Simple, even distribution | Ignores server load/capacity
Weighted Round Robin | Higher-weight servers get more requests | Accounts for heterogeneous servers | Static weights
Least Connections | Route to server with fewest active connections | Adapts to actual load | Doesn't account for request weight
IP Hash | Hash(client IP) → server | Session stickiness without cookies | Uneven with non-uniform IPs
Consistent Hashing | Hash ring with virtual nodes | Minimal remapping on server add/remove | More complex implementation
Random | Random server selection | Simplest possible | Uneven in small clusters
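Consistent hashing is the least obvious of these, so here is a compact hash ring with virtual nodes (MD5-based; a sketch, not a production implementation). The key property: removing a node remaps only the keys that lived on it.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring; virtual nodes smooth the key distribution."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._ring = []                   # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str):
        for i in range(self._vnodes):     # place vnodes copies on the ring
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        # first ring point clockwise from the key's hash (wraps around)
        point = self._hash(key)
        idx = bisect.bisect(self._ring, (point, "")) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["server-a", "server-b", "server-c"])
target = ring.lookup("user:42")           # same key -> same server, every time
```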

Health Checks

  • Active: LB periodically sends probes (TCP connect, HTTP GET /health) to backends
  • Passive: LB monitors response codes and timeouts from real traffic
  • Unhealthy server removed from pool; re-added after consecutive successful checks
  • Graceful degradation: if all servers fail health checks, some LBs still route traffic (better than total outage)

CDNs (Content Delivery Networks)

How CDNs Work

  1. User makes DNS request for cdn.example.com
  2. DNS resolves to the nearest edge POP (Point of Presence) via Anycast or geo-DNS
  3. Cache HIT: edge has the content → returns it directly (low latency)
  4. Cache MISS: edge fetches from regional cache (mid-tier) or origin server → caches it → returns to user

CDN Architecture

User → Edge POP (closest to user, ~5ms): cache HIT → return directly
     → Regional mid-tier cache (~30ms): checked on edge MISS
     → Origin (source of truth, ~200ms): fetched on regional MISS
Cache hierarchy: Edge → Regional → Origin (each MISS falls through to the next tier)
Fig 7.2 — CDN Cache Hierarchy: most requests served from the edge POP

Cache Invalidation Strategies

  • TTL-based: content expires after configured time (simple but stale window)
  • Purge: explicitly remove content by URL/path (API call to CDN)
  • Cache tags: tag content with labels, purge by tag (e.g., purge all product images)
  • Versioned URLs: app.v2.js or app.js?v=abc123 — new URL = new cache entry (immutable caching)
  • Stale-while-revalidate: serve stale content while fetching fresh version in background

Anycast

How It Works

  • Same IP address is advertised from multiple locations via BGP
  • Routers send traffic to the nearest (by BGP path) advertising location
  • No DNS tricks needed — it's pure routing

Use Cases

  • DNS root servers: 13 root server IPs, but hundreds of actual instances worldwide (all Anycast)
  • CDN edge routing: Cloudflare, AWS CloudFront use Anycast for user → edge routing
  • DDoS mitigation: attack traffic gets distributed across all Anycast locations (dilutes the attack)
  • Works best for stateless protocols (DNS/UDP); TCP Anycast requires connection pinning (same destination for all packets in a flow)

Reverse Proxies

Forward Proxy vs Reverse Proxy

Aspect | Forward Proxy | Reverse Proxy
Sits in front of | Clients | Servers
Client knows? | Yes (configured in client) | No (transparent to client)
Purpose | Anonymity, caching, filtering | LB, SSL termination, caching, security
Example | Squid, corporate proxy | Nginx, HAProxy, Envoy, Traefik

Reverse Proxy Functions

  • SSL/TLS termination: decrypt HTTPS at proxy, forward plain HTTP to backends
  • Compression: gzip/brotli compress responses before sending to client
  • Caching: cache static/dynamic responses to reduce backend load
  • Request routing: route /api/* to API service, /* to frontend
  • Rate limiting: throttle abusive clients
  • Header manipulation: add X-Forwarded-For, X-Request-ID

Nginx vs Envoy vs HAProxy

Feature | Nginx | Envoy | HAProxy
Config model | Static file reload | Dynamic (xDS API) | Static file reload
Observability | Basic (access logs) | Rich (distributed tracing, metrics) | Good (stats page, Prometheus)
Service mesh | No (unless using Nginx mesh) | Istio/Cilium sidecar | No
L4 support | TCP stream module | Native | Native (excellent)
gRPC | Partial | Full native support | Partial
Best for | Web serving + simple LB | Microservices, service mesh | High-performance TCP/HTTP LB

Firewalls & DDoS Mitigation

Firewall Types

  • Stateless (Packet Filter): inspects each packet independently against rules (src/dst IP, port, protocol). Fast but limited — can't track connections.
  • Stateful: tracks connection state (NEW, ESTABLISHED, RELATED). Allows return traffic automatically. This is what iptables/nftables provides.
  • WAF (Web Application Firewall): L7 firewall that inspects HTTP content. Blocks SQL injection, XSS, etc. (e.g., AWS WAF, Cloudflare WAF, ModSecurity)

AWS Security Model

Feature | Security Group | Network ACL
Level | Instance (ENI) | Subnet
Stateful? | Yes (return traffic auto-allowed) | No (must explicitly allow return)
Default | Deny all inbound, allow all outbound | Allow all in both directions
Rules | Allow only (no deny rules) | Allow and deny rules with priority order

DDoS Mitigation Techniques

  • Rate limiting: cap requests per IP/token per time window (e.g., 100 req/min)
  • SYN cookies: don't allocate state for SYN requests until the handshake completes (prevents SYN flood)
  • Anycast diffusion: spread attack traffic across global POPs
  • Scrubbing centers: route traffic through DDoS mitigation providers (Cloudflare, AWS Shield, Akamai)
  • BGP blackholing: last resort — drop all traffic to a targeted IP at the ISP level

Zero Trust Networking

  • Traditional: "trust everything inside the network perimeter" (castle-and-moat)
  • Zero Trust: "never trust, always verify" — every request must be authenticated & authorized, regardless of network location
  • Key principles: verify identity, enforce least privilege, assume breach, inspect all traffic
  • Implementation: mTLS between services, identity-aware proxies (BeyondCorp), microsegmentation
Interview Gotcha: "When would you choose L4 over L7 load balancing?" — L4 when: you need raw throughput (millions of connections), don't need content inspection, or are load balancing non-HTTP protocols (database connections, gRPC without path routing, gaming). L7 when: you need URL-based routing, header inspection, cookie-based stickiness, or TLS termination with HTTP-aware health checks.
Module 8

Troubleshooting & Tools

Practical usage of ping, traceroute, dig, netstat, curl, and tcpdump — the network engineer's toolkit.

ping & traceroute

ping

  • Sends ICMP Echo Request (type 8) → receives ICMP Echo Reply (type 0)
  • Measures: RTT (round-trip time), packet loss, and TTL of replies
# Basic ping with 4 packets
$ ping -c 4 google.com
PING google.com (142.250.80.46): 56 data bytes
64 bytes from 142.250.80.46: icmp_seq=0 ttl=118 time=12.3 ms
64 bytes from 142.250.80.46: icmp_seq=1 ttl=118 time=11.8 ms
64 bytes from 142.250.80.46: icmp_seq=2 ttl=118 time=12.1 ms
64 bytes from 142.250.80.46: icmp_seq=3 ttl=118 time=11.9 ms

--- google.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss
rtt min/avg/max/mdev = 11.8/12.0/12.3/0.2 ms
  • TTL=118: the reply's TTL started at 128 (the nearest common initial value) → the host is 10 hops away. Typical initial TTLs: Linux=64, Windows=128, Cisco=255.
  • High RTT → network congestion or distant server
  • Packet loss → congestion, faulty link, or ICMP being rate-limited/blocked
  • Some hosts/firewalls block ICMP — "no reply" doesn't always mean unreachable

traceroute / tracert / mtr

  • traceroute (Linux/Mac): sends UDP packets with incrementing TTL → each hop sends back ICMP Time Exceeded
  • tracert (Windows): uses ICMP Echo Request with incrementing TTL
  • mtr: combines ping + traceroute in real-time — shows per-hop loss and latency continuously
# traceroute to identify slow or failing hops
$ traceroute -n google.com
 1  192.168.1.1    1.2 ms   1.1 ms   1.0 ms   # Home router
 2  10.0.0.1       8.5 ms   8.3 ms   8.4 ms   # ISP gateway
 3  72.14.194.226  12.1 ms  11.9 ms  12.0 ms  # ISP backbone
 4  * * *                                     # Hop blocks ICMP (firewall)
 5  142.250.80.46  12.3 ms  12.2 ms  12.1 ms  # Destination

# mtr (combined live view)
$ mtr -n --report google.com
  • * * * = hop doesn't respond to ICMP (common for firewall-protected routers; not necessarily a problem)
  • Asymmetric routing: forward path ≠ return path → latency spikes at a hop don't always mean that hop is slow
  • Key insight: if latency increases at hop N and stays high at hop N+1, N+2... the problem is at hop N. If it returns to normal, it's just that router being slow to respond to ICMP.

dig & nslookup

dig (Domain Information Groper)

# Simple A record lookup
$ dig example.com A +short
93.184.216.34

# Full query with sections
$ dig example.com A
;; QUESTION SECTION:
;example.com.                 IN  A
;; ANSWER SECTION:
example.com.         3600     IN  A   93.184.216.34
;; AUTHORITY SECTION:
example.com.         3600     IN  NS  a.iana-servers.net.
;; Query time: 23 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)

# Query specific DNS server
$ dig @8.8.8.8 example.com A

# Trace full resolution path (root → TLD → authoritative)
$ dig +trace example.com

# Check MX records
$ dig example.com MX +short
10 mail.example.com.

# Reverse DNS lookup
$ dig -x 93.184.216.34

Reading dig Output

  • QUESTION: what was asked
  • ANSWER: the resolved records (with TTL in seconds)
  • AUTHORITY: authoritative nameservers for the domain
  • ADDITIONAL: glue records (IP addresses of nameservers)
  • Query time: how long the resolution took
  • SERVER: which resolver answered

netstat & ss

ss (modern replacement for netstat)

# List all TCP listening sockets with process info
$ ss -tulnp
State   Recv-Q  Send-Q  Local Address:Port   Peer Address:Port  Process
LISTEN  0       128     0.0.0.0:22           0.0.0.0:*          users:(("sshd",pid=1234))
LISTEN  0       128     0.0.0.0:443          0.0.0.0:*          users:(("nginx",pid=5678))
LISTEN  0       128     127.0.0.1:5432       0.0.0.0:*          users:(("postgres",pid=9012))

# Show established connections
$ ss -tn state established

# Count connections by state
$ ss -tan | awk '{print $1}' | sort | uniq -c | sort -rn
    152 ESTAB
     34 TIME-WAIT
     12 CLOSE-WAIT
      8 LISTEN

# Find connections to a specific port
$ ss -tn dport = :443

TCP Connection States to Know

State | Meaning | Concern?
ESTABLISHED | Active, data flowing | Normal
TIME_WAIT | Connection closed, waiting for stale packets to expire | Many = high connection churn (consider connection pooling)
CLOSE_WAIT | Remote closed, local hasn't closed yet | Bug! Application not closing sockets — connection leak
FIN_WAIT_2 | Local sent FIN, got ACK, waiting for remote FIN | Many = remote not closing properly
SYN_SENT | Connection attempt in progress | Many = target unreachable or slow
SYN_RECV | Server received SYN, sent SYN-ACK, waiting for ACK | Many = possible SYN flood attack

curl

# Verbose output (see full request/response headers)
$ curl -v https://api.example.com/health

# Headers only
$ curl -I https://example.com

# Timing breakdown (THE most useful curl trick for debugging)
$ curl -o /dev/null -s -w "\
dns:     %{time_namelookup}s\n\
connect: %{time_connect}s\n\
tls:     %{time_appconnect}s\n\
ttfb:    %{time_starttransfer}s\n\
total:   %{time_total}s\n\
size:    %{size_download} bytes\n\
status:  %{http_code}\n" \
https://example.com

# Output:
#   dns:     0.023s        # DNS resolution time
#   connect: 0.045s        # TCP handshake complete
#   tls:     0.098s        # TLS handshake complete
#   ttfb:    0.234s        # Time to first byte (server processing)
#   total:   0.250s        # Total request time
#   size:    1256 bytes
#   status:  200

# POST with JSON body
$ curl -X POST -H "Content-Type: application/json" \
    -d '{"key":"value"}' https://api.example.com/data

# Follow redirects
$ curl -L https://example.com

Interpreting Timing Breakdown

  • High dns → DNS resolver is slow, try another (8.8.8.8, 1.1.1.1)
  • High connect - dns → network latency to server (far away or congested)
  • High tls - connect → slow TLS handshake (large cert chain, slow server crypto)
  • High ttfb - tls → server processing time is slow (backend issue, not network)
  • High total - ttfb → large response body and/or slow download speed
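Since curl's -w timings are cumulative, each phase is the difference between successive values. `phase_breakdown` is a hypothetical helper applying that to the sample numbers above:

```python
def phase_breakdown(t: dict) -> dict:
    """Split curl's cumulative -w timings into per-phase durations (seconds)."""
    return {
        "dns":      t["time_namelookup"],
        "tcp":      t["time_connect"]       - t["time_namelookup"],
        "tls":      t["time_appconnect"]    - t["time_connect"],
        "server":   t["time_starttransfer"] - t["time_appconnect"],
        "download": t["time_total"]         - t["time_starttransfer"],
    }

# The sample curl output from above:
sample = {"time_namelookup": 0.023, "time_connect": 0.045,
          "time_appconnect": 0.098, "time_starttransfer": 0.234,
          "time_total": 0.250}
phases = {k: round(v, 3) for k, v in phase_breakdown(sample).items()}
```

For these numbers the server phase (0.136s) dominates, pointing at a backend issue rather than the network.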

tcpdump & Wireshark

tcpdump Essentials

# Capture all traffic on eth0 (verbose, no DNS resolution)
$ sudo tcpdump -i eth0 -nn

# Filter by port
$ sudo tcpdump -i eth0 -nn port 443

# Filter by host
$ sudo tcpdump -i eth0 -nn host 10.0.1.5

# Capture to file for Wireshark analysis
$ sudo tcpdump -i eth0 -nn -w capture.pcap port 80

# Show TCP flags (S=SYN, .=ACK, F=FIN, R=RST, P=PSH)
$ sudo tcpdump -i eth0 -nn 'tcp[tcpflags] & (tcp-syn|tcp-fin) != 0'

# Capture only first 200 bytes of each packet
$ sudo tcpdump -i eth0 -nn -s 200 port 80

# Complex filter: HTTP requests to specific host
$ sudo tcpdump -i eth0 -nn 'dst host 10.0.1.5 and tcp dst port 80'

Reading tcpdump Output

# TCP 3-way handshake in tcpdump
14:30:01.123 IP 10.0.1.10.54321 > 10.0.1.5.80: Flags [S], seq 100, win 65535
14:30:01.124 IP 10.0.1.5.80 > 10.0.1.10.54321: Flags [S.], seq 200, ack 101, win 65535
14:30:01.124 IP 10.0.1.10.54321 > 10.0.1.5.80: Flags [.], ack 201, win 65535

# Flag meanings: S=SYN, S.=SYN+ACK, .=ACK, F=FIN, R=RST, P=PSH

Wireshark Tips

  • Display filters (Wireshark): http.request.method == "GET", tcp.flags.syn == 1, ip.addr == 10.0.1.5
  • Capture filters (BPF syntax, same as tcpdump): port 443 and host 10.0.1.5
  • Follow TCP Stream: right-click any packet → "Follow" → "TCP Stream" → see full conversation
  • Expert Info: Analyze → Expert Information → shows retransmissions, window problems, errors
  • TCP retransmissions: look for [TCP Retransmission] in info column — indicates packet loss
Interview Gotcha: "How would you debug a service that is reachable but slow?"
Systematic approach:
  • 1. ping — check baseline RTT and packet loss
  • 2. traceroute/mtr — identify if a specific hop is adding latency
  • 3. curl timing — isolate: DNS vs connect vs TLS vs server processing vs download
  • 4. ss — check for CLOSE_WAIT (connection leaks) or excessive TIME_WAIT
  • 5. tcpdump — look for TCP retransmissions, window size drops, RSTs
  • 6. Application logs — if network is fine, the bottleneck is in the application
Module 9

Advanced Topics

Socket programming, connection pooling, service mesh, QUIC, zero-copy, kernel bypass, and eBPF — the deep end.

Socket Programming

BSD Socket API (The Foundation)

Call | Purpose | Side
socket() | Create a socket (specify AF_INET, SOCK_STREAM/SOCK_DGRAM) | Both
bind() | Assign local address:port to the socket | Server
listen() | Mark socket as passive (ready to accept connections), set backlog | Server
accept() | Block until a client connects; returns a NEW socket for that connection | Server
connect() | Initiate TCP handshake to server | Client
send()/recv() | Send/receive data on connected socket | Both
close() | Close the socket (triggers FIN) | Both
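The call sequence above, as a minimal Python echo server and client on the loopback interface (the server runs in a thread so one script shows both sides):

```python
import socket
import threading

def echo_server(sock: socket.socket):
    """accept() one client, echo bytes back until the peer closes."""
    conn, _addr = sock.accept()          # blocks until a client connects
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:                 # empty read = peer sent FIN
                break
            conn.sendall(data)

# socket() -> bind() -> listen(): the passive (server) side
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))            # port 0 = kernel picks a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=echo_server, args=(server,), daemon=True).start()

# socket() -> connect(): the active (client) side
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))      # triggers the 3-way handshake
client.sendall(b"hello")
reply = client.recv(1024)
client.close()                           # triggers FIN
```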

I/O Models Evolution

Model | Mechanism | Scale | Issue
Blocking I/O | One thread per connection, blocks on recv() | ~1K connections | Thread overhead (stack, scheduling)
select() | Monitor up to 1024 FDs, scan all each time | ~1K | O(n) scan, FD_SETSIZE limit
poll() | Like select but no FD limit | ~10K | Still O(n) scan of all FDs
epoll() | Kernel maintains interest list + ready list; returns only ready FDs | ~1M+ | Linux-only
kqueue | BSD equivalent of epoll | ~1M+ | BSD/macOS only
io_uring | Async I/O via shared ring buffers with kernel | ~1M+ | Newer (Linux 5.1+), complex API

The C10K Problem

  • Problem (circa 1999): how to handle 10,000 concurrent connections on a single server
  • Thread-per-connection fails: 10K threads × 1MB stack = 10GB RAM just for stacks
  • Solution: event-driven architecture with epoll/kqueue — single thread monitors thousands of sockets
  • Modern reality: C10M (10 million) connections is the new challenge → requires kernel bypass (DPDK/XDP)

epoll: Edge-Triggered vs Level-Triggered

  • Level-triggered (LT): epoll_wait returns a FD as long as it has data available (like poll). Simpler, more forgiving.
  • Edge-triggered (ET): epoll_wait reports a FD only when NEW data arrives. You must drain all available data on each wakeup — there is no further notification until the next state change.
  • ET is faster (fewer wakeups) but requires non-blocking I/O + drain-all-data loops
  • Nginx uses ET; most other software uses LT
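Python's `selectors` module wraps epoll/kqueue behind one API (it picks the best mechanism for the platform). This sketch shows level-triggered readiness on a socketpair:

```python
import selectors
import socket

# socketpair gives two connected sockets; register one side for readability.
sel = selectors.DefaultSelector()        # epoll on Linux, kqueue on BSD/macOS
a, b = socket.socketpair()
b.setblocking(False)                     # event-driven I/O wants non-blocking FDs
sel.register(b, selectors.EVENT_READ)

# Nothing to read yet: a zero-timeout poll returns no events.
assert sel.select(timeout=0) == []

a.sendall(b"ping")                       # now b becomes readable
events = sel.select(timeout=1)
for key, mask in events:
    data = key.fileobj.recv(1024)        # level-triggered: drain when notified

sel.unregister(b)
a.close(); b.close()
```

A real server registers a listening socket the same way and accepts/reads only the FDs the selector reports ready, instead of scanning every connection.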

Connection Pooling

Why Pool?

  • TCP handshake = 1 RTT; TLS = 1-2 more RTTs → total: 2-3 RTTs before first byte
  • Connection pooling: establish once, reuse many times → amortize setup cost over many requests
  • Reduces server-side socket churn (fewer TIME_WAIT, less kernel overhead)

Types of Connection Reuse

Mechanism | Scope | Example
HTTP Keep-Alive | Reuse TCP connection for sequential HTTP requests | Browser → server (HTTP/1.1 default)
HTTP/2 Multiplexing | Multiple concurrent requests on one connection | Browser → server (single TCP connection)
Connection Pool | Pre-established pool of connections, checked out/returned | App → database (PgBouncer, HikariCP)

Pool Sizing: Little's Law

  • L = λ × W where L = concurrent connections needed, λ = requests/sec, W = avg latency per request
  • Example: 500 req/s, 20ms avg DB query → L = 500 × 0.02 = 10 connections needed
  • Add headroom for variance: typically 1.5-2× the calculated value
  • Too small pool → requests queue, latency spikes
  • Too large pool → wastes DB resources, can actually reduce throughput (lock contention, context switching)
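A sketch of the sizing rule; the headroom factor and rounding are assumptions layered on top, not part of Little's Law itself:

```python
import math

def pool_size(requests_per_sec: float, avg_latency_sec: float,
              headroom: float = 1.5) -> int:
    """Little's Law: L = lambda * W, padded for variance and rounded up."""
    concurrent = requests_per_sec * avg_latency_sec
    # round before ceil to avoid float noise bumping the result up by one
    return math.ceil(round(concurrent * headroom, 6))
```

The worked example from above: 500 req/s at 20ms gives L = 10, or 15 connections with 1.5x headroom.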

Service Mesh

Architecture

  • Sidecar proxy pattern: every service instance gets a proxy (typically Envoy) running alongside it
  • All traffic goes through the sidecar → the mesh controls communication without changing application code
  • Data plane: the sidecars — handle actual traffic (routing, load balancing, encryption)
  • Control plane: centralized config (Istio's istiod, Linkerd's control plane) — pushes policy to sidecars

What a Service Mesh Provides

  • mTLS (mutual TLS): automatic encryption + authentication between all services
  • Observability: distributed tracing, metrics, access logs — without app instrumentation
  • Traffic management: canary deployments, A/B testing, circuit breaking, retries with budgets
  • Access control: fine-grained authorization policies (service A can call service B's /api/v2 only)

Istio vs Linkerd

Feature | Istio | Linkerd
Sidecar | Envoy | Linkerd2-proxy (Rust, lightweight)
Complexity | High (many CRDs, config options) | Low (opinionated, simpler)
Performance | Higher latency overhead | Lower latency overhead (~1ms p99)
Features | Full-featured (VMs, multi-cluster) | Focused on Kubernetes
Best for | Complex multi-platform environments | K8s-first, simplicity-focused teams

DNS over HTTPS (DoH) & DNS over TLS (DoT)

Comparison

Feature | Traditional DNS | DoT (DNS over TLS) | DoH (DNS over HTTPS)
Port | 53 (UDP/TCP) | 853 (TCP+TLS) | 443 (HTTPS)
Encryption | None (plaintext) | TLS tunnel | HTTPS (HTTP/2 + TLS)
Blockable? | Easily (port 53) | Easily (port 853) | Hard (same port as all HTTPS traffic)
Enterprise visibility | Full (can inspect/log) | Can block port 853 | Difficult to distinguish from regular HTTPS
Performance | Fastest (no encryption) | Slight overhead | Slight overhead

Trade-offs

  • Privacy benefit: ISP can't see DNS queries → can't track/sell browsing history
  • Enterprise concern: DoH bypasses corporate DNS filtering/monitoring
  • DoH controversy: centralizes DNS at a few providers (Cloudflare, Google) instead of distributed ISP resolvers

QUIC Protocol (Deep Dive)

Traditional stack: HTTP/2 → TLS 1.2/1.3 → TCP → IP
QUIC stack: HTTP/3 → QUIC (TLS 1.3 + reliability + multiplexing; connection migration, 0-RTT) → UDP → IP
Fig 9.1 — QUIC merges TCP reliability + TLS encryption into a single layer over UDP

Key QUIC Features

  • No TCP head-of-line blocking: each stream has independent loss recovery — lost packet in stream 1 doesn't block stream 2
  • 1-RTT handshake: combines transport + crypto handshake (TCP+TLS = 2-3 RTTs)
  • 0-RTT resumption: returning clients send data immediately using cached keys
  • Connection migration: connections identified by Connection ID (not IP:port 4-tuple) → survives Wi-Fi↔cellular transitions
  • Built-in TLS 1.3: encryption is mandatory, not optional
  • Userspace implementation: QUIC runs in userspace (not kernel) → faster iteration, easier deployment
Interview Gotcha: "Why is QUIC built on UDP instead of a new Layer 4 protocol?" — Middlebox ossification. NATs, firewalls, and other middleboxes are designed to pass TCP and UDP. A new L4 protocol would be dropped by most middleboxes. By building on UDP, QUIC works with existing infrastructure. This is also why QUIC encrypts almost all its headers — to prevent middleboxes from interfering.

Zero-Copy Networking & Kernel Bypass

The Problem: Traditional Data Path

  1. NIC receives frame → DMA to kernel buffer
  2. Kernel copies data to user-space buffer (read() syscall)
  3. Application processes data
  4. Application writes to user-space buffer
  5. Kernel copies to kernel buffer (write() syscall)
  6. Kernel sends to NIC via DMA

→ 4 copies + 4 context switches per request = significant CPU overhead

Zero-Copy Solutions

Technique | How | Copies Saved | Use Case
sendfile() | Kernel copies file → socket directly (no user-space) | 2 copies + 2 ctx switches | Serving static files (Nginx, Apache)
mmap() + write() | Map file into user-space (kernel page cache) | 1 copy | Large file access
splice() | Move data between two FDs via kernel pipe (no user-space copy) | 2 copies | Proxying data between sockets
io_uring | Shared ring buffers between kernel and user-space, async I/O | Varies | High-performance async I/O (databases)
MSG_ZEROCOPY | Socket flag: kernel sends from user-space buffer directly | 1 copy | Large sends (>10KB payloads)
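Python exposes sendfile() through `socket.sendfile()`. This sketch sends a temp file over a socketpair standing in for a TCP connection; on platforms without os.sendfile it transparently falls back to plain send(), so the zero-copy part is best-effort:

```python
import socket
import tempfile

# Write a small payload to a file, then ship it with socket.sendfile(),
# which uses os.sendfile (file -> socket inside the kernel) when available.
payload = b"x" * 4096
with tempfile.TemporaryFile() as f:
    f.write(payload)
    f.seek(0)

    a, b = socket.socketpair()           # stand-in for a real TCP connection
    sent = a.sendfile(f)                 # kernel path: no user-space buffer copy
    a.close()                            # close so the reader sees EOF

    chunks = []
    while True:
        chunk = b.recv(65536)
        if not chunk:
            break
        chunks.append(chunk)
    b.close()
    received = b"".join(chunks)
```

This is essentially what Nginx does when serving static files: the file's pages go from the page cache to the NIC without ever being copied into the server process.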

Kernel Bypass

  • Why: even with zero-copy, the kernel network stack adds microseconds of latency per packet (interrupt handling, protocol processing, syscall overhead)
  • For ultra-low-latency (HFT, telecom) or ultra-high-throughput (100Gbps+), bypass the kernel entirely
| Technology | How | Trade-off |
| --- | --- | --- |
| DPDK | User-space poll-mode NIC drivers, hugepages, dedicated CPU cores | Burns CPU cores (100% poll loop), no kernel stack |
| XDP | BPF programs at NIC driver level (before sk_buff allocation) | Limited program complexity, requires BPF-compatible driver |
| AF_XDP | User-space socket backed by XDP — packets go NIC→user-space directly | Best of both: kernel integration + bypass speed |

eBPF for Networking

What is eBPF?

  • Extended Berkeley Packet Filter — sandboxed programs that run inside the Linux kernel
  • Verified at load time (no crashes, no infinite loops) → safe to run in production
  • Attached to kernel hook points: XDP, TC, socket, kprobes, tracepoints
  • Think of it as "JavaScript for the kernel" — programmable, safe, dynamic

eBPF Hook Points for Networking

| Hook | Location | Speed | Use Case |
| --- | --- | --- | --- |
| XDP | NIC driver (before sk_buff) | Fastest (~24 Mpps) | DDoS mitigation, packet filtering |
| TC (Traffic Control) | After sk_buff, before routing decision | Fast | Load balancing, NAT, policy enforcement |
| Socket | Socket operations (connect, sendmsg) | Moderate | Socket redirection, access control |
| cgroup | Per-cgroup network control | Moderate | Container network policy |

Cilium: eBPF-Based Networking

  • Kubernetes CNI (Container Network Interface) plugin built on eBPF
  • Replaces kube-proxy: eBPF handles service load balancing instead of iptables rules
  • iptables with 10K+ services = slow (linear rule evaluation); eBPF = O(1) hash lookup
  • Provides: L3/L4/L7 network policy, transparent encryption, observability (Hubble)

iptables vs eBPF

| Aspect | iptables | eBPF (Cilium) |
| --- | --- | --- |
| Rule evaluation | Linear (O(n) per packet) | Hash-based (O(1)) |
| Update cost | Full table replace | Incremental map update |
| At 10K services | ~5 sec rule update, measurable latency | Instant update, no latency impact |
| Observability | Minimal (counters only) | Rich (per-flow metrics, identity-based) |
| L7 policy | No | Yes (HTTP, gRPC, Kafka) |
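The rule-evaluation difference is easy to see in miniature. A toy model — not real iptables or eBPF semantics, just a list scan versus a hash map:

```python
# Why eBPF service lookup scales better than iptables-style rule chains:
# a linear chain checks every rule until one matches (O(n)); a hash map
# finds the backend in a single lookup (O(1)). Toy model only.
rules = [((f"10.0.{i // 256}.{i % 256}", 80), f"backend-{i}")
         for i in range(10_000)]
service_map = dict(rules)            # eBPF-style hash map

def linear_lookup(key):              # iptables-style: scan the chain
    for match, target in rules:
        if match == key:
            return target
    return None

key = ("10.0.39.15", 80)             # entry 39*256 + 15 = 9999, the last rule
print(linear_lookup(key))            # walks ~10,000 rules → backend-9999
print(service_map[key])              # one hash lookup     → backend-9999
```

At 10K services the chain scan does ten thousand comparisons for a worst-case packet; the map does one, which is the whole argument for replacing kube-proxy's iptables rules with eBPF maps.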
Fig — BGP best-path selection: 3 competing routes to 10.0.0.0/8
Reference

Interview Cheat Sheet

Quick-reference cards for every module, rapid-fire Q&A, and protocol port numbers.

1. OSI vs TCP/IP Models

  • OSI = 7 layers (reference); TCP/IP = 4 layers (implementation)
  • OSI layers 5/6 merged into TCP/IP Application layer
  • TLS doesn't fit cleanly into one OSI layer (spans L5-L6)
  • Encapsulation: Data → Segment → Packet → Frame → Bits
  • Linux uses sk_buff with pointer manipulation for O(1) encapsulation
  • TCP/IP won because it shipped working code; OSI was theoretical

2. Physical & Data Link

  • MAC = 48 bits (6 bytes), first 3 = OUI (manufacturer)
  • Ethernet frame: Preamble | Dst MAC | Src MAC | EtherType | Payload | FCS
  • MTU = 1500 bytes (Ethernet payload max)
  • Switch learns MAC→port from source MACs; floods unknown destinations
  • VLANs segment broadcast domains; inter-VLAN needs L3
  • ARP: broadcast request, unicast reply; maps IP → MAC
  • STP prevents switching loops; RSTP converges in 1-2 seconds

3. Network Layer

  • IPv4: 32-bit, header 20-60 bytes; IPv6: 128-bit, header 40 bytes fixed
  • TTL decrements at each hop; 0 → packet dropped + ICMP Time Exceeded
  • CIDR: usable hosts = 2^(32-n) - 2
  • NAT (PAT): many private IPs share one public IP via port mapping
  • Routing = building table (control plane); Forwarding = using it (data plane)
  • BGP = inter-AS (Internet backbone); OSPF = intra-AS (link-state, Dijkstra)
  • Traceroute works by incrementing TTL
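The CIDR formula above can be checked with the stdlib ipaddress module:

```python
# Verifying usable hosts = 2^(32-n) - 2 for a /24.
import ipaddress

net = ipaddress.ip_network("192.168.1.0/24")
usable = net.num_addresses - 2           # subtract network + broadcast
print(usable)                            # 254
print(2 ** (32 - net.prefixlen) - 2)     # 254 — matches the formula
print(net.netmask)                       # 255.255.255.0
```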

4. Transport Layer

  • TCP: reliable, ordered, connection-oriented, 20-60B header
  • UDP: best-effort, connectionless, 8B header
  • 3-way handshake: SYN → SYN-ACK → ACK
  • Connection = 5-tuple (proto, src IP, src port, dst IP, dst port)
  • Flow control: receiver window (rwnd); Congestion: cwnd
  • Actual rate = min(cwnd, rwnd)
  • TIME_WAIT = 2×MSL (prevents old segment pollution)
  • Fast retransmit: 3 dup ACKs → retransmit immediately
  • BBR: model-based congestion control (probes bandwidth + RTT)
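The slow start → AIMD shape can be sketched as a toy simulation. Units are whole segments and the ssthresh/loss handling is deliberately simplified — real TCP is byte-based and far more nuanced:

```python
# Simplified TCP congestion-control model (units = segments):
# slow start doubles cwnd each RTT until ssthresh, then congestion
# avoidance adds 1 per RTT; a loss halves it (multiplicative decrease).
def simulate(rtts, ssthresh=16, loss_at=None):
    cwnd, history = 1, []
    for rtt in range(rtts):
        history.append(cwnd)
        if loss_at is not None and rtt == loss_at:
            ssthresh = max(cwnd // 2, 2)   # multiplicative decrease
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd *= 2                      # slow start: exponential growth
        else:
            cwnd += 1                      # congestion avoidance: linear
    return history

print(simulate(8))                 # [1, 2, 4, 8, 16, 17, 18, 19]
rwnd = 12                          # receiver's advertised window
print(min(simulate(8)[-1], rwnd))  # actual rate = min(cwnd, rwnd) → 12
```

The final line shows the cheat-sheet rule directly: even with cwnd at 19 segments, a 12-segment rwnd caps the sending rate.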

5. Application Layer

  • HTTP/1.1: persistent, head-of-line blocking
  • HTTP/2: binary, multiplexed streams, HPACK compression (TCP HOL remains)
  • HTTP/3: QUIC (UDP), no HOL blocking, built-in TLS 1.3
  • DNS: recursive (client→resolver), iterative (resolver→authoritative)
  • DNS uses UDP:53 for queries, TCP:53 for zone transfers & large responses
  • DHCP: Discover → Offer → Request → Acknowledge (DORA)
  • POST is not idempotent; PUT/DELETE are; GET is safe
  • 301 = permanent redirect (cached); 302 = temporary

6. Modern Web & Security

  • TLS 1.2 = 2 RTT; TLS 1.3 = 1 RTT (0-RTT on resumption)
  • TLS 1.3: only ECDHE (forward secrecy mandatory), 5 cipher suites
  • HSTS: forces HTTPS, prevents SSL stripping; preload for first-visit
  • CORS: same-origin policy relaxation; preflight = OPTIONS request
  • HTTPS encrypts path/query but NOT domain (visible in SNI + DNS)
  • WebSocket: full-duplex, persistent, upgrade from HTTP
  • gRPC: HTTP/2 + protobuf, 4 streaming patterns, best for microservices

7. System Design Networking

  • L4 LB: fast, IP:port only; L7 LB: content-aware, TLS termination
  • CDN: edge POP → regional → origin; cache invalidation via TTL/purge/versioning
  • Anycast: same IP, multiple locations via BGP; DDoS dilution
  • Consistent hashing: ~1/N keys remapped on server add/remove
  • Reverse proxy: SSL term, caching, compression, routing (Nginx/Envoy)
  • Stateful firewall tracks connections; WAF inspects HTTP content
  • Zero Trust: never trust, always verify (mTLS, identity-aware proxy)

8. Troubleshooting

  • ping: RTT + loss; traceroute: per-hop latency; mtr: combined live
  • dig +trace: follow full DNS resolution path
  • ss -tulnp: listening sockets with PIDs
  • CLOSE_WAIT = bug (app not closing sockets)
  • curl -w timing: isolate DNS / connect / TLS / TTFB / total
  • tcpdump flags: S=SYN, S.=SYN-ACK, .=ACK, F=FIN, R=RST
  • Slow service debug: ping → traceroute → curl timing → ss states → tcpdump

9. Advanced Topics

  • epoll: O(k) ready FDs vs select/poll O(n) all FDs
  • C10K solved by event-driven (epoll); C10M needs kernel bypass (DPDK/XDP)
  • Connection pool size: Little's Law (L = λ × W)
  • Service mesh: sidecar proxy (Envoy) + control plane (Istio/Linkerd)
  • QUIC: UDP-based, no HOL blocking, connection migration via Connection ID
  • sendfile(): zero-copy file→socket; io_uring: async I/O via ring buffers
  • eBPF replaces iptables: O(1) vs O(n) rule evaluation; Cilium for k8s

Rapid-Fire: Top 50 Interview Questions

| # | Question | Key Answer |
| --- | --- | --- |
| 1 | What happens when you type google.com in browser? | DNS → TCP handshake → TLS handshake → HTTP GET → Server processes → Response → Browser renders |
| 2 | TCP vs UDP? | TCP: reliable, ordered, connection-oriented. UDP: fast, no guarantees, connectionless. |
| 3 | What is a 3-way handshake? | SYN(seq=x) → SYN-ACK(seq=y, ack=x+1) → ACK(ack=y+1). Synchronizes sequence numbers. |
| 4 | Why 3-way not 2-way? | Both sides must confirm they can send AND receive. Prevents stale SYN from creating ghost connections. |
| 5 | What is TIME_WAIT? | 2×MSL wait after closing. Prevents old segments from polluting new connections on same 5-tuple. |
| 6 | OSI vs TCP/IP? | OSI: 7 layers, theoretical. TCP/IP: 4 layers, practical. TCP/IP merges Session/Presentation into Application. |
| 7 | What is NAT? | Translates private IP:port to public IP:port. PAT allows many hosts to share one public IP. |
| 8 | How does DNS work? | Client→recursive resolver→root NS→TLD NS→authoritative NS. Caches at each level with TTL. |
| 9 | DNS TCP or UDP? | Both. UDP for queries (<512B, or 4096 with EDNS0). TCP for zone transfers and large responses. |
| 10 | What is a subnet mask? | Separates network bits from host bits. /24 = 255.255.255.0 = 256 addresses, 254 usable. |
| 11 | L4 vs L7 load balancer? | L4: routes by IP:port (fast). L7: routes by HTTP content (URL, headers, cookies). |
| 12 | How does HTTPS work? | TCP → TLS handshake (key exchange + cert verification) → encrypted HTTP inside TLS tunnel. |
| 13 | TLS 1.2 vs 1.3? | 1.3: 1-RTT (vs 2), only ECDHE (forward secrecy mandatory), 5 cipher suites, encrypted handshake. |
| 14 | What is CORS? | Browser mechanism to allow cross-origin requests. Server sets Access-Control-Allow-Origin header. |
| 15 | What is a CDN? | Edge servers cache content close to users. Reduces latency and origin server load. |
| 16 | What is ARP? | Maps IP→MAC on local network. Broadcast request, unicast reply. |
| 17 | What is a VLAN? | Logically segments a switch into separate broadcast domains. Inter-VLAN requires L3. |
| 18 | Routing vs forwarding? | Routing = building the table (control plane). Forwarding = using it per-packet (data plane). |
| 19 | What is BGP? | Internet routing protocol between autonomous systems. Path-vector, TCP:179, policy-based. |
| 20 | What is TTL? | Time To Live in IP header. Decremented per hop. 0 = drop + ICMP Time Exceeded. Prevents loops. |
| 21 | TCP flow control? | Receiver advertises rwnd (receive window). Sender can't exceed it. Prevents overwhelming receiver. |
| 22 | TCP congestion control? | Sender maintains cwnd. Slow start (exponential) → congestion avoidance (linear/AIMD). Actual rate = min(cwnd, rwnd). |
| 23 | What is DHCP? | Auto-assigns IP, mask, gateway, DNS to hosts. DORA: Discover→Offer→Request→Acknowledge. |
| 24 | HTTP/1.1 vs HTTP/2? | HTTP/2: binary framing, multiplexed streams, header compression. Still has TCP HOL blocking. |
| 25 | What is HTTP/3? | HTTP over QUIC (UDP). No HOL blocking, 1-RTT handshake, 0-RTT resumption, connection migration. |
| 26 | What is WebSocket? | Full-duplex over single TCP connection. Starts as HTTP upgrade. Low overhead per message. |
| 27 | What is gRPC? | Google RPC: HTTP/2 + protobuf (binary). 4 patterns: unary, server/client/bidi streaming. |
| 28 | What is Anycast? | Same IP from multiple locations via BGP. Traffic goes to nearest. Used for DNS, CDN, DDoS. |
| 29 | Forward vs reverse proxy? | Forward: sits in front of clients (anonymity). Reverse: sits in front of servers (LB, SSL, caching). |
| 30 | What is HSTS? | Header that forces HTTPS for a domain. Prevents SSL stripping. Preload = hardcoded in browser. |
| 31 | Stateless vs stateful firewall? | Stateless: inspects each packet independently. Stateful: tracks connections, auto-allows return traffic. |
| 32 | What is MTU? | Maximum Transmission Unit. Ethernet = 1500 bytes. Packets exceeding MTU are fragmented or dropped (DF bit). |
| 33 | What is consistent hashing? | Hash ring where adding/removing server remaps only ~1/N keys (vs ~100% with modulo). |
| 34 | CLOSE_WAIT vs TIME_WAIT? | CLOSE_WAIT: remote closed, local didn't close (bug!). TIME_WAIT: both closed, waiting for stale packets (normal). |
| 35 | What is SYN flood? | Attack: send many SYNs without ACK → fill server's SYN queue. Defense: SYN cookies. |
| 36 | What is epoll? | Linux API for monitoring many FDs. Returns only ready FDs (O(k) vs select's O(n)). Solved C10K. |
| 37 | What is a service mesh? | Sidecar proxies (Envoy) handling inter-service traffic: mTLS, observability, traffic management. |
| 38 | What is eBPF? | Sandboxed programs in the Linux kernel. Used for networking (Cilium), observability, security. |
| 39 | IPv4 vs IPv6? | v4: 32-bit, 4.3B addresses, NAT. v6: 128-bit, no NAT needed, simplified header, no broadcast. |
| 40 | What is ICMP? | Control protocol for IP. Used by ping (type 8/0), traceroute (type 11), and error messages (type 3). |
| 41 | TCP Fast Open? | Send data in SYN packet using cached cookie. Saves 1 RTT on repeat connections. |
| 42 | What is SACK? | Selective ACK. Receiver reports non-contiguous received blocks. Sender retransmits only gaps. |
| 43 | Connection pooling? | Reuse established connections. Saves handshake overhead. Pool size via Little's Law: L=λ×W. |
| 44 | What is QUIC? | UDP-based transport with built-in TLS 1.3, stream multiplexing, connection migration. Used by HTTP/3. |
| 45 | Why does QUIC use UDP? | Middlebox ossification: NATs/firewalls drop unknown protocols. UDP passes through existing infra. |
| 46 | What is zero-copy? | Avoid copying data between kernel/user space. sendfile() goes file→socket in kernel. Nginx uses this. |
| 47 | What is DPDK? | User-space networking: poll-mode NIC drivers, hugepages. Bypasses kernel for ultra-low latency. |
| 48 | Does HTTPS encrypt the URL? | Path/query: yes. Domain: visible in DNS query and TLS SNI (unless DoH + ECH). |
| 49 | What is certificate pinning? | App hardcodes expected cert/key hash. Prevents CA compromise attacks. Deprecated in browsers. |
| 50 | Debug slow-but-reachable service? | ping (RTT) → traceroute (hop latency) → curl timing (DNS/connect/TLS/TTFB) → ss (conn states) → tcpdump (retransmissions) |

Protocol Port Numbers Quick Reference

| Protocol | Port(s) | Transport | Notes |
| --- | --- | --- | --- |
| HTTP | 80 | TCP | |
| HTTPS | 443 | TCP (or UDP for QUIC/HTTP3) | |
| DNS | 53 | UDP + TCP | UDP for queries, TCP for zone transfers |
| DNS over TLS (DoT) | 853 | TCP | |
| SSH | 22 | TCP | |
| FTP | 20 (data) / 21 (control) | TCP | |
| SFTP | 22 | TCP | Runs over SSH |
| SMTP | 25 / 587 (submission) | TCP | 587 with STARTTLS |
| POP3 | 110 / 995 (TLS) | TCP | |
| IMAP | 143 / 993 (TLS) | TCP | |
| DHCP | 67 (server) / 68 (client) | UDP | |
| SNMP | 161 / 162 (trap) | UDP | |
| NTP | 123 | UDP | |
| BGP | 179 | TCP | |
| MySQL | 3306 | TCP | |
| PostgreSQL | 5432 | TCP | |
| Redis | 6379 | TCP | |
| MongoDB | 27017 | TCP | |
| Kubernetes API | 6443 | TCP | |
| RDP | 3389 | TCP/UDP | |
/* ============================================================================
   Rules (enforced at Wave-4 checksum sweep):
   - Zero external dependencies.
   - IIFE-wrapped, single global: window.VizLib.
   - Respects prefers-reduced-motion.
   - ARIA-live captions on every viz.
   - Lazy-init via IntersectionObserver.
   - No build step; copy-paste as-is.
   - NO innerHTML usage (XSS-safe by construction).
   Declarative spec shape lives in a `…`

   Place AFTER the existing per-page sidebar/search script, BEFORE </body>.
   Responsibilities:
   - #reading-progress width tracker
   - Keyboard shortcuts: j/k/t/? and '/' focus TOC search
   - '.copy-btn' injection on every <pre>
   - Theme toggle (html.theme-light) with localStorage
   - #kbd-help open/close
   - #sub-toc build from current page's <h3> headings inside visible module
   - Respects focus in inputs (no shortcut hijack)
   - No innerHTML anywhere
   ============================================================================ */
(function () {
  'use strict';
  if (window.__uxScriptLoaded) return;
  window.__uxScriptLoaded = true;

  // ---------- reading progress ----------
  const progress = document.getElementById('reading-progress');
  if (progress) {
    let ticking = false;
    function update() {
      const doc = document.documentElement;
      const scrolled = doc.scrollTop || document.body.scrollTop;
      const height = doc.scrollHeight - doc.clientHeight;
      const pct = height > 0 ? Math.min(100, (scrolled / height) * 100) : 0;
      progress.style.setProperty('--progress', pct + '%');
      ticking = false;
    }
    window.addEventListener('scroll', () => {
      if (!ticking) { requestAnimationFrame(update); ticking = true; }
    }, { passive: true });
    update();
  }

  // ---------- copy buttons on <pre> ----------
  document.querySelectorAll('pre').forEach(pre => {
    if (pre.querySelector('.copy-btn')) return;
    const btn = document.createElement('button');
    btn.type = 'button';
    btn.className = 'copy-btn';
    btn.textContent = 'Copy';
    btn.setAttribute('aria-label', 'Copy code to clipboard');
    btn.addEventListener('click', async () => {
      const code = pre.querySelector('code') || pre;
      const text = code.textContent || '';
      try {
        await navigator.clipboard.writeText(text);
        btn.textContent = 'Copied';
        btn.dataset.copied = '1';
        setTimeout(() => { btn.textContent = 'Copy'; delete btn.dataset.copied; }, 1400);
      } catch (err) {
        btn.textContent = 'Fail';
        setTimeout(() => { btn.textContent = 'Copy'; }, 1200);
      }
    });
    pre.appendChild(btn);
  });

  // ---------- theme toggle ----------
  const LS_KEY = 'notebook-theme';
  const saved = (() => { try { return localStorage.getItem(LS_KEY); } catch (e) { return null; } })();
  if (saved === 'light') document.documentElement.classList.add('theme-light');
  function toggleTheme() {
    document.documentElement.classList.toggle('theme-light');
    const light = document.documentElement.classList.contains('theme-light');
    try { localStorage.setItem(LS_KEY, light ? 'light' : 'dark'); } catch (e) {}
  }

  // ---------- keyboard help dialog ----------
  const helpDialog = document.getElementById('kbd-help');
  function openHelp() { if (helpDialog) helpDialog.dataset.open = '1'; }
  function closeHelp() { if (helpDialog) delete helpDialog.dataset.open; }
  if (helpDialog) {
    helpDialog.addEventListener('click', (e) => { if (e.target === helpDialog) closeHelp(); });
    const close = helpDialog.querySelector('.kbd-close');
    if (close) close.addEventListener('click', closeHelp);
  }

  // ---------- global shortcuts ----------
  function inEditable(el) {
    if (!el) return false;
    const tag = (el.tagName || '').toLowerCase();
    return tag === 'input' || tag === 'textarea' || tag === 'select' || el.isContentEditable;
  }
  function modules() {
    return Array.from(document.querySelectorAll('.module'));
  }
  function visibleModuleIndex() {
    const mods = modules();
    const mid = window.innerHeight / 3;
    for (let i = 0; i < mods.length; i++) {
      const r = mods[i].getBoundingClientRect();
      if (r.top <= mid && r.bottom > mid) return i;
      if (r.top > mid) return Math.max(0, i - 1);
    }
    return mods.length - 1;
  }
  function scrollToModule(i) {
    const mods = modules();
    if (!mods.length) return;
    const clamped = Math.max(0, Math.min(mods.length - 1, i));
    mods[clamped].scrollIntoView({ behavior: 'smooth', block: 'start' });
  }
  document.addEventListener('keydown', (e) => {
    if (e.ctrlKey || e.metaKey || e.altKey) return;
    if (inEditable(e.target)) {
      if (e.key === 'Escape' && e.target.id === 'toc-search') e.target.blur();
      return;
    }
    if (e.key === '?') { e.preventDefault(); openHelp(); }
    else if (e.key === 'Escape') closeHelp();
    else if (e.key === 't' || e.key === 'T') { e.preventDefault(); toggleTheme(); }
    else if (e.key === 'j' || e.key === 'J') { e.preventDefault(); scrollToModule(visibleModuleIndex() + 1); }
    else if (e.key === 'k' || e.key === 'K') { e.preventDefault(); scrollToModule(visibleModuleIndex() - 1); }
    else if (e.key === '/') {
      const s = document.getElementById('toc-search');
      if (s) { e.preventDefault(); s.focus(); }
    }
  });

  // ---------- sub-TOC (right rail, wide only) ----------
  const subToc = document.getElementById('sub-toc');
  if (subToc) {
    function rebuild() {
      const mods = modules();
      const idx = visibleModuleIndex();
      if (idx < 0 || !mods[idx]) return;
      while (subToc.firstChild) subToc.removeChild(subToc.firstChild);
      const h = document.createElement('h5');
      h.textContent = 'On this module';
      subToc.appendChild(h);
      mods[idx].querySelectorAll('h3').forEach((h3, i) => {
        if (!h3.id) h3.id = (mods[idx].id || ('m' + idx)) + '-h3-' + i;
        const a = document.createElement('a');
        a.href = '#' + h3.id;
        a.textContent = (h3.textContent || '').trim();
        subToc.appendChild(a);
      });
    }
    let rTick = false;
    window.addEventListener('scroll', () => {
      if (!rTick) { requestAnimationFrame(() => { rebuild(); rTick = false; }); rTick = true; }
    }, { passive: true });
    rebuild();
  }
})();