Resource Exhaustion Attacks
Anatomy, Signals, and Hardening
Disclaimer: This post is for educational/defensive purposes. Test only on systems you own or have explicit permission to assess. Attempting any form of DoS against third-party systems without consent is illegal.
Threat model & taxonomy, aka what "resource exhaustion" really means
A few months ago, one of my containers ate all the CPU on my homelab node. Everything froze, including SSH and Grafana. That sent me down a rabbit hole into resource exhaustion. When we say resource exhaustion, we're talking about deliberately driving one or more bottlenecks to saturation so that latency spikes > queues explode > retries amplify > availability collapses. Resource exhaustion attacks primarily target the A (availability) leg of the CIA triad.
Common targets:
- CPU (run queue saturation, hot loops, expensive parsing, catastrophic backtracking in regex).
- Memory (heap growth, allocator pressure, page-cache contention, OOM-killer pressure).
- Kernel objects (file descriptors, sockets, epoll/kqueue entries, conntrack table).
- Storage and I/O (sync writes, fsync storms, small-IO amplification).
- Network (SYN backlog, accept queue, keep-alive hoarding, slow bodies).
- Upstream services (DB connection pools, per-tenant quotas), and denial-of-wallet (autoscalers reactively spinning up capacity, and with it cost).
Modern incidents are frequently L7 complexity or concurrency bombs that look like normal traffic until they hit the wrong code path.
How pressure propagates (host > kernel > runtime > app)
- Kernel scheduling: sustained CPU > 80–90% with long run queues (load >> cores) starves housekeeping; iowait rises; timers fire late.
- Memory pressure: reclaim churns; refaults spike; then the OOM killer selects a victim; cgroups can contain the blast radius per service. (Linux cgroup v2 is the modern control plane for per-service CPU/memory/pids/io limits.)
- Sockets/FDs: the accept queue/backlog cap, per-process RLIMIT_NOFILE, and app worker pools form hard ceilings; once reached, errors devolve into resets and timeouts. Kernel caps like somaxconn and SYN cookies influence survivability during bursts.
- Conntrack (NAT/stateful firewalls): state table fill > packet drops; it's memory-bound and must be right-sized.
Signals & first responders
Host/Kernel
- CPU: mpstat 1, pidstat -u 1, run queue length (load), context switches.
- Memory: vmstat 1, free -m, cgroup memory.events (oom_kill, pressure), OOM logs (dmesg -T | egrep -i 'killed process|oom').
- I/O: iostat -xz 1, pidstat -d 1, disk util %, average request size.
- FDs: lsof | wc -l, per-process /proc/$pid/fd.
- Sockets: ss -s for the state mix; ss -Htan state syn-recv (repeat for established, time-wait, close-wait); SYN/accept queue ratios; backlog overruns show up in dmesg when SYN cookies kick in.
- Conntrack: /proc/sys/net/netfilter/nf_conntrack_count, conntrack -S for adds/inserts/drops.
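If you want a handful of these host counters in one place during an incident, here's a minimal sketch that polls standard Linux /proc files (the one-second interval and the chosen counters are just illustrative):

# watch_host.py - poll a few host-level exhaustion signals (Linux, /proc only)
import time

def read_first_line(path):
    with open(path) as f:
        return f.readline().strip()

def meminfo_kb(field):
    # /proc/meminfo reports values in kB, e.g. "MemAvailable:  123456 kB"
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    return 0

while True:
    load1 = float(read_first_line("/proc/loadavg").split()[0])          # 1-min load average
    mem_avail_mb = meminfo_kb("MemAvailable") // 1024
    fds_open = int(read_first_line("/proc/sys/fs/file-nr").split()[0])  # allocated file handles
    print(f"load1={load1:.2f} mem_avail={mem_avail_mb}MiB open_files={fds_open}")
    time.sleep(1)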
Service
- Web: 5xx rate, surge in 499/408 (client timeouts), upstream 504s, queue wait in app thread pools, rising p95–p99 tail latencies, keep-alives held open.
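To eyeball the error mix and the tail from an access log, a small sketch like the one below works; it assumes a hypothetical log_format where the ninth field is the status code and the last field is the request time in seconds, so adjust the indexes to your own format:

# tail_check.py - summarize status mix and latency tail from an access log on stdin
# Field positions are assumptions about your log_format; adjust as needed.
import sys

statuses, latencies = [], []
for line in sys.stdin:
    parts = line.split()
    if len(parts) < 10:
        continue
    try:
        statuses.append(int(parts[8]))       # assumed: status code
        latencies.append(float(parts[-1]))   # assumed: request time in seconds
    except ValueError:
        continue

def pct(sorted_vals, p):
    # nearest-rank percentile
    idx = max(0, int(round(p / 100 * len(sorted_vals))) - 1)
    return sorted_vals[idx]

latencies.sort()
errors = sum(1 for s in statuses if s >= 500)
client_timeouts = sum(1 for s in statuses if s in (408, 499))
print(f"requests={len(statuses)} 5xx={errors} 499/408={client_timeouts}")
if latencies:
    print(f"p95={pct(latencies, 95):.3f}s p99={pct(latencies, 99):.3f}s")

Usage: python3 tail_check.py < access.log, re-run during and after a load test to see the tail move.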
Two canonical exhaustion patterns
We'll reproduce (A) host-level CPU/memory pressure with a synthetic stressor and (B) an algorithmic complexity bomb (ReDoS) inside your own service, then harden.
A) Host-level pressure with stress-ng in a container
stress-ng validates that your limits, monitors, and kill paths behave as expected when a noisy neighbour or a bug goes rogue.
# 1) Start a cgroup-contained sandbox (Docker); the container gets its own cgroup
#    under the host's cgroup v2 hierarchy.
cat > docker-compose.yml <<'YML'
services:
  noisy:
    image: ubuntu:22.04
    command: ["sleep", "infinity"]
    restart: unless-stopped
    tty: true
    ulimits:
      nofile: 65536
YML
docker compose up -d
docker exec -it $(docker compose ps -q noisy) bash
# 2) Inside container: install stress-ng and run controlled load
apt-get update && apt-get install -y stress-ng
# CPU burn across all cores for 60s
stress-ng --cpu 0 --timeout 60s
# Memory pressure: ~90% of available memory for 60s (a capped cgroup hits memory.max first)
stress-ng --vm 1 --vm-bytes 90% --timeout 60s
Now watch host metrics (top/vmstat), container stats, and cgroup events. If you're using cgroup v2, you can enforce ceilings:
# Replace <ctr-cgroup-path> with the container cgroup path under /sys/fs/cgroup/
cd /sys/fs/cgroup/<ctr-cgroup-path>/
echo "100000 200000" > cpu.max # cap to 50% of one CPU
echo $((1024*1024*1024)) > memory.max # 1GiB hard cap
echo 1024 > pids.max # bound fork/threads
- stress-ng is a standard tool for controlled CPU/VM/I/O stress (don't run privileged stressor modes in prod).
- cgroup v2 controllers (cpu.max, memory.max, pids.max) define enforceable per-service budgets and expose counters like memory.events for OOMs.
Pass criteria: the service remains responsive (degraded but alive), garbage collection doesn't spiral, and your platform contains the noisy container without a system-wide OOM killing off unrelated processes.
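To check containment mechanically rather than by eyeballing top, a minimal sketch that reads the container's cgroup v2 counters after the run; the cgroup path below is a placeholder (with the systemd cgroup driver it's typically something like /sys/fs/cgroup/system.slice/docker-<id>.scope, resolvable via docker inspect):

# cgroup_check.py - read OOM and CPU-throttle counters for one cgroup (cgroup v2)
from pathlib import Path

# Placeholder path: substitute your container's cgroup directory.
CGROUP = Path("/sys/fs/cgroup/system.slice/docker-CONTAINER_ID.scope")

def kv_file(path):
    # Parse "key value" lines (memory.events, cpu.stat) into a dict.
    return {k: int(v) for k, v in (line.split() for line in path.read_text().splitlines())}

mem_events = kv_file(CGROUP / "memory.events")
cpu_stat = kv_file(CGROUP / "cpu.stat")
print("oom_kill:", mem_events.get("oom_kill", 0))         # >0: the cgroup OOM-killed a task
print("max (limit hits):", mem_events.get("max", 0))      # times memory.max was hit
print("nr_throttled:", cpu_stat.get("nr_throttled", 0))   # CPU quota throttle events
print("throttled_usec:", cpu_stat.get("throttled_usec", 0))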
B) Application-level complexity: ReDoS
Many default regex engines use backtracking; certain patterns explode to exponential runtime on crafted inputs (ReDoS), burning CPU per request. The fix is either engine choice (RE2, Rust's regex crate, .NET's NonBacktracking mode) or pattern discipline plus timeouts.
Minimal FastAPI route with a risky pattern (run only locally):
# app.py
from fastapi import FastAPI, HTTPException
import re, time

app = FastAPI()

# Risky: nested quantifiers can catastrophically backtrack on certain inputs
BAD = re.compile(r'^(a+)+$')  # demo only

@app.post("/match")
def match(s: str):
    t0 = time.perf_counter()
    ok = BAD.match(s) is not None
    dt = time.perf_counter() - t0
    if dt > 0.5:
        # Simulate a production safeguard: shed the request when the regex ran too long.
        # (By this point the CPU is already spent; a real guard needs a hard timeout.)
        raise HTTPException(status_code=429, detail=f"regex timeout {dt:.3f}s")
    return {"matched": ok, "t": dt}
Run:
uvicorn app:app --host 0.0.0.0 --port 8080
# In another shell, post benign vs adversarial payloads (vary lengths to observe runtime growth)
curl -XPOST 'http://127.0.0.1:8080/match?s=aaaaab' # fast
# Crafting pathological inputs locally will show runtime spikes; do not target third parties.
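To see the blow-up without touching the service at all, you can time the same pattern directly; the lengths below are deliberately small because the runtime grows roughly exponentially with input length (Ctrl-C if it drags):

# redos_growth.py - time the vulnerable pattern on adversarial inputs of growing length
# Run locally only; keep n small, growth is roughly exponential.
import re
import time

BAD = re.compile(r'^(a+)+$')

for n in range(16, 25, 2):
    payload = "a" * n + "b"   # almost-matching input forces maximal backtracking
    t0 = time.perf_counter()
    BAD.match(payload)
    print(f"n={n:2d} t={time.perf_counter() - t0:.4f}s")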
Mitigations:
- Prefer linear-time regex engines (e.g., RE2 bindings) where possible, or safe regex libraries provided by your language.
- Bound work: regex timeouts, request timeouts, and CPU quotas per worker; treat regex as executable code in reviews (see the sketch after this list).
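If you're stuck with a backtracking engine, one stdlib-only way to bound the work is to run the match in a worker process and kill it on a deadline. It's heavier than a native regex timeout, but unlike an in-thread check it actually stops the CPU burn; a minimal sketch:

# regex_budget.py - hard wall-clock budget for a potentially backtracking regex
# Stdlib-only sketch: the worker process is terminated if it exceeds the deadline.
import multiprocessing as mp
import re

def _worker(pattern, text, q):
    q.put(re.match(pattern, text) is not None)

def match_with_budget(pattern, text, timeout_s=0.1):
    """Return True/False, or None if the match exceeded the budget."""
    q = mp.Queue()
    p = mp.Process(target=_worker, args=(pattern, text, q), daemon=True)
    p.start()
    p.join(timeout_s)
    if p.is_alive():
        p.terminate()   # stop the runaway backtracking
        p.join()
        return None
    return q.get()

if __name__ == "__main__":
    print(match_with_budget(r'^(a+)+$', "a" * 8))          # True, returns quickly
    print(match_with_budget(r'^(a+)+$', "a" * 40 + "b"))   # None: budget exceeded, worker killed

Spawning a process per match is too expensive for hot paths; in practice you'd pool workers or switch engines, but the shedding behaviour is the point.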
Production hardening (defense-in-depth)
1) Contain blast radius per service (cgroup v2 / systemd)
# /etc/systemd/system/api.service
[Service]
ExecStart=/usr/local/bin/api
# CPU & memory budgets: two cores' worth of CPU, soft then hard memory caps
CPUQuota=200%
MemoryHigh=1G
MemoryMax=1.5G
# Fork/thread/file-descriptor ceilings
TasksMax=1000
LimitNOFILE=65536
# I/O politeness (when supported)
IOSchedulingClass=idle
[Install]
WantedBy=multi-user.target
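To confirm what actually got applied, systemd puts the unit in its own cgroup (on cgroup v2, typically under /sys/fs/cgroup/system.slice/<unit>.service); a minimal check, assuming that default layout and the api.service name from above:

# check_unit_limits.py - read the effective cgroup v2 limits systemd applied to a unit
# Path assumes the default system.slice layout; adjust the unit/slice name.
from pathlib import Path

cg = Path("/sys/fs/cgroup/system.slice/api.service")
for name in ("cpu.max", "memory.high", "memory.max", "pids.max"):
    f = cg / name
    print(f"{name}: {f.read_text().strip() if f.exists() else 'n/a'}")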
2) Kernel survivability under burst/flood
Tune with care (prefer per-service limits first; kernel knobs are shared resources):
# /etc/sysctl.d/50-net-survivability.conf
# Kernel ceiling for the listen() accept backlog
net.core.somaxconn = 4096
# SYN (half-open) queue depth
net.ipv4.tcp_max_syn_backlog = 4096
# Enable SYN cookies when the SYN queue overflows
net.ipv4.tcp_syncookies = 1
# Size the conntrack table (stateful firewalls/NAT)
net.netfilter.nf_conntrack_max = 262144
- somaxconn is the kernel ceiling for the accept backlog; apps requesting larger values are clamped to it.
- SYN cookies help when the SYN backlog overflows (they're a last-resort defense for SYN floods).
- Conntrack table sizing is memory-sensitive; monitor nf_conntrack_count vs nf_conntrack_max.
Don't just crank these up: every slot consumes RAM and can worsen CPU cache pressure. Track drops and memory overhead before/after.
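A small watcher for conntrack headroom, reading the standard netfilter sysctl files (the 90% warning threshold and two-second interval are arbitrary choices):

# conntrack_watch.py - warn when the conntrack table approaches its limit
import time

COUNT = "/proc/sys/net/netfilter/nf_conntrack_count"
MAX = "/proc/sys/net/netfilter/nf_conntrack_max"

def read_int(path):
    with open(path) as f:
        return int(f.read().strip())

while True:
    count, limit = read_int(COUNT), read_int(MAX)
    pct = 100 * count / limit
    flag = "  <-- near capacity, expect drops" if pct > 90 else ""
    print(f"conntrack {count}/{limit} ({pct:.1f}%){flag}")
    time.sleep(2)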
3) Edge/L7: shed work early (Nginx examples)
Rate-limit & connection caps:
# ip-scoped leaky bucket at ~10 r/s with burst tolerance
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;
# limit simultaneous connections per client and per vhost
limit_conn_zone $binary_remote_addr zone=addr:10m;
limit_conn_zone $server_name zone=perhost:10m;
server {
    ...
    limit_req zone=perip burst=20 nodelay;
    limit_conn addr 20;
    limit_conn perhost 1000;
}
Drop slow request bodies / keep-alive hoarding:
http {
    client_body_timeout 10s;
    keepalive_timeout 15s;
    keepalive_requests 100;
    # prevent giant uploads from tying up workers
    client_max_body_size 10m;
}
- limit_req_zone / limit_req implement request-rate enforcement (leaky bucket; see the probe below).
- limit_conn_zone / limit_conn cap concurrent connections per key (client/vhost).
- client_body_timeout / client_max_body_size stop slow-body and jumbo-upload abuse. Defaults and semantics are documented by NGINX.
- On Apache, use mod_reqtimeout to drop slowloris-style reads.
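To sanity-check the limit_req policy locally, a quick burst and a status tally is enough; the URL and request count below are placeholders, and the expectation relies on limit_req rejecting excess requests with 503 by default (tunable via limit_req_status):

# burst_probe.py - fire a quick burst at your own endpoint and tally response statuses
# URL and count are placeholders; run only against a server you own.
from collections import Counter
from urllib import request, error

URL = "http://127.0.0.1/"   # placeholder: your nginx vhost
counts = Counter()

for _ in range(100):
    try:
        with request.urlopen(URL, timeout=5) as resp:
            counts[resp.status] += 1
    except error.HTTPError as e:
        counts[e.code] += 1        # 503s from limit_req land here
    except error.URLError:
        counts["conn_error"] += 1

print(dict(counts))   # expect mostly 200s, then 503s once the burst allowance is spent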
4) Application-layer guardrails
- Time-box regex, image parsing, decompression.
- Enforce CPU quotas per worker.
- Use token buckets and quotas per principal (see the sketch after this list).
- Implement circuit breakers (return 429/503 early).
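A minimal in-process token bucket keyed per principal, as a sketch of the shedding idea; it's single-process only, so a multi-instance deployment would back the state with something shared (Redis or similar):

# token_bucket.py - per-principal token bucket for application-level shedding
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # burst allowance
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost=1.0):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                # caller should return 429/503 early

buckets = {}   # principal (API key, tenant id) -> bucket

def check(principal, rate=10, capacity=20):
    bucket = buckets.setdefault(principal, TokenBucket(rate, capacity))
    return bucket.allow()

# Example: with capacity 20, the 21st and 22nd back-to-back requests from one tenant are shed.
print([check("tenant-a") for _ in range(22)].count(False))  # -> 2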
5) Platform & topology
- Terminate TLS close to clients.
- Isolate tenants by cgroup or rate limit.
- Control autoscaler surge to prevent denial-of-wallet.
Verification playbook
- Host stress: stress-ng stays within its limits; no node-wide OOM.
- Slow body: curl --limit-rate 1 --data-binary @bigfile (or a fully stalled body) > Nginx terminates it at client_body_timeout with a 408 (see the probe below).
- ReDoS guardrails: FastAPI returns 429 after the timeout.
- Conntrack: watch nf_conntrack_count under burst to ensure no drops.
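For the slow-body check, curl works, but a raw-socket probe makes the behaviour explicit; note that client_body_timeout applies between successive reads, so a complete stall is the unambiguous test. Host and port below are placeholders, and the expectation is nginx's documented 408 after the timeout:

# slow_body_probe.py - stall a request body to confirm client_body_timeout is enforced
# Host/port are placeholders; test only against your own nginx.
import socket
import time

HOST, PORT = "127.0.0.1", 80

s = socket.create_connection((HOST, PORT), timeout=60)
s.sendall(
    b"POST / HTTP/1.1\r\n"
    b"Host: localhost\r\n"
    b"Content-Length: 1000\r\n"
    b"Content-Type: application/octet-stream\r\n\r\n"
    b"partial-body"          # send a few bytes, then go silent
)
start = time.monotonic()
reply = s.recv(4096)         # expect a 408 after roughly client_body_timeout seconds
status = reply.splitlines()[0] if reply else b"(connection closed with no reply)"
print(f"after {time.monotonic() - start:.1f}s: {status!r}")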
Cheat sheet
- Put every service in cgroup v2 with CPUQuota/MemoryMax/TasksMax/LimitNOFILE.
- At the edge: limit_req, limit_conn, client_body_timeout, client_max_body_size.
- Kernel guardrails: somaxconn, tcp_max_syn_backlog, tcp_syncookies=1, nf_conntrack_max.
- Avoid backtracking-prone regex; prefer RE2; enforce timeouts; shed early.
- Watch memory.events, OOM logs, the socket state mix, conntrack count, and p95–p99 tails.
Closing take
Most mystery outages during spikes are actually queues. Fix the budget and the queue disappears. Enforce budgets at the kernel, edge, and app, measure the right things, and rehearse the failure. Your error budget and cloud bill will thank you.