Resource Exhaustion Attacks

Anatomy, Signals, and Hardening

Disclaimer: This post is for educational/defensive purposes. Test only on systems you own or have explicit permission to assess. Attempting any form of DoS against third-party systems without consent is illegal.

Threat model & taxonomy: what "resource exhaustion" really means

A few months ago, one of my containers ate all the CPU on my homelab node. Everything froze, including SSH and Grafana, and the incident sent me down a rabbit hole into resource exhaustion. When we say resource exhaustion, we're talking about deliberately driving one or more bottlenecks to saturation so that latency spikes > queues explode > retries amplify > availability collapses. These attacks target the availability (A) leg of the CIA triad.

Common targets:

  • CPU (run queue saturation, hot loops, expensive parsing, catastrophic backtracking in regex).
  • Memory (heap growth, allocator pressure, page-cache contention, OOM killer).
  • Kernel objects (file descriptors, sockets, epoll/kqueue entries, conntrack table).
  • Storage and I/O (sync writes, fsync storms, small-IO amplification).
  • Network (SYN backlog, accept queue, keep-alive hoarding, slow bodies).
  • Upstream services (DB connection pools, per-tenant quotas) and denial-of-wallet (autoscalers reactively spin up capacity, and the bill follows).

Modern incidents are frequently L7 complexity or concurrency bombs that look like normal traffic until they hit the wrong code path.


How pressure propagates (host > kernel > runtime > app)

  • Kernel scheduling: sustained CPU > 80–90% with long run queues (load >> cores) starves housekeeping; iowait rises; timers fire late.
  • Memory pressure: reclaim churns, refaults spike, and eventually the OOM killer picks a victim; cgroups contain the blast radius per service (Linux cgroup v2 is the modern control plane for per-service CPU/memory/pids/io limits). The kernel's PSI stall counters quantify this pressure; see the sketch after this list.
  • Sockets/FDs: accept-queue/backlog caps, per-process RLIMIT_NOFILE, and app worker pools form hard ceilings; once reached, errors devolve into resets and timeouts. Kernel caps like somaxconn and SYN cookies influence survivability during bursts.
  • Conntrack (NAT/stateful firewalls): when the state table fills, packets drop; the table is memory-bound and must be right-sized.
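The kernel exposes this propagation directly via Pressure Stall Information (PSI). Here's a minimal watcher sketch, assuming /proc/pressure/* exists (CONFIG_PSI=y, which most modern distros enable); the threshold in the comment is a rule of thumb, not a hard limit:

# psi_watch.py - print CPU/memory/I/O pressure from PSI (modern Linux hosts).
import time

PSI_FILES = {
    "cpu": "/proc/pressure/cpu",
    "memory": "/proc/pressure/memory",
    "io": "/proc/pressure/io",
}

def read_psi(path: str) -> dict:
    """Parse lines like 'some avg10=1.23 avg60=0.45 avg300=0.10 total=12345'."""
    out = {}
    with open(path) as fh:
        for line in fh:
            kind, *fields = line.split()
            out[kind] = {k: float(v) for k, v in (field.split("=") for field in fields)}
    return out

if __name__ == "__main__":
    while True:
        for name, path in PSI_FILES.items():
            some = read_psi(path).get("some", {})
            # Sustained avg10 above ~10-20% usually means tasks are stalling on this resource.
            print(f"{name:7s} some avg10={some.get('avg10', 0.0):6.2f} avg60={some.get('avg60', 0.0):6.2f}")
        print("-" * 40)
        time.sleep(5)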

Signals & first responders

Host/Kernel

  • CPU: mpstat 1, pidstat -u 1, run queue length (load), context switches.
  • Memory: vmstat 1, free -m, cgroup memory.events (oom_kill, pressure), OOM logs (dmesg -T | egrep -i 'killed process|oom').
  • I/O: iostat -xz 1, pidstat -d 1, disk util %, avg rq size.
  • FDs: lsof | wc -l, per-proc /proc/$pid/fd.
  • Sockets: ss -s for totals; get the state mix with ss -Htan | awk '{print $1}' | sort | uniq -c (watch SYN-RECV, ESTABLISHED, TIME-WAIT, CLOSE-WAIT) plus the SYN/accept queue ratios; backlog overruns show up in dmesg once SYN cookies kick in.
  • Conntrack: /proc/sys/net/netfilter/nf_conntrack_count, conntrack -S for insert/drop counters. A small polling sketch for several of these signals follows this list.
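Rather than firing these commands by hand mid-incident, you can leave a small poller running. A sketch, assuming a cgroup v2 host with conntrack loaded; the cgroup path and PID are placeholders you'd substitute for your own service:

# signal_poll.py - poll a few exhaustion signals; cgroup path and PID are example placeholders.
import os, pathlib, time

CGROUP = pathlib.Path("/sys/fs/cgroup/system.slice/api.service")  # assumption: adjust to your unit
PID = 1234                                                        # assumption: your service's main PID

def read_kv(path: pathlib.Path) -> dict:
    """Parse 'key value' files such as memory.events into a dict."""
    try:
        return dict(line.split() for line in path.read_text().splitlines())
    except FileNotFoundError:
        return {}

def read_int(path: str) -> int:
    try:
        return int(pathlib.Path(path).read_text())
    except (FileNotFoundError, ValueError):
        return -1

while True:
    mem_events = read_kv(CGROUP / "memory.events")   # a rising oom_kill counter is the red flag
    fd_dir = f"/proc/{PID}/fd"
    fds = len(os.listdir(fd_dir)) if os.path.isdir(fd_dir) else -1
    ct = read_int("/proc/sys/net/netfilter/nf_conntrack_count")
    ct_max = read_int("/proc/sys/net/netfilter/nf_conntrack_max")
    print(f"oom_kill={mem_events.get('oom_kill', '0')} open_fds={fds} conntrack={ct}/{ct_max}")
    time.sleep(5)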

Service

  • Web: 5xx rate, surge in 499/408 (client timeouts), upstream 504s, queue wait in app thread pools, rising p95–p99 tail latencies, keep-alives held open.
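If all you have is raw per-request durations (say, scraped from access logs), the tail percentiles are cheap to compute with the standard library; a quick sketch with made-up sample data:

# tail_latency.py - p95/p99 from a list of request durations (seconds).
import statistics

def tails(durations):
    """Return (p95, p99) using stdlib quantiles; needs more than a handful of samples."""
    cuts = statistics.quantiles(durations, n=100)   # 99 cut points: cuts[94]=p95, cuts[98]=p99
    return cuts[94], cuts[98]

if __name__ == "__main__":
    sample = [0.08, 0.09, 0.11, 0.10, 0.12, 0.09, 0.10, 0.95, 0.11, 0.10] * 20  # illustrative data
    p95, p99 = tails(sample)
    print(f"p95={p95:.3f}s p99={p99:.3f}s")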

Two canonical exhaustion patterns

We'll reproduce (A) host-level CPU/memory pressure with a synthetic stressor and (B) an algorithmic complexity bomb (ReDoS) inside your own service, then harden.

A) Host-level pressure with stress-ng in a container

stress-ng validates that your limits, monitors, and kill paths behave as expected when a noisy neighbour or a bug goes rogue.

# 1) Start a disposable sandbox container (Docker); limits are enforced through the host's cgroup v2 hierarchy.
cat > docker-compose.yml <<'YML'
services:
  noisy:
    image: ubuntu:22.04
    command: ["sleep","infinity"]
    restart: unless-stopped
    ulimits:
      nofile: 65536
    tty: true
YML

docker compose up -d
docker exec -it $(docker compose ps -q noisy) bash

# 2) Inside container: install stress-ng and run controlled load
apt-get update && apt-get install -y stress-ng
# CPU burn across all cores for 60s
stress-ng --cpu 0 --timeout 60s
# Memory pressure: ~90% of available memory for 60s (keep this under your cgroup memory limit)
stress-ng --vm 1 --vm-bytes 90% --timeout 60s

Now watch host metrics (top/vmstat), container stats, and cgroup events. If you're using cgroup v2, you can enforce ceilings:

# Replace <ctr-cgroup-path> with the container cgroup path under /sys/fs/cgroup/
cd /sys/fs/cgroup/<ctr-cgroup-path>/
echo "100000 200000" > cpu.max        # cap to 50% of one CPU
echo $((1024*1024*1024)) > memory.max # 1GiB hard cap
echo 1024 > pids.max                  # bound fork/threads

  • stress-ng is a standard tool for controlled CPU/VM/I/O stress (don't run privileged modes in prod).
  • cgroup v2 controllers (cpu.max, memory.max, pids.max) define enforceable per-service budgets and expose counters like memory.events for OOMs.

Pass criteria: the service stays responsive (degraded but alive), garbage collection doesn't spiral, and the platform contains the noisy container without a system-wide OOM killing off unrelated processes. A small cgroup watcher sketch follows.
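To check these criteria while stress-ng runs, watch the container's own cgroup counters; a sketch reusing the <ctr-cgroup-path> placeholder from above (cpu.stat's nr_throttled climbing means cpu.max is doing its job):

# watch_cgroup.py - watch throttling and memory headroom for one cgroup during a stress run.
# The path is a placeholder; find the real one via systemd-cgls or docker inspect.
import pathlib, time

CG = pathlib.Path("/sys/fs/cgroup/<ctr-cgroup-path>")   # placeholder from the example above

def kv(path: pathlib.Path) -> dict:
    return dict(line.split() for line in path.read_text().splitlines()) if path.exists() else {}

while True:
    cpu = kv(CG / "cpu.stat")            # nr_throttled / throttled_usec climb when cpu.max bites
    mem_events = kv(CG / "memory.events")
    cur_path = CG / "memory.current"
    cur = cur_path.read_text().strip() if cur_path.exists() else "?"
    print(f"throttled={cpu.get('nr_throttled', '0')} "
          f"oom_kill={mem_events.get('oom_kill', '0')} memory.current={cur}")
    time.sleep(2)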


B) Application-level complexity: ReDoS

Many default regex engines backtrack; certain patterns explode to exponential runtime on crafted inputs (ReDoS), burning CPU per request. The fix is either engine choice (RE2, Rust's regex crate, .NET's non-backtracking mode) or pattern discipline plus timeouts.

Minimal FastAPI route with a risky pattern (run only locally):

# app.py
from fastapi import FastAPI, HTTPException
import re, time

app = FastAPI()
# Risky: nested quantifiers can catastrophically backtrack on certain inputs
BAD = re.compile(r'^(a+)+$')  # demo only

@app.post("/match")
def match(s: str):
    t0 = time.perf_counter()
    ok = BAD.match(s) is not None
    dt = time.perf_counter() - t0
    if dt > 0.5:
        # After-the-fact guard: flag slow matches and shed the request.
        # Note: this can't preempt a match that's already running (see the time-box sketch under Mitigations).
        raise HTTPException(status_code=429, detail=f"regex timeout {dt:.3f}s")
    return {"matched": ok, "t": dt}

Run:

uvicorn app:app --host 0.0.0.0 --port 8080
# In another shell, post benign vs adversarial payloads (vary lengths to observe runtime growth)
curl -XPOST 'http://127.0.0.1:8080/match?s=aaaaab'    # fast
# Crafting pathological inputs locally will show runtime spikes; do not target third parties.

Mitigations:

  • Prefer linear-time regex engines (e.g., RE2 bindings) where possible, or safe regex libraries provided by your language.
  • Bound work: regex timeouts, request timeouts, and CPU quotas per worker; treat regex as executable code in reviews. A minimal time-box sketch follows.
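The FastAPI route above only measures the damage after re.match returns; to genuinely bound the work you have to run the match somewhere you can kill. A minimal time-box sketch using a throwaway process (stdlib only; the pattern and 0.5 s budget are just the demo values from above):

# regex_timebox.py - run a potentially catastrophic regex in a child process we can kill.
import multiprocessing as mp
import re

def _match_worker(pattern: str, s: str, conn) -> None:
    conn.send(re.match(pattern, s) is not None)
    conn.close()

def match_with_timeout(pattern: str, s: str, timeout: float = 0.5):
    """Return True/False, or None if the match exceeded the budget and was killed."""
    parent, child = mp.Pipe(duplex=False)
    p = mp.Process(target=_match_worker, args=(pattern, s, child), daemon=True)
    p.start()
    p.join(timeout)
    if p.is_alive():
        p.terminate()      # the runaway backtracking dies with the child process
        p.join()
        return None
    return parent.recv() if parent.poll() else None

if __name__ == "__main__":
    print(match_with_timeout(r"^(a+)+$", "a" * 10))          # benign: True, fast
    print(match_with_timeout(r"^(a+)+$", "a" * 30 + "b"))    # pathological: None (killed)

Spawning a process per request is too expensive for a hot path; in practice you'd amortize it with a pre-forked worker pool, or sidestep the problem entirely with a linear-time engine.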

Production hardening (defense-in-depth)

1) Contain blast radius per service (cgroup v2 / systemd)

# /etc/systemd/system/api.service
[Service]
ExecStart=/usr/local/bin/api
# CPU & memory budgets (200% = two cores' worth)
CPUQuota=200%
MemoryHigh=1G
MemoryMax=1.5G
# Fork/thread/file descriptor ceilings
TasksMax=1000
LimitNOFILE=65536
# I/O politeness (when supported)
IOSchedulingClass=idle

[Install]
WantedBy=multi-user.target

2) Kernel survivability under burst/flood

Tune with care, and prefer per-service limits first; kernel knobs are shared resources:

# /etc/sysctl.d/50-net-survivability.conf
# listen() backlog ceiling
net.core.somaxconn = 4096
# SYN queue depth
net.ipv4.tcp_max_syn_backlog = 4096
# enable SYN cookies when the SYN queue overflows
net.ipv4.tcp_syncookies = 1
# size the conntrack table (stateful firewalls/NAT)
net.netfilter.nf_conntrack_max = 262144

  • somaxconn is the kernel ceiling for the accept backlog; apps requesting larger values are silently clamped.
  • SYN cookies kick in when the SYN backlog overflows (a last-resort defense for SYN floods).
  • Conntrack table sizing is memory-sensitive; monitor nf_conntrack_count against nf_conntrack_max.
Don't just crank these up: every conntrack slot costs RAM and can worsen CPU cache pressure. Track drops and memory overhead before and after; a small watcher sketch follows.
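For that before/after comparison, here's a small watcher sketch that tracks conntrack utilisation alongside the SYN-cookie and listen-overflow counters from /proc/net/netstat (assumes nf_conntrack is loaded; exact field sets vary slightly by kernel version):

# conntrack_watch.py - track conntrack fill plus SYN-cookie / listen-overflow counters.
import time

def proc_int(path: str) -> int:
    try:
        with open(path) as fh:
            return int(fh.read())
    except (FileNotFoundError, ValueError):
        return -1

def tcp_ext() -> dict:
    """Parse the TcpExt header/value line pair from /proc/net/netstat into a dict."""
    try:
        lines = open("/proc/net/netstat").read().splitlines()
    except FileNotFoundError:
        return {}
    for i, line in enumerate(lines):
        if line.startswith("TcpExt:") and i + 1 < len(lines):
            keys = line.split()[1:]
            vals = [int(v) for v in lines[i + 1].split()[1:]]
            return dict(zip(keys, vals))
    return {}

while True:
    count = proc_int("/proc/sys/net/netfilter/nf_conntrack_count")
    maximum = proc_int("/proc/sys/net/netfilter/nf_conntrack_max")
    ext = tcp_ext()
    print(f"conntrack {count}/{maximum}  "
          f"SyncookiesSent={ext.get('SyncookiesSent', 0)}  "
          f"ListenOverflows={ext.get('ListenOverflows', 0)}")
    time.sleep(5)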

3) Edge/L7: shed work early (Nginx examples)

Rate-limit & connection caps:

# ip-scoped leaky bucket at ~10 r/s with burst tolerance
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

# limit simultaneous connections per client and per vhost
limit_conn_zone $binary_remote_addr zone=addr:10m;
limit_conn_zone $server_name zone=perhost:10m;

server {
  ...
  limit_req zone=perip burst=20 nodelay;
  limit_conn addr 20;
  limit_conn perhost 1000;
}

Drop slow request bodies / keep-alive hoarding:

http {
  client_body_timeout 10s;
  keepalive_timeout   15s;
  keepalive_requests  100;

  # prevent giant uploads from tying workers
  client_max_body_size 10m;
}

  • limit_req_zone/limit_req implement request-rate enforcement (leaky bucket).
  • limit_conn_zone/limit_conn cap concurrent connections per key (client/vhost).
  • client_body_timeout/client_max_body_size stop slow-body and jumbo-upload abuse (defaults and semantics are in the NGINX docs); a local probe sketch follows this list.
  • On Apache, use mod_reqtimeout to drop slowloris-style reads.
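To confirm the body timeout actually fires, stall a request body against your own server and time how long Nginx waits. A rough local probe sketch, assuming Nginx listens on 127.0.0.1:80 and that /upload is a route that actually consumes the body (e.g., proxied to an app); only point this at servers you own:

# slow_body_probe.py - verify client_body_timeout by stalling mid-body against YOUR OWN server.
import socket, time

HOST, PORT = "127.0.0.1", 80               # assumption: local test Nginx
PATH = "/upload"                           # assumption: a route that reads the request body

request = (
    f"POST {PATH} HTTP/1.1\r\n"
    f"Host: {HOST}\r\n"
    "Content-Length: 1000000\r\n"
    "Content-Type: application/octet-stream\r\n"
    "\r\n"
    "partial-body"                         # then go silent: the rest of the body never arrives
)

s = socket.create_connection((HOST, PORT), timeout=120)
s.sendall(request.encode())
start = time.monotonic()
try:
    reply = s.recv(4096)                   # unblocks once Nginx gives up on the stalled body
    print(f"after ~{time.monotonic() - start:.1f}s (expect ~client_body_timeout) server replied: {reply[:64]!r}")
except socket.timeout:
    print("server never gave up within 120s; check that client_body_timeout is applied")
finally:
    s.close()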

4) Application-layer guardrails

  • Time-box regex, image parsing, decompression.
  • Enforce CPU quotas per worker.
  • Use token buckets and quotas per principal (a minimal sketch follows this list).
  • Implement circuit breakers (return 429/503 early).
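The token-bucket bullet is only a few dozen lines in-process; a minimal per-principal sketch (the rate/burst values are arbitrary, and production setups usually push this to the edge or to a shared store such as Redis):

# token_bucket.py - minimal per-principal token bucket (in-process; no shared state).
import time
from collections import defaultdict

class TokenBucket:
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst          # refill rate (tokens/sec) and bucket size
        self.tokens = burst
        self.stamp = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.stamp) * self.rate)
        self.stamp = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                                  # caller should answer 429 here

# One bucket per principal (e.g., client IP or API key).
buckets: dict[str, TokenBucket] = defaultdict(lambda: TokenBucket(rate=10, burst=20))

def check(principal: str) -> bool:
    return buckets[principal].allow()

if __name__ == "__main__":
    allowed = sum(check("203.0.113.7") for _ in range(50))
    print(f"{allowed}/50 requests allowed in a burst (burst=20, then 10 r/s)")

A circuit breaker is the complementary guard on the outbound side: trip it on error rate or latency to a dependency and return 503 immediately while it's open, rather than queueing more work behind a struggling upstream.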

5) Platform & topology

  • Terminate TLS close to clients.
  • Isolate tenants by cgroup or rate limit.
  • Control autoscaler surge to prevent denial-of-wallet.

Verification playbook

  • Host stress: stress-ng within limits, no node-wide OOM.
  • Slow body: send curl --limit-rate 1 --data-binary @bigfile at your own endpoint and confirm Nginx drops it at client_body_timeout.
  • ReDoS guardrails: FastAPI returns 429 after timeout.
  • Conntrack: watch nf_conntrack_count under burst to ensure no drops.

Cheat sheet

  • Put every service in cgroup v2 with CPUQuota/MemoryMax/TasksMax/LimitNOFILE.
  • At the edge: limit_req, limit_conn, client_body_timeout, client_max_body_size.
  • Kernel guardrails: somaxconn, tcp_max_syn_backlog, tcp_syncookies=1, nf_conntrack_max.
  • Avoid backtracking regex; use RE2; enforce timeouts; shed early.
  • Watch memory.events, OOM logs, socket state mix, conntrack count, and p95–p99 tails.

Closing take

Most mystery outages during spikes are actually queues. Fix the budget and the queue disappears. Enforce budgets at the kernel, edge, and app, measure the right things, and rehearse the failure. Your error budget and cloud bill will thank you.