Deployment & Operations
This guide covers running SCRUBR in production. See SECURITY.md
for the threat model and scrubr.example.yaml for the full
annotated config.
Modes
SCRUBR runs in one of three serving modes (chosen by config):
| Mode | How clients reach it | Config |
|---|---|---|
| Explicit endpoint (default) | Point the client base URL at SCRUBR and a listen_path route | routes[].listen_path |
| TLS termination | Same, but over HTTPS | tls.enabled + cert/key |
| TLS interception (MITM) | Transparent: client trusts SCRUBR's CA; SCRUBR mints per-host certs and routes by Host | intercept.enabled + CA + routes[].host |
Explicit-endpoint mode is the simplest and most robust — prefer it unless you need to intercept clients you can't reconfigure.
Quick start (explicit endpoint)
scrubr --config scrubr.yaml --listen 0.0.0.0:8080
routes:
  - listen_path: "/openai"
    upstream: "https://api.openai.com"
    profile: openai
profiles:
  openai:
    scan_paths: ["messages[].content"]
    stream_paths: ["choices[].delta.content"]   # required for streaming responses
rules:
  - { name: email, type: EMAIL, pattern: '[\w.+-]+@[\w.-]+\.\w+', priority: 50 }
Then point your app at http://scrubr:8080/openai/v1/chat/completions.
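
Before sending real traffic through the route, it can help to sanity-check a rule pattern against sample text. The sketch below is an approximation, not SCRUBR's own matcher: it rewrites the email rule from the config above into POSIX ERE (replacing `\w` with `[[:alnum:]_]`) so that `grep` can run it. The helper name `find_emails` is hypothetical.

```sh
# Approximate the quick-start email rule in POSIX ERE.
# Assumption: SCRUBR's regex engine may differ in edge cases (e.g. Unicode).
find_emails() {
  # -E: extended regex, -o: print only the matching part of each line
  grep -Eo '[[:alnum:]_.+-]+@[[:alnum:]_.-]+\.[[:alnum:]_]+'
}
```

For example, `printf 'mail bob@example.com now\n' | find_emails` prints only the address, which is the span the rule would be expected to mask.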
Streaming: set stream_paths for any provider you stream from. Without it, a sentinel fragmented across SSE data: events will not rehydrate. (choices[].delta.content for OpenAI, delta.text for Anthropic.)
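
To see why a fragmented sentinel fails to rehydrate, consider the toy sketch below. It is illustrative only: the sentinel format `[[EMAIL_1]]`, the original value, and the helper names are all assumptions, and SCRUBR's real streaming logic is not shown here. It compares substituting per event with substituting after the fragments are joined.

```sh
# Hypothetical sentinel and original value, for illustration only.
rehydrate() {
  sed 's/\[\[EMAIL_1]]/bob@example.com/g'
}

# Each fragment processed on its own line: the split sentinel never matches.
per_event() {
  printf '%s\n' "$@" | rehydrate | tr -d '\n'
}

# Fragments concatenated before substitution: the sentinel is whole again.
joined() {
  printf '%s' "$@" | rehydrate
}
```

With the fragments `'Contact [[EMA'` and `'IL_1]] today'`, `per_event` leaves the sentinel in place while `joined` restores the address. Presumably a configured stream path is what tells SCRUBR which field to reassemble in this way.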
Onboarding safely (dry-run)
Run a new route in mode: dry-run first. SCRUBR forwards the original payload
but reports what it would mask via the x-scrubr-detected response header and
logs — validate coverage, then switch to enforce.
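
A small helper can pull the detection header out of a response during a dry-run. This is a sketch under assumptions: the header name comes from this guide, but the value format is not specified here, so the helper returns the raw value without interpreting it. The name `detected_header` is hypothetical.

```sh
# Read raw HTTP response headers on stdin and print the value of
# x-scrubr-detected (header names are matched case-insensitively).
detected_header() {
  tr -d '\r' | awk 'tolower($0) ~ /^x-scrubr-detected:/ {
    sub(/^[^:]*:[ \t]*/, "")   # drop the header name and leading blanks
    print
  }'
}
```

One possible use is `curl -sS -D - -o /dev/null <request options> | detected_header`, where `-D -` writes the response headers to stdout and `-o /dev/null` discards the body.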
TLS termination
tls:
  enabled: true
  cert_path: /etc/scrubr/tls/cert.pem
  key_path: /etc/scrubr/tls/key.pem
TLS interception (MITM)
- Create a CA (once) and distribute the cert to client trust stores:
  ```sh
  openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 \
    -keyout ca.key -out ca.pem -days 3650 -nodes -subj "/CN=SCRUBR CA"
  ```
- Configure interception and host-routed entries:
  ```yaml
  intercept:
    enabled: true
    listen: "0.0.0.0:8443"
    ca_cert_path: /etc/scrubr/ca/ca.pem
    ca_key_path: /etc/scrubr/ca/ca.key   # protect like a root signing key
  routes:
    - { host: "api.openai.com", upstream: "https://api.openai.com", profile: openai }
  ```
- Direct client traffic to SCRUBR. Two modes:
  - SNI-transparent (connect: false, default): redirect the hosts to SCRUBR via DNS/SNI; SCRUBR terminates TLS using the SNI.
  - CONNECT proxy (connect: true): clients set SCRUBR as their HTTP(S) proxy (HTTPS_PROXY=http://scrubr:8443). SCRUBR MITMs configured hosts and blind-tunnels everything else untouched.
For a complete "use SCRUBR as your OS HTTP proxy" walkthrough (CA setup script, trust-store install per OS, a ready-to-run config), see HTTP-PROXY.md.
The CA key can mint a cert for any host: restrict file permissions, keep it off shared storage, and rotate it. Use intercept.upstream_ca_path to trust an internal CA on the upstream side.
High availability (multi-node)
Run several SCRUBR instances behind a load balancer. For session scope to work
across nodes, use the Redis backend; give each node a distinct node_id:
flowchart LR
LB["Load balancer"] --> N0["SCRUBR · node_id 0"]
LB --> N1["SCRUBR · node_id 1"]
LB --> N2["SCRUBR · node_id N"]
N0 & N1 & N2 --> R[("Redis<br/>encrypted session maps")]
N0 & N1 & N2 --> P[["LLM provider"]]
sessions:
  backend: redis
  redis_url: "rediss://redis.internal:6379/"
  encryption_key: "<high-entropy secret, identical on every node>"
  node_id: 1            # 0..4095, unique per node
- Node ids partition the sentinel id space, so concurrent nodes never collide.
- Enable encryption_key so Redis holds only ciphertext; run Redis with AUTH+TLS.
- Sticky sessions (route a conversation to one node) give the strongest ordering; without them, concurrent writes to the same session are last-write-wins per field.
Request scope needs no shared state — any node handles any request.
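
To illustrate how a per-node id can partition an id space, here is a generic sketch. It is not SCRUBR's actual id layout, which this guide does not specify. The 20-bit counter width and the name `partitioned_id` are assumptions; only the 0..4095 node range comes from the config above.

```sh
# Combine a node id (high bits) with a per-node counter (low 20 bits).
# Assumption: counters stay below 2^20; the real layout may differ.
partitioned_id() {   # usage: partitioned_id NODE_ID COUNTER
  if [ "$1" -lt 0 ] || [ "$1" -gt 4095 ]; then
    echo "node_id must be in 0..4095" >&2
    return 1
  fi
  echo $(( ($1 << 20) | $2 ))
}
```

Because the node id occupies its own bits, two nodes using the same counter value still produce different ids, which is why giving two nodes the same node_id would defeat the scheme.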
Health & observability
- GET /healthz → 200 ok (unauthenticated) for load-balancer liveness.
- Response headers x-scrubr-mode and x-scrubr-detected (counts/types only).
- Logs are structured (RUST_LOG=scrubr=info); they never contain secret values.
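
Deploy scripts often need to block until the health endpoint answers. The helper below is a minimal sketch of such a wait loop. The names `wait_healthy` and `PROBE_DELAY` are hypothetical, and the probe command is passed in by the caller so the loop itself stays independent of any particular HTTP client.

```sh
# Retry a probe command until it succeeds or the attempts run out.
# PROBE_DELAY (seconds between attempts) defaults to 1.
wait_healthy() {   # usage: wait_healthy ATTEMPTS COMMAND [ARGS...]
  attempts=$1
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "${PROBE_DELAY:-1}"
  done
  return 1
}
```

A typical call might be `wait_healthy 30 curl -fsS http://scrubr:8080/healthz`, where `-f` makes curl exit non-zero on an HTTP error status.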
Audit
audit:
  enabled: true
  path: /var/log/scrubr/audit.jsonl
Verify integrity any time:
scrubr audit-verify /var/log/scrubr/audit.jsonl
# OK: N record(s) verified, chain intact (exit 0)
# TAMPERED: chain breaks at record seq K (exit 1)
Ship the file to append-only/WORM storage for compliance.
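
For readers unfamiliar with tamper-evident logs, the sketch below shows the general hash-chain idea that a message like "chain breaks at record seq K" suggests. It is a generic illustration, not SCRUBR's audit format: the line layout, the use of `sha256sum`, and the helper names are all assumptions.

```sh
# Each line is "<hash> <payload>"; the hash covers the previous line's
# hash plus this payload, so editing any record invalidates the chain.
chain_append() {   # usage: chain_append FILE PAYLOAD
  prev=""
  if [ -s "$1" ]; then
    prev=$(tail -n 1 "$1" | cut -d' ' -f1)
  fi
  hash=$(printf '%s|%s' "$prev" "$2" | sha256sum | cut -d' ' -f1)
  printf '%s %s\n' "$hash" "$2" >> "$1"
}

# Recompute every hash in order; report the first record that mismatches.
chain_verify() {   # usage: chain_verify FILE
  prev=""
  n=0
  while IFS= read -r line; do
    n=$((n + 1))
    hash=${line%% *}      # first field: stored hash
    payload=${line#* }    # rest of the line: payload
    want=$(printf '%s|%s' "$prev" "$payload" | sha256sum | cut -d' ' -f1)
    if [ "$hash" != "$want" ]; then
      echo "broken at record $n"
      return 1
    fi
    prev=$hash
  done < "$1"
  echo "ok: $n record(s)"
}
```

Changing the payload of any record makes its stored hash stop matching, so verification fails at that record. This is also why shipping the file to append-only storage matters: it protects against wholesale replacement of the chain.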
Full transaction log
For request/response auditing, enable the transaction log — one JSON line per
request with the masked provider-facing request and response bodies, a
correlation id (returned as x-scrubr-request-id), route/tenant/status, and
detection counts:
transactions:
  enabled: true
  path: /var/log/scrubr/transactions.jsonl
  max_body_bytes: 65536
In enforce mode the captured bodies are secret-free (only sentinels). In dry-run mode nothing is masked, so records reflect the original content — protect the file accordingly.
Configuration reference
| Setting | Purpose |
|---|---|
| routes[] | inbound path (or host) → upstream + profile + optional policy overrides |
| profiles{} | scan_paths (request) / stream_paths (SSE response) per provider |
| masking.{mode,style,scope,ttl,session_header} | global policy defaults |
| rules[], glossary[], entropy, ner | detection (curated set: examples/common-rules.yaml) |
| sources[] | .env / secret-file / Vault (KV v2) ingestion |
| auth, tenants[] | client auth and multi-tenant policy |
| sessions | backend (memory/redis), encryption, node_id |
| tls, intercept | TLS termination / interception |
| audit | tamper-evident log |
Env: SCRUBR_CONFIG, SCRUBR_LISTEN, RUST_LOG. CLI: --config, --listen,
--version, demo, audit-verify <path>.
Containers
Each release publishes a multi-arch (amd64 + arm64) image to GHCR, packaging
the static musl binary into a minimal scratch image:
docker run --rm -p 8080:8080 -v "$PWD/scrubr.yaml:/etc/scrubr/scrubr.yaml:ro" \
ghcr.io/scrubr-dev/scrubr:latest --config /etc/scrubr/scrubr.yaml --listen 0.0.0.0:8080
Tags: :latest and :vX.Y.Z. To build from source locally instead, the
multi-stage Dockerfile compiles a static binary into a minimal image:
docker build -t scrubr ..
Kubernetes (Helm)
A Helm chart is published as an OCI artifact to GHCR on each release:
# Single instance, default config (dry-run reverse proxy to OpenAI).
helm install scrubr oci://ghcr.io/scrubr-dev/charts/scrubr --version X.Y.Z
# Your own config: put the scrubr.yaml contents under `config:` in values.yaml.
helm install scrubr oci://ghcr.io/scrubr-dev/charts/scrubr --version X.Y.Z -f my-values.yaml
The chart runs the hardened image (non-root, read-only rootfs), mounts the config
from a ConfigMap, exposes a ClusterIP Service on :8080, and wires /healthz
probes. helm test scrubr runs a health smoke test.
High availability
# Turnkey: bundle a single Redis (dependency-free, official image).
helm install scrubr oci://ghcr.io/scrubr-dev/charts/scrubr --version X.Y.Z \
  --set ha.enabled=true --set replicaCount=3 \
  --set redis.enabled=true --set redis.password='<secret>' \
  --set sessions.encryptionKey='<high-entropy secret>' \
  --set config.masking.scope=session
# Production: point at your own managed / clustered Redis instead.
# --set redis.url=rediss://scrubr:pass@redis:6379/0 (omit redis.enabled)
ha.enabled switches the workload to a StatefulSet so each pod gets a stable
ordinal, fed to SCRUBR as a distinct node_id (the id-space partition that keeps
concurrent nodes from colliding). Pods share session state via Redis, encrypted
at rest with your key. The chart also adds a PodDisruptionBudget and soft
anti-affinity; set autoscaling.enabled=true for an HPA. The topology matches
the diagram in High availability above.
The session store is either the bundled Redis (redis.enabled=true, with optional
redis.persistence.enabled) or an external one (redis.url); the Redis URL and
at-rest key are kept in a Secret and injected via secretKeyRef. See the
Deploy on Kubernetes guide for the full walkthrough.
Requires Kubernetes ≥ 1.28 (the apps.kubernetes.io/pod-index downward-API label).
The same wiring works without Helm via environment variables that override the
config's sessions block — handy for any orchestrator:
| Env | Overrides |
|---|---|
| SCRUBR_NODE_ID | sessions.node_id (0..4095) |
| SCRUBR_REDIS_URL | sessions.redis_url |
| SCRUBR_ENCRYPTION_KEY | sessions.encryption_key |
| SCRUBR_SESSION_BACKEND | sessions.backend (memory/redis) |
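
On orchestrators that give each instance a hostname ending in a numeric ordinal (as StatefulSet pods have), the node id can be derived at startup. The helper below is a sketch under that assumption. `node_id_from_hostname` is a hypothetical name, and only the 0..4095 range comes from the table above.

```sh
# Extract the trailing "-<ordinal>" from a hostname and validate the range.
node_id_from_hostname() {   # usage: node_id_from_hostname HOSTNAME
  ordinal=${1##*-}          # text after the last hyphen
  case $ordinal in
    ''|*[!0-9]*)
      echo "no numeric ordinal in '$1'" >&2
      return 1
      ;;
  esac
  if [ "$ordinal" -gt 4095 ]; then
    echo "ordinal $ordinal exceeds 4095" >&2
    return 1
  fi
  echo "$ordinal"
}
```

An entrypoint script could then run `SCRUBR_NODE_ID=$(node_id_from_hostname "$(hostname)")` and export it before starting SCRUBR. Failing fast on a missing or out-of-range ordinal avoids two nodes silently sharing an id.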