Configuration Reference
SCRUBR reads a single YAML config (default `scrubr.example.yaml`; override with `--config` or `SCRUBR_CONFIG`). All sections are optional and default to off/empty. The annotated `scrubr.example.yaml` is a working starting point; `examples/common-rules.yaml` is a ready-made detection ruleset.
Config is compiled once into immutable matcher artifacts and hot-reloaded atomically when the file (or a watched secret file) changes; a bad edit keeps the last good config.
routes · profiles · masking · rules · glossary · entropy · ner · sources · auth · tenants · sessions · tls · intercept · audit · transactions · CLI & environment
routes
Maps inbound requests to an upstream. In normal (path) mode a route matches by
listen_path prefix; in interception mode it matches by host.

| Key | Type | Default | Notes |
|---|---|---|---|
| `listen_path` | string | — | Inbound path prefix, e.g. `/openai`. Stripped before forwarding. |
| `host` | string | — | Host matched in interception mode (e.g. `api.openai.com`). |
| `upstream` | string | — | Base upstream URL, e.g. `https://api.openai.com`. |
| `profile` | string | — | Name of a `profiles` entry. |
| `mode` | enum | global | Per-route override of `masking.mode` (`enforce`/`dry-run`). |
| `scope` | enum | global | Per-route override of `masking.scope` (`request`/`session`). |
| `style` | enum | global | Per-route override of `masking.style`. |

```yaml
routes:
  - { listen_path: "/openai", upstream: "https://api.openai.com", profile: openai }
  - { listen_path: "/canary", upstream: "https://api.openai.com", profile: openai, mode: dry-run }
  - { host: "api.openai.com", upstream: "https://api.openai.com", profile: openai } # interception
```
A route in `enforce` mode with no `scan_paths` logs a warning (it would pass requests through unmasked).
profiles
Provider-aware content paths. A path is dot-separated; a [] suffix descends into every
array element. Only string leaves are masked/rehydrated. The special path "**" scans
every string leaf in the body (comprehensive, opt-in — favors recall over the
provider-aware minimalism, so validate for false positives).

| Key | Type | Notes |
|---|---|---|
| `scan_paths` | string[] | Request JSON paths to mask, e.g. `messages[].content`; or `["**"]` for all string leaves. |
| `stream_paths` | string[] | SSE response content paths to rehydrate per event, e.g. `choices[].delta.content` (OpenAI), `delta.text` (Anthropic). Required for streaming; `["**"]` rehydrates all leaves. |

```yaml
profiles:
  openai:
    scan_paths: ["messages[].content", "messages[].tool_calls[].function.arguments"]
    stream_paths: ["choices[].delta.content"]
```
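
For providers whose body layout is not known in advance, the `"**"` path described above can stand in for explicit paths. A minimal sketch, assuming a profile named `catch_all` (a hypothetical name chosen for illustration):

```yaml
profiles:
  catch_all:               # hypothetical profile name
    scan_paths: ["**"]     # mask every string leaf in the request body
    stream_paths: ["**"]   # rehydrate every string leaf in each SSE event
```

Because this scans every string leaf, it is worth running such a profile on a `dry-run` route first to check for false positives.
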
masking
Global masking policy (each field overridable per route/tenant).

| Key | Type | Default | Notes |
|---|---|---|---|
| `mode` | `enforce` \| `dry-run` | `enforce` | Dry-run forwards the original and only reports. In enforce mode a JSON-typed body that fails to parse is rejected (422), not forwarded unmasked. |
| `style` | `typed-sentinel` \| `bare-sentinel` | `typed-sentinel` | `⟦S:EMAIL·id·tag⟧` vs `⟦S·id·tag⟧` (the tag is a keyed MAC that authenticates the sentinel). |
| `scope` | `request` \| `session` | `request` | Session scope gives stable pseudonyms across a conversation. |
| `ttl` | duration | `30m` | Session idle timeout (`45s`, `30m`, `1h`, `90`). |
| `session_header` | string | `x-scrubr-session` | Request header identifying a session. |

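
As an illustration, a global policy that enforces masking and keeps pseudonyms stable across a conversation might look like the sketch below. It assumes the keys sit directly under a top-level `masking` mapping, as the `masking.mode` references in the routes table suggest; the values are examples, not recommendations.

```yaml
masking:
  mode: enforce                      # reject unparseable JSON bodies instead of forwarding them
  style: typed-sentinel              # sentinels carry the entity type
  scope: session                     # stable pseudonyms within one session
  ttl: 1h                            # drop idle sessions after an hour
  session_header: x-scrubr-session   # the documented default, shown explicitly
```
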
rules
Regex detection rules (compiled into one meta-engine). Patterns use Rust regex syntax; write them as YAML single-quoted scalars so backslashes are literal.

| Key | Type | Notes |
|---|---|---|
| `name` | string | Label. |
| `type` | string | Entity type shown in the sentinel (e.g. `EMAIL`, `AWS_KEY`). |
| `pattern` | string | Regex. |
| `priority` | int | Higher wins on overlap. |

```yaml
rules:
  - { name: aws_key, type: AWS_KEY, pattern: '\bAKIA[0-9A-Z]{16}\b', priority: 95 }
```
See examples/common-rules.yaml for a curated set.
glossary
Literal terms (Aho-Corasick). Same matcher as secret sources.
```yaml
glossary:
  - { term: "Project Hufflepuff", type: CODENAME, priority: 100 }
```
entropy
Generic high-entropy secret catcher. Off by default; low priority so named rules win.

| Key | Type | Default |
|---|---|---|
| `enabled` | bool | `false` |
| `min_bits` | float | `3.5` (bits/char) |
| `min_len` | int | `20` |
| `priority` | int | `10` |
| `entity_type` | string | `SECRET` |

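
A sketch of enabling the entropy catcher with a stricter threshold than the defaults, assuming the keys sit under a top-level `entropy` mapping. The raised `min_bits` and `min_len` values are illustrative only.

```yaml
entropy:
  enabled: true
  min_bits: 4.0        # bits/char; higher than the 3.5 default to cut false positives
  min_len: 24          # ignore candidates shorter than 24 characters
  priority: 10         # kept low so named rules win on overlap
  entity_type: SECRET
```
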
ner
Heuristic person-name detection (not a trained model; conservative). Off by default.

| Key | Type | Default |
|---|---|---|
| `enabled` | bool | `false` |
| `entity_type` | string | `PERSON` |
| `priority` | int | `30` |
| `names` | string[] | extra first names beyond the built-in gazetteer |

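
A sketch of turning on name detection and extending the gazetteer, assuming the keys sit under a top-level `ner` mapping. The two names are arbitrary examples of entries the built-in list might not contain.

```yaml
ner:
  enabled: true
  entity_type: PERSON
  priority: 30
  names: ["Anneliese", "Tadhg"]   # example additions to the built-in first-name list
```
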
sources
External secret values pulled at startup/reload and masked (same automaton as the
glossary). Each entry has a kind.
dotenv — each KEY=VALUE line contributes VALUE.
file — each non-empty, non-comment line is a literal secret.

| Key | Default | Notes |
|---|---|---|
| `path` | — | File path (relative to the config dir). |
| `entity_type` | `SECRET` | |
| `priority` | `80` | |
| `min_len` | `5` | Skip values shorter than this. |

vault — HashiCorp Vault KV v2. Token resolution: token → token_path file →
token_env (default VAULT_TOKEN). Pulled at startup/reload (not polled).

| Key | Default | Notes |
|---|---|---|
| `address` | — | e.g. `https://vault.internal:8200`. |
| `mount` | `secret` | KV v2 mount. |
| `paths` | — | Secret paths under the mount. |
| `token` / `token_path` / `token_env` | — | Token sources, in that order. |
| `entity_type`, `priority`, `min_len` | as above | |

```yaml
sources:
  - { kind: dotenv, path: ".env" }
  - { kind: file, path: "secrets.txt", min_len: 6 }
  - kind: vault
    address: "https://vault.internal:8200"
    paths: ["app/prod", "shared/api-keys"]
    token_env: "VAULT_TOKEN"
```
auth
API-key gate on the proxy itself (keys compared in constant time; never forwarded
upstream). Required automatically when tenants are defined.

| Key | Default | Notes |
|---|---|---|
| `enabled` | `false` | |
| `header` | `x-scrubr-key` | Header carrying the key. |
| `keys` | `[]` | Accepted keys. |

/healthz is always reachable without a key.
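
A minimal sketch of gating the proxy behind a single key, assuming the keys sit under a top-level `auth` mapping. The key value is a placeholder to be replaced with a long random string.

```yaml
auth:
  enabled: true
  header: x-scrubr-key                       # the documented default, shown explicitly
  keys: ["replace-with-a-long-random-key"]   # placeholder; never forwarded upstream
```
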
tenants
Multi-tenant policy: a client key identifies a tenant with its own policy, private glossary, and isolated session namespace (precedence: tenant > route > global).

| Key | Notes |
|---|---|
| `id` | Tenant id (used in logs/audit + session namespace). |
| `keys` | Client keys mapping to this tenant. |
| `mode` / `scope` / `style` | Optional policy overrides. |
| `glossary` | Tenant-private terms (masked only for this tenant). |

```yaml
tenants:
  - { id: acme, keys: ["acme-key"], scope: session, glossary: [ { term: "Falcon", type: CODENAME, priority: 100 } ] }
  - { id: globex, keys: ["globex-key"], mode: dry-run }
```
sessions
Where session reverse-maps live.

| Key | Default | Notes |
|---|---|---|
| `backend` | `memory` | `memory` (single node) or `redis` (cross-node). |
| `redis_url` | — | e.g. `rediss://redis.internal:6379/`. |
| `encryption_key` | — | Passphrase → AES-256-GCM at rest; required for secret-free Redis. Also derives the per-session sentinel MAC key, so set it on multi-node clusters or a session's sentinels won't rehydrate across nodes. |
| `node_id` | random | `0..4095`, distinct per node; partitions the id space. |

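
A sketch of a multi-node setup backed by Redis, using only the keys from the table above. The passphrase is a placeholder and the `node_id` value is an example; each node would need its own distinct id and the same `encryption_key`.

```yaml
sessions:
  backend: redis
  redis_url: "rediss://redis.internal:6379/"
  encryption_key: "replace-with-a-strong-passphrase"   # placeholder; identical on every node
  node_id: 1                                           # example; must differ per node (0..4095)
```
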
tls
Terminate client HTTPS at the proxy (else plain HTTP). rustls + ring.

| Key | Default |
|---|---|
| `enabled` | `false` |
| `cert_path` / `key_path` | — (PEM) |

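
A sketch of terminating HTTPS at the proxy, assuming the keys sit under a top-level `tls` mapping. Both file paths are hypothetical.

```yaml
tls:
  enabled: true
  cert_path: "certs/proxy.pem"       # hypothetical path to the PEM certificate chain
  key_path: "certs/proxy-key.pem"    # hypothetical path to the PEM private key
```
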
intercept
TLS interception (MITM): mint a per-host cert from a CA and route by Host. Clients must
trust the CA. The CA key can mint any cert — protect it like a root key.

| Key | Default | Notes |
|---|---|---|
| `enabled` | `false` | |
| `connect` | `false` | `false` = SNI-transparent; `true` = CONNECT proxy. |
| `listen` | `--listen` | Interception endpoint. |
| `ca_cert_path` / `ca_key_path` | — | PEM CA used to mint leaf certs. |
| `upstream_ca_path` | — | Extra CA the proxy trusts for upstream connections. |

See DEPLOYMENT.md → TLS interception.
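
A sketch of a CONNECT-proxy interception setup paired with a host-matched route, assuming the keys sit under a top-level `intercept` mapping. The listen address and CA file paths are hypothetical.

```yaml
intercept:
  enabled: true
  connect: true                          # act as a CONNECT proxy rather than SNI-transparent
  listen: "127.0.0.1:8443"               # hypothetical interception endpoint
  ca_cert_path: "ca/scrubr-ca.pem"       # hypothetical; clients must trust this CA
  ca_key_path: "ca/scrubr-ca-key.pem"    # hypothetical; protect like a root key

routes:
  - { host: "api.openai.com", upstream: "https://api.openai.com", profile: openai }
```
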
audit
Tamper-evident, append-only audit log (counts/types only — never values). Verify with
scrubr audit-verify <path>.

| Key | Default |
|---|---|
| `enabled` | `false` |
| `path` | `scrubr-audit.jsonl` |

transactions
Full request/response log of the masked provider-facing exchange (secret-free in
enforce mode). Each request also returns an `x-scrubr-request-id` header.

| Key | Default | Notes |
|---|---|---|
| `enabled` | `false` | |
| `path` | `scrubr-transactions.jsonl` | |
| `max_body_bytes` | `65536` | Per-body capture limit (truncated beyond). |

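
A sketch of enabling the transaction log with a smaller capture limit than the default, assuming the keys sit under a top-level `transactions` mapping. The 16 KiB figure is an arbitrary example.

```yaml
transactions:
  enabled: true
  path: "scrubr-transactions.jsonl"
  max_body_bytes: 16384   # capture at most 16 KiB per body; longer bodies are truncated
```
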
CLI & environment
```sh
scrubr [--config <path>] [--listen <addr>]   # start the proxy
scrubr --version                             # print version
scrubr demo                                  # offline mask → rehydrate round-trip
scrubr audit-verify <path>                   # verify an audit log's hash chain
```

| Env | Purpose |
|---|---|
| `SCRUBR_CONFIG` | config path (overridden by `--config`) |
| `SCRUBR_LISTEN` | listen address (overridden by `--listen`) |
| `RUST_LOG` | log filter, e.g. `scrubr=info` |
| `VAULT_TOKEN` | default Vault token for vault sources |
| `SCRUBR_SESSION_BACKEND` | override `sessions.backend` (`memory`/`redis`) |
| `SCRUBR_REDIS_URL` | override `sessions.redis_url` |
| `SCRUBR_ENCRYPTION_KEY` | override `sessions.encryption_key` (at-rest) |
| `SCRUBR_NODE_ID` | override `sessions.node_id` (`0..4095`; e.g. a pod ordinal) |

The SCRUBR_* session overrides let an orchestrator inject per-instance cluster
settings without templating the config — see
Deployment → Kubernetes.
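
One way an orchestrator could inject these overrides is through a container `env` list. The fragment below is a sketch for a Kubernetes StatefulSet, not taken from the SCRUBR deployment guide: the Secret name and key are hypothetical, and reading the pod ordinal from the `apps.kubernetes.io/pod-index` label assumes a cluster version that sets that label on StatefulSet pods (1.28 or later).

```yaml
# Hypothetical StatefulSet container fragment
env:
  - name: SCRUBR_SESSION_BACKEND
    value: "redis"
  - name: SCRUBR_REDIS_URL
    value: "rediss://redis.internal:6379/"
  - name: SCRUBR_ENCRYPTION_KEY
    valueFrom:
      secretKeyRef:
        name: scrubr-secrets     # hypothetical Secret name
        key: encryption-key      # hypothetical key within the Secret
  - name: SCRUBR_NODE_ID
    valueFrom:
      fieldRef:
        # pod ordinal (0, 1, 2, ...) exposed as a label on StatefulSet pods
        fieldPath: metadata.labels['apps.kubernetes.io/pod-index']
```
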