Security Policy
SCRUBR is a security tool: it sits in the data path and handles plaintext secrets and PII. Treat it as a high-value component and deploy it accordingly.
Reporting a vulnerability
Please report security issues privately — do not open a public issue.
- Email: security@scrubr.example (replace with your project contact)
- Include: affected version, a description, and reproduction steps or a PoC.
We aim to acknowledge within 3 business days and to provide a remediation timeline after triage. Coordinated disclosure is appreciated; please give us a reasonable window before public disclosure.
Supported versions
The latest 1.x release receives security fixes.
Threat model
What SCRUBR protects
- Data minimization to the provider. With masking enforced, the upstream LLM provider receives only opaque sentinels — never the original secrets/PII. This is the core, attestable property.
- Reversibility integrity. A masked value round-trips losslessly, including across streamed (SSE) responses, or is left verbatim — it is never silently corrupted or mis-rehydrated.
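To illustrate the streaming half of the reversibility property, here is a minimal sketch of one way to rehydrate sentinels that are split across stream chunks. It is not SCRUBR's implementation: the class name `StreamRehydrator`, the `MAX_HOLD` cap, and the plain dictionary vault are all hypothetical, and tag verification is omitted for brevity. The idea is to hold back any trailing text that begins with an unclosed `⟦` until the closing `⟧` arrives, and to release it verbatim if it never does.

```python
class StreamRehydrator:
    """Rehydrates sentinels in a chunked text stream (illustrative only)."""

    OPEN, CLOSE = "⟦", "⟧"
    MAX_HOLD = 128  # assumed cap on how much of an unclosed sentinel to buffer

    def __init__(self, vault: dict):
        self.vault = vault   # maps full sentinel text -> original value
        self.pending = ""    # held-back tail that may be the start of a sentinel

    def _swap(self, text: str) -> str:
        # Replace only sentinels the vault knows; anything else stays verbatim.
        for sentinel, value in self.vault.items():
            text = text.replace(sentinel, value)
        return text

    def feed(self, chunk: str) -> str:
        data = self.pending + chunk
        self.pending = ""
        start = data.rfind(self.OPEN)
        unclosed = start != -1 and self.CLOSE not in data[start:]
        if unclosed and len(data) - start <= self.MAX_HOLD:
            # Hold back the possible partial sentinel until more data arrives.
            data, self.pending = data[:start], data[start:]
        return self._swap(data)

    def finish(self) -> str:
        # End of stream: release whatever is still held, unchanged if unknown.
        tail, self.pending = self.pending, ""
        return self._swap(tail)
```

A caller would pass each decoded text delta to `feed` and call `finish` once at end of stream. The hold-back cap keeps a stray `⟦` in ordinary text from stalling the stream indefinitely.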
Trust boundary
SCRUBR necessarily sees plaintext request/response content (it is the masking broker). Run it inside your trust boundary, on hosts and networks you control, with least-privilege access. Anyone who can read SCRUBR's memory, its config, its secret sources, or (for the Redis backend) the session store can see secrets.
Sentinels are authenticated; session keys are still bearer secrets. Every
sentinel carries a per-vault keyed MAC tag (⟦S:TYPE·id·tag⟧), so a hostile or
compromised upstream cannot forge or blindly enumerate sentinels (⟦S·0⟧,
⟦S·1⟧, …) to read the vault — only sentinels SCRUBR actually issued rehydrate.
What remains inherent to reversibility: with scope: session, everyone presenting
the same session-header value shares one vault, and an upstream that received a
sentinel earlier in the session can replay it — which re-reveals that value to the
session owner (not to the upstream). So still use one session per
user/trust-unit, make session keys unguessable, and don't mix different
users' secrets under one session key. (Request scope confines everything to the
caller's own current request.) For cross-node sessions the tag key is derived from
sessions.encryption_key, so set it — otherwise nodes can't agree on tags and a
session's sentinels won't rehydrate on another node.
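The authenticated-sentinel idea above can be sketched as follows. This is an illustration under stated assumptions, not SCRUBR's actual scheme: the source only says the tag is a per-vault keyed MAC, so the use of HMAC-SHA256, the 16-hex-character truncation, the MAC input `TYPE·id`, and the function names are all hypothetical. Only the `⟦S:TYPE·id·tag⟧` shape comes from the text.

```python
import hashlib
import hmac
import re

# Assumed concrete layout for the sentinel described in the text.
SENTINEL_RE = re.compile(r"⟦S:([A-Z_]+)·([0-9]+)·([0-9a-f]+)⟧")
TAG_HEX_LEN = 16  # assumed truncation, not SCRUBR's actual tag length


def make_tag(vault_key: bytes, kind: str, ident: int) -> str:
    # Keyed MAC over the sentinel's type and id (assumed input format).
    msg = f"{kind}·{ident}".encode("utf-8")
    return hmac.new(vault_key, msg, hashlib.sha256).hexdigest()[:TAG_HEX_LEN]


def issue(vault_key: bytes, kind: str, ident: int) -> str:
    return f"⟦S:{kind}·{ident}·{make_tag(vault_key, kind, ident)}⟧"


def rehydrate(text: str, vault_key: bytes, vault: dict) -> str:
    def replace(match: re.Match) -> str:
        kind, ident, tag = match.group(1), int(match.group(2)), match.group(3)
        expected = make_tag(vault_key, kind, ident)
        # Constant-time tag check; forged or foreign sentinels fail here.
        if hmac.compare_digest(tag, expected) and (kind, ident) in vault:
            return vault[(kind, ident)]
        return match.group(0)  # left verbatim, never mis-rehydrated

    return SENTINEL_RE.sub(replace, text)
```

Because the tag depends on the key, two nodes rehydrate each other's sentinels only if they derive the same key, which is the reason the text asks for a shared sessions.encryption_key in cross-node deployments.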
Sensitive material and how it is handled
- In-memory vaults (request/session mappings) are zeroized on drop; session scope is bounded by TTL.
- Redis-backed sessions persist mappings off-process. Enable sessions.encryption_key (AES-256-GCM) so the store holds only ciphertext, and run Redis with AUTH + TLS on a private network. Give every node a distinct sessions.node_id (the Helm chart derives it from the pod ordinal); colliding ids share an id space and corrupt sessions. Run Redis HA: a transient read failure is surfaced loudly but can still corrupt a session's mappings for that request.
- The interception CA key is the most dangerous secret in the system: it can mint a trusted certificate for any host. Protect intercept.ca_key_path with the same rigor as a root signing key (restricted FS permissions, ideally an HSM/KMS in production), and scope the CA's distribution to managed clients only.
- Auth keys are compared as fixed-length SHA-256 digests in constant time (revealing neither which key matched nor any key's length) and never forwarded upstream.
- Audit log is hash-chained and tamper-evident (scrubr audit-verify), but it is a local file: protect it and consider shipping to append-only/WORM storage. It records detection counts and types only, never values.
- Transaction log (optional) captures the masked provider-facing request and response, which are secret-free in enforce mode. In dry-run mode nothing is masked, so records contain original content; protect the file and avoid dry-run + transactions outside a trusted boundary. Audit and transaction logs are created 0600 (owner-only) on Unix.
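The auth-key comparison described in the list above can be sketched like this. It is a minimal illustration, assuming keys are hashed with SHA-256 and compared with Python's `hmac.compare_digest`; the function names and the list-of-digests storage are hypothetical and not taken from SCRUBR's code. Hashing first makes every comparison operate on 32 bytes regardless of key length, and checking every configured key without an early exit avoids revealing which one matched.

```python
import hashlib
import hmac


def digest(key: str) -> bytes:
    # Fixed-length (32-byte) representation, independent of the key's length.
    return hashlib.sha256(key.encode("utf-8")).digest()


def is_authorized(presented: str, configured_digests: list[bytes]) -> bool:
    candidate = digest(presented)
    matched = False
    for stored in configured_digests:
        # Bitwise OR does not short-circuit, so every key is always compared.
        matched |= hmac.compare_digest(candidate, stored)
    return matched
```

In this sketch only digests are kept in memory after startup, for example `configured_digests = [digest(k) for k in keys_from_config]`, where `keys_from_config` stands in for however the keys are actually loaded.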
Network egress
- Upstream redirects are never followed. A 3xx from the upstream is passed through to the client, so a compromised/malicious upstream cannot redirect SCRUBR to an internal service or metadata endpoint (SSRF), nor cause SCRUBR to rehydrate an attacker-chosen target's response with the client's secrets. The Vault connector likewise never follows redirects (its token can't leak to another host).
- The CONNECT proxy is not an open relay to internal hosts. Blind tunnels refuse loopback and link-local targets (blocking the cloud metadata endpoint at 169.254.169.254 and localhost pivots), and connect to the exact vetted IP. Still, bind the proxy to trusted networks: it will relay to arbitrary public hosts by design.
- Certificate minting is bounded to configured interception hosts, so an attacker cannot force unbounded key-generation with arbitrary SNI values.
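The "refuse loopback and link-local, then connect to the exact vetted IP" rule for blind tunnels can be sketched as below. This is an assumption-laden illustration rather than SCRUBR's code: `vet_ip` and `open_tunnel` are hypothetical names, and unwrapping IPv4-mapped IPv6 addresses is an extra precaution added in this sketch, not something the text states. The key point is that the name is resolved once and the socket is opened to the vetted address literal, so a second DNS lookup cannot swap in a different target.

```python
import ipaddress
import socket


def vet_ip(ip_text: str) -> str:
    ip = ipaddress.ip_address(ip_text)
    # Sketch-only precaution: treat ::ffff:a.b.c.d as the embedded IPv4 address.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    if ip.is_loopback or ip.is_link_local:
        raise PermissionError(f"tunnel target {ip} is not allowed")
    return str(ip)


def open_tunnel(host: str, port: int, timeout: float = 10.0) -> socket.socket:
    # Resolve once, vet the result, then connect to that exact address.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    resolved = infos[0][4][0].split("%", 1)[0]  # drop any IPv6 scope suffix
    vetted = vet_ip(resolved)
    return socket.create_connection((vetted, port), timeout=timeout)
```

Public addresses pass this check by design, which is why the proxy should still be bound to trusted networks only.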
Operational guidance
- Terminate client TLS at SCRUBR (tls) or run it behind a TLS terminator; the plain-HTTP listener is for trusted local networks only.
- Start with dry-run mode to validate detection coverage before enforcing.
- Bias detection toward recall for secret/PII categories — a false negative (a leak) is worse than a false positive (a degraded prompt).
- Rotate auth keys and the interception CA on a schedule.
Known limitations (as of 1.0)
- Masking covers configured JSON content paths. In enforce mode a JSON-typed body that does not parse is rejected (422) rather than forwarded, and a profile can set scan_paths: ["**"] to scan every string leaf. Still, a body sent with a non-JSON content type, or a secret no rule matches, passes through: SCRUBR prevents leakage in well-formed provider requests; it is not a DLP control against a client deliberately exfiltrating over an unscanned channel.
- The at-rest encryption_key is derived via SHA-256, not a password-stretching KDF, so use a high-entropy key (it is a shared cluster secret, not a password).
- The audit hash-chain detects edits and mid-file deletions, but truncation of the most recent records is not self-evident; ship to append-only/WORM storage if that matters.
- Auth keys are static (no built-in rotation/expiry).
- The heuristic NER is not a trained model; it favors precision and will miss many names. Use it as defense-in-depth, not a sole PII control.
- Concurrent cross-node writes to the same session are last-write-wins per field; sticky sessions are recommended for strict ordering.
- Audit writes are synchronous (durability over throughput, by design).
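The audit hash-chain limitation listed above is easier to see in a small model. The sketch below is a generic hash chain, not SCRUBR's on-disk format: the record fields, the all-zero genesis value, and the function names are assumptions made for illustration. Each record commits to the hash of the one before it, so an edit or a mid-file deletion breaks a link, while dropping records from the end leaves a shorter chain that is still internally consistent.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first record's "prev"


def _link(prev: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "event": event, "hash": _link(prev, event)})


def verify(log: list) -> bool:
    prev = GENESIS
    for record in log:
        if record["prev"] != prev or record["hash"] != _link(prev, record["event"]):
            return False  # edited record or missing predecessor
        prev = record["hash"]
    return True
```

With four appended records, changing a count in the second record or deleting it makes `verify` return `False`, but keeping only the first two records still verifies. Detecting that kind of tail truncation needs an external anchor, such as the append-only/WORM copy the list recommends.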