Documentation
@raeven-co/sether
Streaming PII redaction for AI applications. Tokenise sensitive data before it reaches any LLM, restore it in the reply. MIT licensed, one dependency, Node 18+.
Overview
Sether sits between your application and any LLM API — OpenAI, Anthropic, Gemini, Bedrock, your own fine-tunes. It detects sensitive data, swaps each match for a stable token like <EMAIL_…> before the request leaves your boundary, then restore() swaps the originals back into the response. Your application code never has to branch on redacted text.
Detection is deterministic (validated patterns — Luhn, mod-97, SSA blacklists — not ML guesses), streaming-safe (chunk-boundary correct, property-tested), and local (no network calls; the package has a single dependency, libphonenumber-js).
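To make "validated patterns" concrete, here is a minimal sketch of a Luhn check of the kind a card-number detector could run after a regex has found a candidate. The helper name `passesLuhn` and the 12 to 19 digit bound are assumptions for illustration, not Sether's internal code.

```typescript
// Illustrative Luhn validator (hypothetical helper, not part of the Sether API).
function passesLuhn(candidate: string): boolean {
  // Strip the separators a card number is commonly written with.
  const digits = candidate.replace(/[\s-]/g, '');
  // Assumed length bound for payment card numbers.
  if (!/^\d{12,19}$/.test(digits)) return false;

  let sum = 0;
  let doubleNext = false; // the rightmost digit is not doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' has char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9; // same as adding the two digits of the product
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}
```

The widely used test number `4111 1111 1111 1111` passes this check, while changing its last digit makes it fail. That is what lets a detector reject most random 16-digit strings without any model.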
Install
npm install @raeven-co/sether
Node 18+. ESM and CommonJS are both supported. The package makes zero network calls and ships pure-ASCII bundles (supply-chain scanner friendly).
Quickstart
import { Sether } from '@raeven-co/sether';
import { Readable } from 'node:stream';
const sether = new Sether();
// 1. Outgoing request — redact before it reaches the LLM
const safeForLLM = Readable
.from(['my email is alice@example.com'])
.pipe(sether.redact());
// → "my email is <EMAIL_…>"
// 2. LLM response — restore before it reaches your user
const safeForUser = llmResponseStream.pipe(sether.restore());
// → "Confirmation sent to alice@example.com"
The same Sether instance shares its vault between redact() and restore(); that is how the round-trip identity is preserved across streaming chunks. One instance per request/conversation is the standard pattern.
For non-streaming text there is a synchronous one-shot: redactSync(text, { detectors, vault }).
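The round trip can be pictured with a small conceptual sketch. Everything here is hypothetical: `redactWith`, `restoreWith`, the simplified email pattern, and the `[[EMAIL:n]]` placeholder, which is deliberately not the library's token format. It only illustrates why redaction and restoration need to share one vault.

```typescript
// Conceptual sketch only: helper names and the token format are hypothetical.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function redactWith(vault: Map<string, string>, text: string): string {
  // Reverse lookup so a repeated value reuses its token (a "stable" token).
  const tokenByValue = new Map<string, string>();
  vault.forEach((value, token) => tokenByValue.set(value, token));

  return text.replace(EMAIL_PATTERN, (value) => {
    let token = tokenByValue.get(value);
    if (token === undefined) {
      token = `[[EMAIL:${vault.size + 1}]]`; // placeholder format, not Sether's
      vault.set(token, value);
      tokenByValue.set(value, token);
    }
    return token;
  });
}

function restoreWith(vault: Map<string, string>, text: string): string {
  // Tokens missing from the vault are left untouched rather than guessed at.
  return text.replace(/\[\[EMAIL:\d+\]\]/g, (token) => vault.get(token) ?? token);
}

const sharedVault = new Map<string, string>();
const outbound = redactWith(
  sharedVault,
  'alice@example.com wrote to bob@example.org, cc alice@example.com',
);
const inbound = restoreWith(sharedVault, 'Confirmation sent to [[EMAIL:1]]');
```

Because both helpers read and write the same map, the second occurrence of an address reuses the first token, and the reply can be restored without the model ever seeing the original value.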
Detectors (basic pack)
new Sether() runs the basic pack by default. Pass an explicit list to narrow the scope:
import { Sether, emailDetector, ssnDetector } from '@raeven-co/sether';
const sether = new Sether({
detectors: [emailDetector, ssnDetector], // only these two
});
| Export | Token type | Method |
|---|---|---|
| emailDetector | EMAIL | RFC 5321-style regex. ASCII local parts. |
| phoneDetector | PHONE | libphonenumber-js — international parsing. |
| creditCardDetector | CC | Bounded regex + Luhn check. ReDoS-safe. |
| ssnDetector | SSN | Regex + SSA invalid-prefix blacklist. |
| ipv4Detector | IPV4 | Strict octet-bounded (0–255, no leading zeros). |
| ipv6Detector | IPV6 | Candidate regex + in-tree isIPv6 validator. |
| ibanDetector | IBAN | Regex + ISO 13616 mod-97 checksum. |
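As an illustration of the checksum step in the last row, the following is a minimal sketch of the ISO 13616 mod-97 check. The function name `passesMod97` and the candidate shape regex are assumptions; the library's own validator is not shown in this document.

```typescript
// Illustrative mod-97 check (hypothetical helper, not the library's code).
function passesMod97(iban: string): boolean {
  const compact = iban.replace(/\s+/g, '').toUpperCase();
  // Country code, two check digits, then 11 to 30 alphanumerics (assumed bound).
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(compact)) return false;

  // Move the first four characters to the end, then read letters as 10..35.
  const rearranged = compact.slice(4) + compact.slice(0, 4);
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    const value = code >= 65 ? code - 55 : code - 48; // 'A' -> 10, '0' -> 0
    // Fold digit by digit so the number never exceeds safe integer range.
    remainder = value > 9
      ? (remainder * 100 + value) % 97
      : (remainder * 10 + value) % 97;
  }
  return remainder === 1;
}
```

The commonly cited example `GB82 WEST 1234 5698 7654 32` passes, and altering any single digit makes the remainder differ from 1, so the candidate is rejected.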
Custom detectors are plain objects: { type: string; detect(text): { start, end, value }[] } — pure, synchronous, deterministic.
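A custom detector following that shape might look like the sketch below. The `ORDER_ID` type, the `ORD-` plus eight digits format, and the local `Match` and `Detector` interface names are invented for illustration, and treating `end` as exclusive is an assumption.

```typescript
// Local type names are hypothetical; the package may export its own.
interface Match { start: number; end: number; value: string; }
interface Detector { type: string; detect(text: string): Match[]; }

// Hypothetical detector for order IDs shaped like "ORD-12345678".
const orderIdDetector: Detector = {
  type: 'ORDER_ID',
  detect(text: string): Match[] {
    // A fresh regex per call keeps detect() free of lastIndex state (pure).
    const pattern = /\bORD-\d{8}\b/g;
    const matches: Match[] = [];
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(text)) !== null) {
      // `end` is treated as exclusive here (assumption).
      matches.push({ start: m.index, end: m.index + m[0].length, value: m[0] });
    }
    return matches;
  },
};

const orderMatches = orderIdDetector.detect('ref ORD-12345678 shipped');
```

Such an object would be passed in the `detectors` array alongside the built-in ones, provided the library accepts the shape exactly as described above.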
Identity pack (opt-in)
Names, dates of birth, passport numbers, and addresses have no self-validating shape, so a bare regex would be a false-positive machine. The identity pack uses label-anchored detection — it redacts a value only when it appears with the label that introduces it (Name:, DOB:, Passport No:, Address:), in many languages: Latin-script labels plus CJK, Cyrillic, and Arabic (名前:, Имя:, الاسم:).
import { Sether, basicDetectors, identityDetectors } from '@raeven-co/sether';
const sether = new Sether({
detectors: [...basicDetectors, ...identityDetectors],
});
Exports: nameDetector, dobDetector, passportDetector, addressDetector. Unlabelled names in free prose are the job of free-text NER.
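Label-anchored detection can be sketched as a regex that only fires when a label precedes the value, and that reports the span of the value alone. The detector below is hypothetical: its labels and date shapes are a small illustrative subset, and the `DOB` type string is assumed rather than taken from the package.

```typescript
interface LabelledMatch { start: number; end: number; value: string; }

// Hypothetical label-anchored date-of-birth detector (not the library's dobDetector).
const labelledDobDetector = {
  type: 'DOB', // assumed type string
  detect(text: string): LabelledMatch[] {
    // The label is required; the capture group holds only the date itself.
    const pattern =
      /\b(?:DOB|Date of Birth)\s*:\s*(\d{4}-\d{2}-\d{2}|\d{1,2}\/\d{1,2}\/\d{2,4})/gi;
    const found: LabelledMatch[] = [];
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(text)) !== null) {
      const value = m[1] ?? '';
      const end = m.index + m[0].length; // the value sits at the end of the match
      found.push({ start: end - value.length, end, value });
    }
    return found;
  },
};

const dobMatches = labelledDobDetector.detect('DOB: 1990-04-12, seen on 2024-01-05');
```

Only the labelled date is reported. The second date has the same shape but no label, so it is left alone, which is the false-positive trade-off the pack is built around.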
Secrets pack (opt-in)
Catches leaked credentials before they reach a third-party model — useful when users paste config files or logs into AI features.
import { Sether, basicDetectors, secretsDetectors } from '@raeven-co/sether';
const sether = new Sether({
detectors: [...basicDetectors, ...secretsDetectors],
});
| Export | Token type | Method |
|---|---|---|
| awsAccessKeyDetector | AWS_ACCESS_KEY | AKIA-prefixed access key IDs. |
| openaiKeyDetector | OPENAI_KEY | sk- / sk-proj- API keys. |
| anthropicKeyDetector | ANTHROPIC_KEY | sk-ant- API keys. |
| githubPatDetector | GITHUB_PAT | ghp_ and github_pat_ tokens. |
| slackTokenDetector | SLACK_TOKEN | xox[baprs]- tokens. |
| stripeKeyDetector | STRIPE_KEY | sk_live / pk_live / test keys. |
| jwtDetector | JWT | Three-part base64url tokens. |
| highEntropyDetector | HIGH_ENTROPY | Long high-entropy strings (conservative). |
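For the last row, one common way to score "high entropy" is Shannon entropy per character. The sketch below assumes that approach; the thresholds (32 characters, 4 bits per character) and both function names are hypothetical, since the library's actual heuristic is not described here.

```typescript
// Shannon entropy in bits per character (illustrative helper).
function shannonEntropy(s: string): number {
  const counts = new Map<string, number>();
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  let bits = 0;
  counts.forEach((n) => {
    const p = n / s.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

// Hypothetical thresholds; a conservative detector would tune these carefully.
function looksHighEntropy(s: string, minLength = 32, minBits = 4): boolean {
  return s.length >= minLength && shannonEntropy(s) >= minBits;
}
```

A string of one repeated character scores 0 bits, while a string whose characters are all distinct scores the base-2 logarithm of its length. Requiring both a minimum length and a minimum score is one way to keep such a check conservative.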
Streaming & SSE
sether.redact() / sether.restore() are Node Transform streams — chunk-boundary safe, so a pattern split across chunks (foo@ + bar.com) is still caught. For LLM providers that stream Server-Sent Events, the SSE-aware variants redact/restore inside data: payloads while preserving framing:
import { createSSERedactStream, createSSERestoreStream } from '@raeven-co/sether';
providerResponse.body
.pipe(createSSERestoreStream({ vault: sether.vault }))
.pipe(clientResponse);
Lower-level building blocks: createRedactStream and createRestoreStream take the same options; bring your own vault.
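The chunk-boundary behaviour described above can be sketched with a simple hold-back buffer. This is one possible technique, not necessarily the one the library uses: it withholds the trailing run of non-whitespace characters until more input arrives, which is enough for patterns that never contain whitespace. `createChunkRedactor`, the simplified email pattern, and the `[[EMAIL]]` placeholder are all hypothetical.

```typescript
// Deliberately simplified email pattern for the sketch.
const CHUNK_EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function createChunkRedactor(mask: (value: string) => string) {
  let carry = ''; // trailing text that might be the start of a match

  return {
    write(chunk: string): string {
      const buffer = carry + chunk;
      // Walk back to the start of the trailing non-whitespace run.
      let cut = buffer.length;
      while (cut > 0 && !/\s/.test(buffer.charAt(cut - 1))) cut--;
      carry = buffer.slice(cut); // held back until the next chunk
      return buffer.slice(0, cut).replace(CHUNK_EMAIL, mask);
    },
    end(): string {
      // No more input is coming, so the held-back tail is now complete.
      const rest = carry;
      carry = '';
      return rest.replace(CHUNK_EMAIL, mask);
    },
  };
}

const redactor = createChunkRedactor(() => '[[EMAIL]]');
const pieces = [
  redactor.write('mail foo@'),
  redactor.write('bar.com now'),
  redactor.end(),
];
```

Joining `pieces` yields the redacted text even though the address was split across two chunks. A production version would also cap the size of `carry`, and the same `write`/`end` pair maps naturally onto the `transform` and `flush` hooks of a Node Transform stream.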
Middlewares
Drop-in wrappers so you don't hand-wire the streams:
import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';
import {
wrapFetch, wrapOpenAI, wrapAnthropic, createExpressMiddleware,
} from '@raeven-co/sether';
// fetch — redacts request bodies to AI hosts, restores responses
const safeFetch = wrapFetch(fetch, { sether });
// OpenAI / Anthropic SDKs — pass YOUR client instance; Sether never
// imports either SDK (structurally typed, zero peer dependencies)
const openai = wrapOpenAI(new OpenAI(), { sether });
const anthropic = wrapAnthropic(new Anthropic(), { sether });
// Express — redact inbound req bodies / restore outbound
app.use(createExpressMiddleware({ sether }));
Audit events & regulation mappings
Every redaction can emit a structured AuditEvent — detector type, token, value length (never the value), and the regulation references the redaction evidences (GDPR Art. 28, SOC 2 CC6.7, HIPAA §164.514, PCI DSS Req. 3.4, EU AI Act, NDPA) via DEFAULT_REGULATION_MAPPINGS.
import { Sether, ConsoleAuditSink, MemoryAuditSink } from '@raeven-co/sether';
const sink = new MemoryAuditSink();
const sether = new Sether({ auditSink: sink });
// …after a request:
sink.events; // → [{ detectorType: 'EMAIL', token: '<EMAIL_…>', valueLength: 17, regulations: […] }]
The events deliberately never contain the redacted value, so the log itself is your data-minimisation proof. Persistent storage, tamper-evident chaining, and PDF export are the hosted tier's job.
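Since events carry only metadata, they can be aggregated freely. The sketch below tallies events per detector type using just the fields shown above. `AuditEventLike` and `countByDetector` are hypothetical names, the sample events are made up, and the element shape of `regulations` is left as `unknown` because it is not shown here.

```typescript
// Mirrors the fields shown in the example event (local, hypothetical type).
interface AuditEventLike {
  detectorType: string;
  token: string;
  valueLength: number;
  regulations: unknown[]; // element shape is not documented above
}

// Count redactions per detector type, e.g. for a per-request summary.
function countByDetector(events: AuditEventLike[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const event of events) {
    totals[event.detectorType] = (totals[event.detectorType] || 0) + 1;
  }
  return totals;
}

// Made-up sample data; tokens are written in the document's elided form.
const sampleEvents: AuditEventLike[] = [
  { detectorType: 'EMAIL', token: '<EMAIL_…>', valueLength: 17, regulations: [] },
  { detectorType: 'CC', token: '<CC_…>', valueLength: 16, regulations: [] },
  { detectorType: 'EMAIL', token: '<EMAIL_…>', valueLength: 15, regulations: [] },
];
const totalsByType = countByDetector(sampleEvents);
```

With a MemoryAuditSink, a function like this could be fed `sink.events` after each request, assuming the events have the fields shown in the example.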
The vault
The vault is the token → original-value map. Default: MemoryVault (in-process, LRU-bounded). It implements a five-method interface — set / get / has / delete / clear — so you can back it with Redis for multi-instance deployments. The vault stays inside your infrastructure; tokens are useless without it.
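The five-method shape can be sketched as an LRU-bounded, Map-backed class. This is an assumption-laden illustration: `VaultLike` and `LruVault` are hypothetical names, the method signatures are guessed to be synchronous, and a Redis-backed vault would more likely need Promise-returning methods if the real interface allows them.

```typescript
// Hypothetical rendering of the five-method interface (signatures assumed).
interface VaultLike {
  set(token: string, value: string): void;
  get(token: string): string | undefined;
  has(token: string): boolean;
  delete(token: string): boolean;
  clear(): void;
}

class LruVault implements VaultLike {
  private readonly entries = new Map<string, string>();
  private readonly maxEntries: number;

  constructor(maxEntries: number = 10000) {
    this.maxEntries = maxEntries;
  }

  set(token: string, value: string): void {
    // Re-inserting moves the token to the "most recent" end of the Map.
    this.entries.delete(token);
    this.entries.set(token, value);
    if (this.entries.size > this.maxEntries) {
      // A Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get(token: string): string | undefined {
    const value = this.entries.get(token);
    if (value !== undefined) {
      this.entries.delete(token); // refresh recency on read
      this.entries.set(token, value);
    }
    return value;
  }

  has(token: string): boolean { return this.entries.has(token); }
  delete(token: string): boolean { return this.entries.delete(token); }
  clear(): void { this.entries.clear(); }
}

const smallVault = new LruVault(2);
smallVault.set('t1', 'a');
smallVault.set('t2', 'b');
smallVault.get('t1');      // t1 becomes the most recently used entry
smallVault.set('t3', 'c'); // exceeds the bound, so t2 is evicted
```

Bounding the map keeps long-lived processes from accumulating originals indefinitely. The trade-off is that a token evicted before its response arrives can no longer be restored.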
Browser entry
The package root uses Node streams and cannot be bundled for the browser. @raeven-co/sether/browser (0.5.0+) ships only the pure detection surface — every detector pack plus DEFAULT_REGULATION_MAPPINGS, no node: imports — and powers both the live sandbox and the Sether Shield extension:
import { basicDetectors, identityDetectors } from '@raeven-co/sether/browser';
const matches = [...basicDetectors, ...identityDetectors]
.flatMap((d) => d.detect(text));
Free-text NER (@raeven-co/sether-ner)
Unlabelled names, organisations, and locations in running prose need a model, not a regex. That ships as a separate, lazy-loaded package so the core stays small. NER runs on the outbound prompt (sentence-buffered, async) — never on the streaming restore path — and mints the same <TYPE_…> tokens, so restore() works unchanged:
npm install @raeven-co/sether-ner @huggingface/transformers
import { Sether, redactSync, basicDetectors } from '@raeven-co/sether';
import { createNerRedactor } from '@raeven-co/sether-ner';
const sether = new Sether();
const ner = createNerRedactor(); // lazy-loads the model on first call
const { redacted } = await ner.redact(prompt, { vault: sether.vault });
const safe = redactSync(redacted, { detectors: basicDetectors, vault: sether.vault });
// send `safe` to the LLM; sether.restore() swaps BOTH token sets back
The first call downloads the model (~30 MB), so call ner.warmup() at boot. Bring your own model via the infer option.
Sether Shield (browser extension)
The zero-code option for individuals: a Chrome extension that catches personal data in your prompt on ChatGPT, Claude, and Gemini and scrubs it in one click — 100% locally, with no network calls and a single storage permission. It runs the same detector packs via the browser entry above.
Security
Found a vulnerability? Please don't open a public issue — email emorylebo@gmail.com or use GitHub private security advisories. We acknowledge within 48 hours. CI runs a safe-regex2 ReDoS scan (161 patterns, 0 unsafe), an ASCII-only dist gate, and npm provenance attestation on every release. Full policy: SECURITY.md.