Documentation

@raeven-co/sether

Streaming PII redaction for AI applications. Tokenise sensitive data before it reaches any LLM, then restore it in the reply. MIT licensed, one dependency, Node 18+.

v0.5.x | 134 tests | ReDoS-scanned · 0 unsafe | ESM + CJS | 1 dependency

Overview

Sether sits between your application and any LLM API: OpenAI, Anthropic, Gemini, Bedrock, or your own fine-tunes. It detects sensitive data and swaps each match for a stable token like <EMAIL_…> before the request leaves your boundary; restore() then swaps the originals back into the response. Your application code never has to branch on redacted text.

Detection is deterministic (validated patterns — Luhn, mod-97, SSA blacklists — not ML guesses), streaming-safe (chunk-boundary correct, property-tested), and local (no network calls; the package has a single dependency, libphonenumber-js).

Install

npm install @raeven-co/sether

Node 18+. ESM and CommonJS both supported. The package makes zero network calls and ships pure-ASCII bundles (supply-chain scanner friendly).

Quickstart

import { Sether } from '@raeven-co/sether';
import { Readable } from 'node:stream';

const sether = new Sether();

// 1. Outgoing request — redact before it reaches the LLM
const safeForLLM = Readable
  .from(['my email is alice@example.com'])
  .pipe(sether.redact());
// → "my email is <EMAIL_…>"

// 2. LLM response — restore before it reaches your user
// (llmResponseStream is a placeholder for the provider's response as a Node Readable)
const safeForUser = llmResponseStream.pipe(sether.restore());
// → "Confirmation sent to alice@example.com"

The same Sether instance shares its vault between redact() and restore() — that is how the round-trip identity is preserved across streaming chunks. One instance per request/conversation is the standard pattern.
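
To make the shared-vault round trip concrete, here is a toy sketch in TypeScript. It does not use Sether's API: `ToyRedactor`, the email regex, and the numeric token suffix (`<EMAIL_1>`) are all invented for illustration, and Sether's real token format may differ.

```typescript
// Toy model of the redact -> restore round trip. Everything here is
// hypothetical: it is not Sether's implementation or token format.
class ToyRedactor {
  // token -> original value (the "vault")
  private vault = new Map<string, string>();
  // original value -> token, so a repeated value reuses its token
  private issued = new Map<string, string>();

  redact(text: string): string {
    const email = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
    return text.replace(email, (value) => {
      let token = this.issued.get(value);
      if (token === undefined) {
        token = `<EMAIL_${this.issued.size + 1}>`; // invented suffix scheme
        this.issued.set(value, token);
        this.vault.set(token, value);
      }
      return token;
    });
  }

  restore(text: string): string {
    // Swap every known token back; unknown tokens are left untouched.
    return text.replace(/<[A-Z_]+_\d+>/g, (token) => this.vault.get(token) ?? token);
  }
}

const toy = new ToyRedactor();
const outbound = toy.redact('my email is alice@example.com');
// Pretend the model echoed the token back in its reply.
const reply = `Confirmation sent to ${outbound.slice('my email is '.length)}`;
const restored = toy.restore(reply);
```

Because both directions read and write the same map, a second instance with an empty vault could not restore the reply, which is why one instance per conversation is the natural unit.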

For non-streaming text there is a synchronous one-shot: redactSync(text, { detectors, vault }).

Detectors (basic pack)

new Sether() runs the basic pack by default. Pass an explicit list to narrow the scope:

import { Sether, emailDetector, ssnDetector } from '@raeven-co/sether';

const sether = new Sether({
  detectors: [emailDetector, ssnDetector], // only these two
});

| Export | Token type | Method |
| --- | --- | --- |
| emailDetector | EMAIL | RFC 5321-style regex. ASCII local parts. |
| phoneDetector | PHONE | libphonenumber-js (international parsing). |
| creditCardDetector | CC | Bounded regex + Luhn check. ReDoS-safe. |
| ssnDetector | SSN | Regex + SSA invalid-prefix blacklist. |
| ipv4Detector | IPV4 | Strict octet-bounded (0–255, no leading zeros). |
| ipv6Detector | IPV6 | Candidate regex + in-tree isIPv6 validator. |
| ibanDetector | IBAN | Regex + ISO 13616 mod-97 checksum. |
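
The two checksums named for the card and IBAN detectors are standard algorithms, sketched below in TypeScript. This is an independent illustration, not Sether's source: the function names and the accepted length ranges are assumptions, and country-specific IBAN lengths are not checked.

```typescript
// Luhn checksum, as used for payment card numbers.
function luhnValid(input: string): boolean {
  const digits = input.replace(/[\s-]/g, '');
  if (!/^\d{12,19}$/.test(digits)) return false; // assumed length range
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk right to left and double every second digit.
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// ISO 13616 mod-97 check for IBANs.
function ibanValid(input: string): boolean {
  const iban = input.replace(/\s/g, '').toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(iban)) return false;
  // Move country code and check digits to the end.
  const rearranged = iban.slice(4) + iban.slice(0, 4);
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    // Digits map to 0-9, letters to 10-35 (A=10 ... Z=35).
    const value = code >= 65 ? code - 55 : code - 48;
    // Fold one or two decimal digits into the running remainder.
    remainder = (value > 9 ? remainder * 100 + value : remainder * 10 + value) % 97;
  }
  return remainder === 1;
}
```

Running the remainder digit by digit avoids BigInt, since an IBAN expands to more digits than a double can hold exactly.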

Custom detectors are plain objects: { type: string; detect(text): { start, end, value }[] } — pure, synchronous, deterministic.
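
As a minimal sketch of that shape, the detector below flags a made-up internal ID format. The `Match` and `Detector` interfaces are re-declared locally from the description above rather than imported, `employeeIdDetector` and the `EMP-` format are hypothetical, and treating `end` as exclusive is an assumption.

```typescript
// Shapes copied from the description above; `end` is assumed to be exclusive.
interface Match { start: number; end: number; value: string }
interface Detector { type: string; detect(text: string): Match[] }

// Hypothetical detector for internal employee IDs such as "EMP-123456".
const employeeIdDetector: Detector = {
  type: 'EMPLOYEE_ID',
  detect(text) {
    // A fresh regex per call keeps the function pure (no shared lastIndex).
    const pattern = /\bEMP-\d{6}\b/g;
    const matches: Match[] = [];
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(text)) !== null) {
      matches.push({ start: m.index, end: m.index + m[0].length, value: m[0] });
    }
    return matches;
  },
};

const found = employeeIdDetector.detect('Ticket from EMP-123456 and EMP-000042');
```

If the package's detector type matches this shape, such an object could presumably sit alongside the built-ins, for example `detectors: [...basicDetectors, employeeIdDetector]`.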

Identity pack (opt-in)

Names, dates of birth, passport numbers, and addresses have no self-validating shape, so a bare regex would be a false-positive machine. The identity pack uses label-anchored detection — it redacts a value only when it appears with the label that introduces it (Name:, DOB:, Passport No:, Address:), in many languages: Latin-script labels plus CJK, Cyrillic, and Arabic (名前:, Имя:, الاسم:).

import { Sether, basicDetectors, identityDetectors } from '@raeven-co/sether';

const sether = new Sether({
  detectors: [...basicDetectors, ...identityDetectors],
});

Exports: nameDetector, dobDetector, passportDetector, addressDetector. Unlabelled names in free prose are the job of free-text NER.
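
Label-anchored detection can be sketched as a regex that only captures a value when a known label and a colon precede it. The code below is an illustration under stated assumptions, not the identity pack's implementation: `detectLabelledNames`, the short label list, and the "value runs to end of line" rule are all simplifications chosen here.

```typescript
interface LabelledMatch { start: number; end: number; value: string }

// Illustrative label list; the real pack covers more labels and languages.
const NAME_LABELS = ['Full name', 'Name', '名前', 'Имя', 'الاسم'];

function detectLabelledNames(text: string): LabelledMatch[] {
  // Non-letter (or start of text), label, optional spaces, ASCII or
  // full-width colon, optional spaces, then the value up to end of line.
  const pattern = new RegExp(
    `(?:^|[^\\p{L}])(?:${NAME_LABELS.join('|')})[ \\t]*[:：][ \\t]*([^\\r\\n]+)`,
    'giu',
  );
  const out: LabelledMatch[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    const value = m[1].replace(/\s+$/, ''); // drop trailing whitespace
    if (value.length === 0) continue;
    // The captured group is the tail of the match, so its start can be derived.
    const start = m.index + m[0].length - m[1].length;
    out.push({ start, end: start + value.length, value });
  }
  return out;
}
```

The leading non-letter check is what stops `Surname:` from matching on its embedded `name:`, and a sentence such as "Alice went home" yields nothing because no label introduces the name.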

Secrets pack (opt-in)

Catches leaked credentials before they reach a third-party model — useful when users paste config files or logs into AI features.

import { Sether, basicDetectors, secretsDetectors } from '@raeven-co/sether';

const sether = new Sether({
  detectors: [...basicDetectors, ...secretsDetectors],
});

| Export | Token type | Method |
| --- | --- | --- |
| awsAccessKeyDetector | AWS_ACCESS_KEY | AKIA-prefixed access key IDs. |
| openaiKeyDetector | OPENAI_KEY | sk- / sk-proj- API keys. |
| anthropicKeyDetector | ANTHROPIC_KEY | sk-ant- API keys. |
| githubPatDetector | GITHUB_PAT | ghp_ and github_pat_ tokens. |
| slackTokenDetector | SLACK_TOKEN | xox[baprs]- tokens. |
| stripeKeyDetector | STRIPE_KEY | sk_live / pk_live / test keys. |
| jwtDetector | JWT | Three-part base64url tokens. |
| highEntropyDetector | HIGH_ENTROPY | Long high-entropy strings (conservative). |
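
A common way to score "high entropy" is Shannon entropy over the characters of a candidate string. The sketch below shows that calculation; whether highEntropyDetector works this way is not stated here, and the `minLength` and `minBits` thresholds are invented for the example.

```typescript
// Shannon entropy in bits per character.
function shannonEntropy(s: string): number {
  const counts = new Map<string, number>();
  for (let i = 0; i < s.length; i++) {
    counts.set(s[i], (counts.get(s[i]) ?? 0) + 1);
  }
  let bits = 0;
  counts.forEach((n) => {
    const p = n / s.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

// Hypothetical thresholds: flag only long strings whose entropy is high.
function looksLikeSecret(candidate: string, minLength = 20, minBits = 4): boolean {
  return candidate.length >= minLength && shannonEntropy(candidate) >= minBits;
}
```

Requiring both length and entropy is one way to stay conservative: short random-looking words and long repetitive strings both fall below the bar.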

Streaming & SSE

sether.redact() / sether.restore() are Node Transform streams — chunk-boundary safe, so a pattern split across chunks (foo@ + bar.com) is still caught. For LLM providers that stream Server-Sent Events, the SSE-aware variants redact/restore inside data: payloads while preserving framing:

import { createSSERedactStream, createSSERestoreStream } from '@raeven-co/sether';

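// Assumes providerResponse.body is a Node Readable; a WHATWG ReadableStream
// (e.g. from global fetch) has no .pipe() and needs Readable.fromWeb() first.
providerResponse.body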
  .pipe(createSSERestoreStream({ vault: sether.vault }))
  .pipe(clientResponse);

Lower-level building blocks: createRedactStream, createRestoreStream — same options, bring your own vault.
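
One simple way to get chunk-boundary safety is a hold-back buffer: emit only the text up to the last whitespace and keep the unfinished tail until more input arrives. The sketch below demonstrates the idea for emails only. It is not how Sether's streams are implemented (that is not described here); `ChunkSafeEmailRedactor` and the bare `<EMAIL>` placeholder are hypothetical.

```typescript
// Hold-back buffer: only text up to the last whitespace is processed;
// the unfinished tail waits for the next chunk. An email cannot contain
// whitespace, so no match can straddle the cut point.
class ChunkSafeEmailRedactor {
  private carry = '';

  // Feed one chunk; returns the text that is now safe to emit.
  push(chunk: string): string {
    this.carry += chunk;
    let cut = this.carry.length;
    while (cut > 0 && !/\s/.test(this.carry[cut - 1])) cut--;
    const ready = this.carry.slice(0, cut);
    this.carry = this.carry.slice(cut);
    return this.redact(ready);
  }

  // Call once at end of stream to emit whatever is still held back.
  flush(): string {
    const rest = this.carry;
    this.carry = '';
    return this.redact(rest);
  }

  private redact(text: string): string {
    return text.replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '<EMAIL>');
  }
}

const redactor = new ChunkSafeEmailRedactor();
const pieces = ['mail foo@', 'bar.com now', ' please'];
const streamed = pieces.map((p) => redactor.push(p)).join('') + redactor.flush();
// streamed === 'mail <EMAIL> now please'
```

The same push/flush pair maps directly onto the `transform` and `flush` hooks of a Node Transform stream. A production version would also need to cap the carry size for input that never contains whitespace.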

Middlewares

Drop-in wrappers so you don't hand-wire the streams:

import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';
import {
  wrapFetch, wrapOpenAI, wrapAnthropic, createExpressMiddleware,
} from '@raeven-co/sether';

// fetch — redacts request bodies to AI hosts, restores responses
const safeFetch = wrapFetch(fetch, { sether });

// OpenAI / Anthropic SDKs — pass YOUR client instance; Sether never
// imports either SDK (structurally typed, zero peer dependencies)
const openai = wrapOpenAI(new OpenAI(), { sether });
const anthropic = wrapAnthropic(new Anthropic(), { sether });

// Express — redact inbound req bodies / restore outbound
app.use(createExpressMiddleware({ sether }));

Audit events & regulation mappings

Every redaction can emit a structured AuditEvent — detector type, token, value length (never the value), and the regulation references the redaction evidences (GDPR Art. 28, SOC 2 CC6.7, HIPAA §164.514, PCI DSS Req. 3.4, EU AI Act, NDPA) via DEFAULT_REGULATION_MAPPINGS.

import { Sether, ConsoleAuditSink, MemoryAuditSink } from '@raeven-co/sether';

const sink = new MemoryAuditSink();
const sether = new Sether({ auditSink: sink });
// …after a request:
sink.events; // → [{ detectorType: 'EMAIL', token: '<EMAIL_…>', valueLength: 17, regulations: […] }]

The events deliberately never contain the redacted value — the log itself is your data-minimisation proof. Persistent storage, tamper-evident chaining, and PDF export are the hosted tier's job.
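
Since events carry only metadata, they can be aggregated freely. The sketch below tallies events per detector type, using the field names from the example above. `AuditEventLike` and `tallyByDetector` are local, hypothetical names, and the element type of `regulations` is left as `unknown` because it is not specified here.

```typescript
// Event shape taken from the example above; not imported from the package.
interface AuditEventLike {
  detectorType: string;
  token: string;
  valueLength: number;
  regulations: unknown[];
}

// Count redactions per detector type, e.g. for a per-request log line.
function tallyByDetector(events: AuditEventLike[]): Record<string, number> {
  const tally: Record<string, number> = {};
  for (const e of events) {
    tally[e.detectorType] = (tally[e.detectorType] ?? 0) + 1;
  }
  return tally;
}

// Hand-written sample events; tokens are left elided as in the text above.
const sampleEvents: AuditEventLike[] = [
  { detectorType: 'EMAIL', token: '<EMAIL_…>', valueLength: 17, regulations: [] },
  { detectorType: 'SSN', token: '<SSN_…>', valueLength: 11, regulations: [] },
  { detectorType: 'EMAIL', token: '<EMAIL_…>', valueLength: 15, regulations: [] },
];
const auditTally = tallyByDetector(sampleEvents);
```

With a MemoryAuditSink, the input would presumably be `sink.events`.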

The vault

The vault is the token → original-value map. Default: MemoryVault (in-process, LRU-bounded). It implements a five-method interface — set / get / has / delete / clear — so you can back it with Redis for multi-instance deployments. The vault stays inside your infrastructure; tokens are useless without it.
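
A custom vault only needs those five methods. The sketch below is a small LRU-bounded, in-memory version to show the shape. `LruVault` is a hypothetical name, the synchronous signatures are an assumption (a Redis-backed vault would more likely be asynchronous), and this is not MemoryVault's actual code.

```typescript
// Hypothetical vault with the five methods named above.
// Synchronous signatures are an assumption.
class LruVault {
  private entries = new Map<string, string>();
  private maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  set(token: string, value: string): void {
    this.entries.delete(token); // re-inserting moves the key to the newest position
    this.entries.set(token, value);
    if (this.entries.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get(token: string): string | undefined {
    const value = this.entries.get(token);
    if (value !== undefined) {
      this.entries.delete(token);
      this.entries.set(token, value); // mark as recently used
    }
    return value;
  }

  has(token: string): boolean { return this.entries.has(token); }
  delete(token: string): boolean { return this.entries.delete(token); }
  clear(): void { this.entries.clear(); }
}
```

An evicted token can no longer be restored, so the bound should comfortably exceed the number of tokens live in one conversation.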

Browser entry

The package root uses Node streams and cannot be bundled for the browser. @raeven-co/sether/browser (0.5.0+) ships only the pure detection surface — every detector pack plus DEFAULT_REGULATION_MAPPINGS, no node: imports — and powers both the live sandbox and the Sether Shield extension:

import { basicDetectors, identityDetectors } from '@raeven-co/sether/browser';

const matches = [...basicDetectors, ...identityDetectors]
  .flatMap((d) => d.detect(text));

Free-text NER (@raeven-co/sether-ner)

Unlabelled names, organisations, and locations in running prose need a model, not a regex. That ships as a separate, lazy-loaded package so the core stays small. NER runs on the outbound prompt (sentence-buffered, async) — never on the streaming restore path — and mints the same <TYPE_…> tokens, so restore() works unchanged:

npm install @raeven-co/sether-ner @huggingface/transformers

import { Sether, redactSync, basicDetectors } from '@raeven-co/sether';
import { createNerRedactor } from '@raeven-co/sether-ner';

const sether = new Sether();
const ner = createNerRedactor(); // lazy-loads the model on first call

const { redacted } = await ner.redact(prompt, { vault: sether.vault });
const safe = redactSync(redacted, { detectors: basicDetectors, vault: sether.vault });
// send `safe` to the LLM — sether.restore() swaps BOTH token sets back

First call downloads the model (~30 MB) — call ner.warmup() at boot. Bring your own model via the infer option.

Sether Shield (browser extension)

The zero-code option for individuals: a Chrome extension that catches personal data in your prompt on ChatGPT, Claude, and Gemini and scrubs it in one click — 100% locally, with no network calls and a single storage permission. It runs the same detector packs via the browser entry above.


Security

Found a vulnerability? Please don't open a public issue — email emorylebo@gmail.com or use GitHub private security advisories. We acknowledge within 48 hours. CI runs a safe-regex2 ReDoS scan (161 patterns, 0 unsafe), an ASCII-only dist gate, and npm provenance attestation on every release. Full policy: SECURITY.md.