
Report Anonymizer

Local LLM anonymizer for penetration-test reports. Drop a folder of PDFs, Office docs, Markdown or code: customer brand, real IPs, phone numbers, hardcoded credentials, advisory IDs, AD SIDs, cloud resource IDs and other identifying values are rewritten in place with plausible dummy values. Exploit code, payloads and shell output stay untouched. The pipeline talks to a local llama.cpp server, so nothing leaves your machine, and the shipped default preset runs comfortably on a regular laptop.
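To make "nothing leaves your machine" concrete on the LLM side: a client that only ever talks to a loopback llama.cpp server cannot send report text anywhere else. Below is a minimal sketch of that check, assuming the server is addressed by a plain http(s) URL. `is_loopback_url` is a hypothetical helper and is not part of Report Anonymizer.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of Report Anonymizer): succeed only if a
# server URL points at the local machine, so prompts never go off-host.
is_loopback_url() {
  local host=$1
  local re_v4='^127\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}(:[0-9]+)?$'
  host=${host#*://}   # drop the scheme, e.g. "http://"
  host=${host%%/*}    # drop the path, e.g. "/v1/chat/completions"
  host=${host##*@}    # drop any "user:pass@" prefix (defeats 127.0.0.1@evil tricks)
  case $host in
    localhost | localhost:* | '[::1]' | '[::1]:'*) return 0 ;;
  esac
  [[ $host =~ $re_v4 ]]   # any 127.x.y.z address, optional port
}
```

For example, `is_loopback_url "$url" && curl -s "$url/health"` probes the server only when it is local. `/health` is llama-server's health endpoint and 8080 is its default port, but the app may use a different one.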


GPL-3.0  ·  398 tests, 3 Python versions on CI  ·  Runs locally on llama.cpp  ·  Windows installer · AppImage · .deb · one-line installer

Side-by-side native render: original report on the left with NimbusBoard / Lorenzo De Falco / 2026-Q1-NimbusBoard-Web highlighted, anonymized version on the right with VendorBoard / Marco Rossi / 2026-Q1-VendorBoard-Web. Layout, fonts and structure are preserved.
Side-by-side native render. Every customer-identifying value is rewritten in place; layout, fonts and structure are preserved.

Why use it

  • Local-only by design


    No telemetry, no cloud LLMs, no analytics. Once installed, the app contacts exactly one network endpoint, huggingface.co, and only when you explicitly download a model. Substitution maps stay in your project folder.

  • Operator-in-the-loop


    The Review pane shows every candidate next to a live render of the anonymized output, not the original with overlay highlights. Approve, skip or edit each candidate, or add custom words. What you see is what Apply will write.

  • Layout-preserving


    PDFs are redacted in place. Placeholders are length- and shape-preserving (an ARN stays an ARN, hex stays hex, a phone number keeps its country code), so layout, fonts and even byte length stay close to the original.

  • 12 leak categories


    Brand, network, phones, emails, credentials, keys, headers, app packages, user agents, internal IDs, infra IDs (AWS / Azure / GCP / AD SIDs), proprietary URI schemes. Exploit code, payloads and tool output are deliberately left intact.

  • Runs on a regular laptop


    The shipped default preset is CPU-only: a 4B-parameter model, ~2.5 GB on disk, ~1.5 GB of RAM in use. No GPU is required. If you do have one (6 GB of VRAM is enough), the wizard picks a faster GPU preset for you; the baseline target is still a regular laptop.

  • Image redaction


    Every embedded image (PDF / DOCX / PPTX) gets a thumbnail in the Review » Images tab. In the editor you paint blackout, blur, pixelate or text-overlay rectangles, with a colour picker for the text overlay. The canvas shows the actual baked pixels as you draw, not a translucent placeholder. Occurrences that share an image_id across pages share a single decision, applied to each of them at its original xref / shape position.

  • Detection mode picker


    A combo box next to Run on the Pipeline tab switches between two detection modes: Fast (one monolithic prompt covers all 12 categories per chunk, ~30 s per typical PDF on the 4B preset) and High accuracy (11 focused per-category prompts run against every chunk, and their candidate lists are merged). On the local 5-PDF benchmark, multi-pass raised F1 from 0.836 to 0.919 (precision +0.12, recall +0.05) at the cost of roughly 5x more detector time. The same toggle is available from the CLI as --detector-mode single | multipass.
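The High accuracy mode's "candidate lists are merged" step can be pictured as a set union. A minimal sketch, assuming (hypothetically) that each per-category pass writes one candidate per line as `category<TAB>value`. When two passes report the same value, this sketch keeps the first category it sees, which may not match the tool's real tie-breaking. The commented CLI line also assumes `--detector-mode` is accepted after the `cli all` subcommand.

```shell
#!/usr/bin/env bash
# Hypothetical merge step for multi-pass detection (not the tool's code).
# Input: one file per pass, each line "category<TAB>value".
# Output: the union of all passes; when several passes report the same
# value, only the first line seen for it is kept.
merge_candidates() {
  awk -F '\t' 'NF >= 2 && !seen[$2]++' "$@"
}

# Multi-pass run from the CLI (placement of the flag is an assumption):
# ./Report-Anonymizer-x86_64.AppImage cli all in/ -o out/ --detector-mode multipass
```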

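The length- and shape-preserving placeholders described under Layout-preserving can be illustrated with a small bash function. `shape_placeholder` is hypothetical and deliberately simple. It maps each character to a random one of the same class and keeps two kinds of prefix: a phone country code, and an ARN's partition/service/region header. It uses `$RANDOM`, which is not cryptographically secure, and it keeps no substitution map. The tool itself persists substitution maps in the project folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not Report Anonymizer's code) illustrating a
# length- and shape-preserving placeholder:
#   digits -> random digits, hex letters -> random hex letters,
#   other letters -> random letters of the same case,
#   punctuation and whitespace -> kept as-is.
# A phone country code ("+39 ") or an ARN header ("arn:aws:iam::") is kept.
# $RANDOM is not cryptographically secure; no substitution map is kept,
# so the same input gets a different placeholder on every call.
shape_placeholder() {
  local s=$1 keep="" out="" c i
  local re_phone='^(\+[0-9]{1,3}[[:space:]-]?)'
  local re_arn='^(arn:[a-z-]+:[a-z0-9-]+:[a-z0-9-]*:)'
  local digits=0123456789 hexl=abcdef hexu=ABCDEF
  local lower=abcdefghijklmnopqrstuvwxyz upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ

  # Keep a recognised prefix verbatim (BASH_REMATCH holds the last match).
  if [[ $s =~ $re_phone || $s =~ $re_arn ]]; then
    keep=${BASH_REMATCH[1]}
  fi

  local body=${s:${#keep}}
  for (( i = 0; i < ${#body}; i++ )); do
    c=${body:i:1}
    case $c in
      [0123456789]) out+=${digits:RANDOM%10:1} ;;
      [abcdef])     out+=${hexl:RANDOM%6:1} ;;
      [ABCDEF])     out+=${hexu:RANDOM%6:1} ;;
      [[:lower:]])  out+=${lower:RANDOM%26:1} ;;
      [[:upper:]])  out+=${upper:RANDOM%26:1} ;;
      *)            out+=$c ;;   # separators keep the value's shape
    esac
  done
  printf '%s\n' "$keep$out"
}
```

For example, `shape_placeholder 'arn:aws:iam::123456789012:user/alice'` keeps `arn:aws:iam::` and returns 12 random digits, a colon, four lowercase letters, a slash and five lowercase letters.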

Get it

Four install paths ship the same code; they differ only in how it and its runtime reach your disk.

  • Windows installer  


    Native Setup.exe for Windows 10 / 11. Bundles the embedded Python runtime, llama-server.exe (CPU / CUDA / Vulkan variants), pandoc, pdftotext and everything else. Pick the backend that matches your GPU at install time. Per-user install (no admin), one desktop shortcut, one Start-menu entry.

     Download Setup.exe  ·  338 MB  ·  Install guide

  • AppImage  


    Single self-contained binary. No install, no root, no system Python. Bundles a portable interpreter, every Python dependency (PySide6, WeasyPrint), pandoc and pdftotext. Just chmod +x and run.

     Download AppImage  ·  410 MB

  • .deb (Debian / Ubuntu / Mint)


    Smaller download (240 KB): the runtime Python dependencies are pulled from PyPI by the postinstall hook, so installation needs network access. Integrates with apt, registers a desktop entry and adds report-anonymizer to your $PATH. Requires root to install.

     Download .deb  ·  240 KB

  • One-line installer (Linux / macOS)


    Per-user install under ~/.local/share/report-anonymizer, launcher in ~/.local/bin/. Detects missing system tools (pandoc, poppler-utils, Pango) and offers to install them via apt-get, dnf, pacman, zypper or brew. No root needed.

     View install.sh on GitHub
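The one-line installer's check for missing system tools can be approximated with `command -v`. A minimal sketch, not the actual install.sh: `missing_tools` and `detect_pm` are hypothetical helpers, and the package-manager order is arbitrary. Pango is left out because it is a shared library rather than a command and needs a different probe.

```shell
#!/usr/bin/env bash
# Hypothetical approximation of the installer's dependency check
# (not the real install.sh).

# Print the first supported package manager found on $PATH; fail if none.
detect_pm() {
  local pm
  for pm in apt-get dnf pacman zypper brew; do
    if command -v "$pm" >/dev/null 2>&1; then
      printf '%s\n' "$pm"
      return 0
    fi
  done
  return 1
}

# Print every command from the argument list that is not on $PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools pandoc pdftotext` prints whichever of the two is not on `$PATH` (pdftotext comes from poppler-utils on Debian-family systems). Package names differ between managers, so this sketch stops at detection.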

Double-click Report-Anonymizer-Setup-x64-1.0.0.exe. The Setup wizard detects your GPU, recommends the matching llama.cpp variant (CUDA / Vulkan / CPU) and installs it alongside the embedded Python runtime. A desktop and a Start-menu entry are created, and the uninstaller is registered with the OS. See the Windows install guide for the full walkthrough with screenshots.

# AppImage
chmod +x Report-Anonymizer-x86_64.AppImage
./Report-Anonymizer-x86_64.AppImage              # GUI
./Report-Anonymizer-x86_64.AppImage cli all in/ -o out/
# .deb (Debian / Ubuntu / Mint)
sudo apt install ./report-anonymizer_1.0.0_amd64.deb
report-anonymizer
# One-line installer (Linux / macOS)
curl -fsSL https://raw.githubusercontent.com/nemmusu/report-anonymizer/master/install.sh | bash

AppImage doesn't open?

If you double-click the AppImage and nothing happens, your distro probably needs libfuse2 for AppImage's mount layer: sudo apt install libfuse2 (Debian / Ubuntu / Mint), sudo dnf install fuse-libs (Fedora / RHEL), sudo pacman -S fuse2 (Arch / Manjaro).
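Before reaching for the package manager, you can check whether FUSE 2 is already present and pick the matching command from /etc/os-release. A minimal sketch; `has_libfuse2` and `fuse2_install_cmd` are hypothetical helpers, and the distro mapping only covers the three families listed above. On Ubuntu 24.04 and later the package is named libfuse2t64 rather than libfuse2.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with the AppImage).

# Succeed if the FUSE 2 library is registered with the dynamic linker.
has_libfuse2() {
  local ldconfig_bin
  ldconfig_bin=$(command -v ldconfig || echo /sbin/ldconfig)
  "$ldconfig_bin" -p 2>/dev/null | grep -q 'libfuse\.so\.2'
}

# Map an os-release ID (plus optional ID_LIKE) to the install command above.
fuse2_install_cmd() {
  case " $1 ${2:-} " in
    *" debian "* | *" ubuntu "* | *" linuxmint "*)
      echo "sudo apt install libfuse2" ;;   # Ubuntu 24.04+: libfuse2t64
    *" fedora "* | *" rhel "*)
      echo "sudo dnf install fuse-libs" ;;
    *" arch "* | *" manjaro "*)
      echo "sudo pacman -S fuse2" ;;
    *) return 1 ;;                          # unknown distro family
  esac
}
```

Usage: `. /etc/os-release; has_libfuse2 || fuse2_install_cmd "$ID" "${ID_LIKE:-}"` prints the command to run, or nothing if libfuse.so.2 is already registered.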


A 60-second tour

Pipeline view paused at Approve and promote: visual stepper with Scan and detect ✓, Approve and promote highlighted, progress bar at 30%; the run log shows 18 auto-promoted candidates and 2 needing review.
Single-click pipeline. Pauses at Approve & promote when the operator must decide; resumes through Apply / Build / Verify / Auto-resolve on its own.

Review tree on the left listing already-mapped, auto-promoted and pending rows by category; right pane is a live render of the anonymized PDF with VendorBoard already substituted in the title.
Unified Review tree. The right pane is a live render of the anonymized output, not the original with overlay highlights.

Image review tab: horizontal thumbnail strip on top, embedded editor showing a Burp request screenshot with a baked red blackout rectangle over the Authorization header. Toolbar: Select / Blackout / Blur / Pixelate / Text overlay; intensity / text / size spinboxes; font / background colour pickers.
Per-image editor with 4 tools. The canvas shows real baked pixels (not placeholder rectangles), so what you see is what Apply will write.
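The Image redaction rule that occurrences sharing an image_id share one decision amounts to a lookup table keyed by image identity. A minimal sketch, assuming (hypothetically) that image_id is the SHA-256 of the extracted image bytes; the real identifier may be derived differently. It needs bash 4+ for associative arrays and GNU coreutils' sha256sum (`shasum -a 256` on macOS). `decide` and `action_for` are illustrative names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one redaction decision per image, shared by every
# occurrence. Assumes image_id = SHA-256 of the extracted image bytes.
declare -A DECISION   # image_id -> action (blackout|blur|pixelate|text)

image_id() {
  sha256sum < "$1" | cut -d ' ' -f 1   # read via stdin: no filename escaping
}

# Record the operator's choice for an image (and every identical copy).
decide() {
  local id
  id=$(image_id "$1")
  DECISION[$id]=$2
}

# Action to apply at one occurrence; images without a decision are kept.
action_for() {
  local id
  id=$(image_id "$1")
  printf '%s\n' "${DECISION[$id]:-keep}"
}
```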

Preview of build tab: PDF.js viewer renders the post-anonymization document with selectable text. Bottom row: Back to text candidates, Back to images, Refresh preview, Build (primary).
Final confirmation gate before Apply. PDF.js viewer with native text selection and a one-click route back to text or image edits if something looks off.

Server panel with the preset gallery: cpu_only, default and the highlighted jackrong-qwen3.5-4b-distill-q4 card; quality score, VRAM-fit, downloaded badge and Use / Customize / Re-download / Set as default actions visible.
Preset gallery with Quality / Disk / VRAM-fit per card, command preview, one-click start/stop.

Model Manager on the Curated downloads tab: Jackrong Qwen3.5 4B Claude-Opus distill repo selected, Quality 78/100 badge, ~4.7 GB GPU need, ~3.1 min on the 5-PDF test, 10 GGUF variants with the 2.5 GB Q4_K_M file marked as recommended.
Curated GGUF catalog with Quality / VRAM / time-on-bench badges; resumable streaming downloads with a Queue tab.


Documentation

  • What it anonymizes


    The 12 leak categories the detector emits, with examples of the placeholders it produces and the (long) list of strings it deliberately leaves alone so the report's technical content keeps working.

  • Benchmarks


    Quality score, precision, recall and VRAM for the 5 curated presets, plus the 24 models that fell below the cut and the 4 that are architecturally incompatible, with root-cause notes.

  • Architecture


    Pipeline data flow, on-disk schema (manifests, substitution maps, applied substitutions, decisions log) and the per-stage cancellation contract.

  • Presets


    The shipped server profiles, how to pick one for your hardware, and how to customise a preset (per-user or per-project scope).

  • FAQ


    Common questions: diff cache, OCR scope, offline-mode behaviour, format adapters, HIPAA / GDPR scope.

  • Windows install


    Step-by-step walkthrough of the Setup.exe wizard: variant pick (CPU / CUDA / Vulkan), where files land, uninstall flow, keep-user-data prompt.

  • Contributing


    Development setup, code style, what we look for in a PR, how to add a format adapter.


Source on GitHub  ·  Releases  ·  Open an issue  ·  GPL-3.0