Server presets¶
Shipped presets live in `config/server_profiles.yml`.
User additions go to `~/.config/document-anonymizer/server.yml`.
Per-project overrides go to `<output_dir>/.anon/server.yml`.
Resolution order: builtin ➜ user ➜ project (last wins).
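The layering can be sketched as follows. The field names inside the preset are illustrative, not the shipped schema; a project-scope file only needs the fields it wants to override, and everything left out keeps the value resolved from the builtin and user layers:

```yaml
# <output_dir>/.anon/server.yml — project scope (applied last, wins)
# Preset key is real; the overridden field names are assumptions,
# check config/server_profiles.yml for the actual schema.
jackrong-qwen3.5-4b-distill-q4:
  parallel: 2        # override just this field for this project
```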
Built-in catalog¶
| Preset | Model | Quality | Ctx | Parallel | Disk | VRAM |
|---|---|---|---|---|---|---|
| `default` ★ | Qwen 3.5 4B Claude-Opus distill Q4_K_M (CPU) | 78 | 12 K | 1 | 2.5 GB | ~4.8 GB (RAM) |
| `ministral-3-8b-reasoning-bf16` | Ministral 3 8B Reasoning BF16 | 83 | 16 K | 2 | 16.0 GB | ~18.9 GB |
| `rtila-qwen3.5-9b-q4` | rtila Assistant Lite · 9B Q4_K_M | 82 | 12 K | 1 | 5.2 GB | ~7.1 GB |
| `jackrong-qwen3.5-4b-distill-q4` ★ | Qwen 3.5 4B Claude-Opus distill Q4_K_M (GPU) | 78 | 12 K | 1 | 2.5 GB | ~4.8 GB |
| `qwen3.5-9b-bf16` | Qwen 3.5 9B BF16 | 78 | 16 K | 1 | 18.4 GB | ~18.0 GB |
| `ministral-3-8b-reasoning-q5` | Ministral 3 8B Reasoning Q5_K_M | 76 | 16 K | 2 | 5.8 GB | ~9.2 GB |
Quality column = F1 × 100 on the 5-PDF pentest corpus.

`default` and `jackrong-qwen3.5-4b-distill-q4` share the same GGUF file; the only difference is `n_gpu_layers` (0 vs 99). Same model, two runtime configs.
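A sketch of what the two sibling presets might look like side by side. The `n_gpu_layers` values (0 vs 99) are documented above; the GGUF filename and the surrounding field names are placeholders, not copied from the shipped file:

```yaml
default:                           # CPU: no layers offloaded
  model: qwen3.5-4b-distill.q4_k_m.gguf   # filename illustrative
  n_gpu_layers: 0

jackrong-qwen3.5-4b-distill-q4:    # GPU: all layers offloaded
  model: qwen3.5-4b-distill.q4_k_m.gguf   # same file as above
  n_gpu_layers: 99
```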
How to choose¶
| Hardware / goal | Pick |
|---|---|
| 18+ GB VRAM, max quality | ministral-3-8b-reasoning-bf16 |
| 18+ GB VRAM, fewest false alarms | qwen3.5-9b-bf16 |
| ~7 GB VRAM, near-leader quality | rtila-qwen3.5-9b-q4 |
| ~6 GB VRAM, smallest "good" model ★ recommended | jackrong-qwen3.5-4b-distill-q4 |
| ~10 GB VRAM, reasoning quality | ministral-3-8b-reasoning-q5 |
| No GPU | default (Jackrong 4B distill Q4 on CPU) |
To pin a default for repeatable runs, use ★ Set as default in the preset gallery; it persists to `~/.config/document-anonymizer/preferences.yml`.
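The resulting preferences file might look like this. Only the path is documented on this page; the key name is an assumption:

```yaml
# ~/.config/document-anonymizer/preferences.yml
default_preset: jackrong-qwen3.5-4b-distill-q4   # key name illustrative
```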
Server lifecycle (sidebar dot + auto-start)¶
The sidebar Server entry shows a small status dot:
- 🔴 red = offline
- 🟡 amber = starting (worker spawning, health endpoint warming up)
- 🟢 green = online
Click the dot to bring the active preset up. A pre-flight check runs first; on failure you're routed to the Server tab with a specific error dialog (missing binary, model not on disk, docker not in PATH, external server unreachable, etc.). When the server is already up or still starting, the click simply opens the Server tab.
The Server panel also exposes a persistent Auto-start server on launch toggle, stored in `~/.config/document-anonymizer/app_settings.yml`. Pre-flight applies here too: when the active preset can't start, auto-start is silently skipped rather than raising an error dialog at startup.
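The setting might be stored along these lines. Only the file path is documented; the key name is an assumption:

```yaml
# ~/.config/document-anonymizer/app_settings.yml
auto_start_server: true   # key name illustrative
```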
Customising¶
- Open the Server view → Customize on any card.
- Edit any field (cmd-line preview updates live).
- Save → choose user or project scope.
Common tweaks:
- More parallel slots: `parallel: 4` cuts wall time on long scans, at the cost of `ctx_size / parallel` per slot. Use the pre-flight token check (Pipeline → Run) to verify the slot still fits a ~7 500-token request.
- Bigger ctx: bump `ctx_size` to 32 K if you increase the detector's `max_chunk_chars` past 5 000 (rare).
- Different binary: point `binary` at your own llama.cpp build.
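Putting the tweaks together, a user-scope override might look like this. `parallel`, `ctx_size`, and `binary` are the fields named above; the binary path and surrounding structure are assumptions:

```yaml
# ~/.config/document-anonymizer/server.yml — user scope
ministral-3-8b-reasoning-bf16:
  parallel: 4          # 4 slots ⇒ ctx_size / 4 = 8 192 tokens per slot,
                       # still above a ~7 500-token request
  ctx_size: 32768      # 32 K total
  binary: /opt/llama.cpp/build/bin/llama-server   # path illustrative
```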
Quantisation policy¶
We ship both BF16 and quantised presets when the quant clears the quality bar. See BENCHMARKS.md for same-model BF16 vs. Q5_K_M / Q4_K_XL / Q8_K_XL deltas. KV cache stays at f16 across the board.