# CLAUDE.md: orientation for future sessions

You are in **deerflow-factory**, a hardened deployment of [bytedance/deer-flow](https://github.com/bytedance/deer-flow) on data-nuc. Read this file first before touching anything in this repo.

## Read these in order

1. **HARDENING.md**: threat model, what was changed, why, and how to verify. The full audit trail. Update it whenever you change anything listed in section 6 ("Files touched").
2. **RUN.md**: how to start/stop/inspect the stack, smoke-test commands.
3. **DEERFLOW_PROMPT_INJECTION_PROTECTION_PLAN.md**: the original plan that the hardening implements. Historical context.

## Layout

```
deerflow-factory/
├── deer-flow/                        ← vendored upstream (no nested .git!)
│   └── backend/packages/harness/deerflow/
│       ├── security/                 ← hardened content sanitizer (lives here!)
│       ├── community/searx/          ← hardened web tools (lives here!)
│       └── community//               ← stubbed, raise on import
├── backend/                          ← factory overlay (mirror, kept in sync)
│   └── packages/harness/deerflow/
│       ├── security/                 ← duplicated source-of-truth
│       └── community/searx/          ← duplicated source-of-truth
├── docker/
│   └── docker-compose.override.yaml  ← named bridge br-deerflow
├── scripts/
│   ├── deerflow-firewall.sh          ← egress firewall up/down/status
│   └── deerflow-firewall.nix         ← NixOS module (imported by /etc/nixos/configuration.nix)
├── config.yaml                       ← runtime config, only references searx tools
├── .env                              ← real secrets, .gitignored
├── .env.example                      ← template
├── HARDENING.md
├── RUN.md
└── CLAUDE.md                         ← you are here
```

## Hard rules (do not violate without explicit user approval)

1. **Native web tools stay disabled.** The legacy providers (`ddg_search`, `tavily`, `exa`, `firecrawl`, `jina_ai`, `infoquest`, `image_search`) and the matching helper clients (`jina_client.py`, `infoquest_client.py`) are intentionally replaced with import-time `RuntimeError` stubs.
   Re-enabling **any** of them requires:
   - hardening the call site (sanitize → wrap_untrusted_content)
   - moving the matching test out of `tests/_disabled_native/`
   - updating HARDENING.md sections 2.3, 2.4, and 6
   - explicit user sign-off in the same conversation
2. **All web output must be sanitized and delimited.** Any new code path that returns external data to the model **must** route through `deerflow.security.sanitizer.sanitize()` and `deerflow.security.content_delimiter.wrap_untrusted_content()`. The whole point of this build is that the LLM never sees raw web bytes.
3. **No secrets in git.** `.env` is `.gitignored`. Before staging, verify with `git diff --cached | grep -iE 'api_key|secret|token|password'`. Use `.env.example` for templates only: placeholders, never live keys.
4. **Two source trees, one truth.** The hardened code lives in **both** `deer-flow/backend/packages/harness/deerflow/{security,community/searx}/` (the runtime path) **and** `backend/packages/harness/deerflow/...` (the factory overlay used by the standalone tests). They must stay identical. If you fix a bug in one, mirror it to the other in the same commit, or delete one of the two trees and pick a single source of truth.
5. **The egress firewall is part of the threat model.** Do not change `scripts/deerflow-firewall.sh` allow/block lists without updating HARDENING.md section 2.7. Specifically:
   - allow: `10.67.67.1` (Searx), `10.67.67.2` (XTTS/Whisper/Ollama-local)
   - block: `192.168.3.0/24` (home LAN), `10.0.0.0/8`, `172.16.0.0/12`

   Both allowed addresses fall inside the blocked `10.0.0.0/8`, so the allow rules only take effect if they match before the block rules.
6. **deer-flow is vendored, not a submodule.** The upstream `.git` was removed and is parked at `/tmp/deer-flow-upstream.git.bak` on data-nuc. If you need to pull upstream changes, do it in a separate working copy and rebase manually; do not re-introduce a nested git into this repo.

## Where things run

- **Host:** data-nuc (NixOS 25.11, kernel 6.12.x). The `data` user is in the `docker` group and can use `docker compose` directly.
- **Repo path:** `/home/data/deerflow-factory`
- **Gitea remote:** `https://git.beerbandit.de/DATA/deerflow-factory` (credentials in `~/.git-credentials` for user `data`)
- **Egress firewall:** `systemctl status deerflow-firewall`
  - active = rules in DOCKER-USER, applied to `br-deerflow`
  - inactive = rules removed (no firewall)
- **DeerFlow stack:** not running yet at the time of this CLAUDE.md initial commit. First start: see RUN.md.

## Commit / push style

- Imperative subject, present-tense body. Reference HARDENING.md sections by number when you change something they describe.
- Do not amend or force-push without asking. Add a follow-up commit.
- Pre-commit secret check:

  ```bash
  git diff --cached --name-only | xargs -I{} grep -lE \
    'api_key|secret_key|sk-[a-zA-Z0-9]{20,}|ghp_|tvly-' {} 2>/dev/null
  ```

  Only `.env.example` should appear. If anything else does, abort.

## Quick verification (run before declaring "it works")

From `/home/data/deerflow-factory`:

```bash
PYTHONPATH=deer-flow/backend/packages/harness python3 -c "
# 1. hardened modules import
from deerflow.security.content_delimiter import wrap_untrusted_content
from deerflow.security.sanitizer import sanitizer
from deerflow.security.html_cleaner import extract_secure_text
import importlib.util
assert importlib.util.find_spec('deerflow.community.searx.tools') is not None

# 2. native modules fail closed
for prov in ['ddg_search','tavily','exa','firecrawl','jina_ai','infoquest','image_search']:
    try:
        __import__(f'deerflow.community.{prov}.tools')
        raise SystemExit(f'FAIL: {prov} imported')
    except RuntimeError as e:
        assert 'disabled in this hardened DeerFlow build' in str(e)
print('OK')
"

# 3. security tests
PYTHONPATH=deer-flow/backend/packages/harness pytest \
  backend/tests/test_security_sanitizer.py \
  backend/tests/test_security_html_cleaner.py -q
# expected: 8 passed

# 4. firewall service
systemctl is-active deerflow-firewall
sudo scripts/deerflow-firewall.sh status
```

If any of these fail and you cannot fix them in the same session, stop and report. Do not paper over the failure.

## Common footguns

- **`pip` is not installed system-wide on NixOS.** If you need a Python dep for a one-off script, use `nix-shell -p python3Packages.` or run inside a deer-flow `.venv` once it exists. Do not try `pip install --user`; it will fail.
- **`sudo` is passwordless for `data`.** Be careful: any `sudo` you run succeeds without a prompt. Double-check destructive commands.
- **NixOS rewrites `/etc/systemd/system/`.** Do not drop unit files in there directly; they will be wiped on `nixos-rebuild switch`. Add a `systemd.services.` block to a Nix module instead (see `scripts/deerflow-firewall.nix` for the pattern).
- **The factory overlay (`backend/`) is currently a mirror, not the runtime.** When you import from Python at runtime, the path that matters is `deer-flow/backend/packages/harness`. The overlay only matters for the standalone factory tests. Keep them in sync until we pick one as canonical.
- **`docker compose down` does not remove the firewall rules.** That is by design. Only `systemctl stop deerflow-firewall` removes them.

## What this repo is NOT

- Not a fork on GitHub. The vendored upstream `.git` was deleted on purpose. If you need to compare against upstream, clone it fresh into `/tmp/`.
- Not a Python package (yet). There is no `pyproject.toml` at the factory root; the Python entry point is the deer-flow tree's own `backend/pyproject.toml`. We only put files into its harness package.
- Not multi-tenant. There is exactly one DeerFlow instance, one Searx, one set of credentials. Keep it that way unless the user explicitly asks for tenancy.
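
Hard rule 4 requires the runtime tree and the factory overlay to stay identical. The following is a minimal sketch of a check for that, assuming "identical" means a byte-for-byte match of the `security` and `community/searx` directories. The helper names (`list_files`, `diff_trees`, `check_repo`) are hypothetical and not part of this repo, and skipping `__pycache__` is an assumption about what counts as noise.

```python
from pathlib import Path

# Directory pairs taken from hard rule 4; everything else here is illustrative.
HARNESS = Path("backend/packages/harness/deerflow")
MIRRORED = ("security", "community/searx")


def list_files(root: Path) -> dict[str, Path]:
    """Map each file's path relative to root onto its real path."""
    files = {}
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Assumption: bytecode caches are noise, not part of the mirror.
        if path.is_file() and "__pycache__" not in rel.parts:
            files[rel.as_posix()] = path
    return files


def diff_trees(runtime: Path, overlay: Path) -> list[str]:
    """Return one message per file that is missing or differs."""
    absent = [p for p in (runtime, overlay) if not p.is_dir()]
    if absent:
        return [f"not a directory: {p}" for p in absent]
    left, right = list_files(runtime), list_files(overlay)
    problems = []
    for rel in sorted(left.keys() | right.keys()):
        if rel not in right:
            problems.append(f"missing in overlay: {rel}")
        elif rel not in left:
            problems.append(f"missing in runtime: {rel}")
        elif left[rel].read_bytes() != right[rel].read_bytes():
            problems.append(f"differs: {rel}")
    return problems


def check_repo(repo_root: Path) -> list[str]:
    """Compare runtime tree and factory overlay for each mirrored directory."""
    problems = []
    for sub in MIRRORED:
        runtime = repo_root / "deer-flow" / HARNESS / sub
        overlay = repo_root / HARNESS / sub
        problems += [f"{sub}: {msg}" for msg in diff_trees(runtime, overlay)]
    return problems


if __name__ == "__main__":
    # Run from the repo root; prints OK only when both trees match.
    for line in check_repo(Path(".")) or ["OK"]:
        print(line)
```

Run from `/home/data/deerflow-factory` before committing a change to either tree, an `OK` means the mirror holds; any other output lists the files to reconcile.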