DATA 4237f03a83 Add CLAUDE.md project guide for future Claude Code sessions
Quick orientation: layout, hard rules (native tools stay disabled,
sanitize+wrap, no secrets, two trees in sync, firewall is part of the
threat model, deer-flow is vendored), where things run on data-nuc,
commit style, a one-page verification block, and the common NixOS /
docker / pip footguns to avoid.
2026-04-12 15:23:37 +02:00


CLAUDE.md — orientation for future sessions

You are in deerflow-factory, a hardened deployment of bytedance/deer-flow on data-nuc. Read this file before touching anything in this repo.

Read these in order

  1. HARDENING.md — threat model, what was changed, why, and how to verify. The full audit trail. Update it whenever you change anything listed in section 6 ("Files touched").
  2. RUN.md — how to start/stop/inspect the stack, smoke-test commands.
  3. DEERFLOW_PROMPT_INJECTION_PROTECTION_PLAN.md — the original plan that the hardening implements. Historical context.

Layout

deerflow-factory/
├── deer-flow/                    ← vendored upstream (no nested .git!)
│   └── backend/packages/harness/deerflow/
│       ├── security/             ← hardened content sanitizer  (lives here!)
│       ├── community/searx/      ← hardened web tools           (lives here!)
│       └── community/<native>/   ← stubbed, raise on import
├── backend/                      ← factory overlay (mirror, kept in sync)
│   └── packages/harness/deerflow/
│       ├── security/             ← duplicated source-of-truth
│       └── community/searx/      ← duplicated source-of-truth
├── docker/
│   └── docker-compose.override.yaml  ← named bridge br-deerflow
├── scripts/
│   ├── deerflow-firewall.sh      ← egress firewall up/down/status
│   └── deerflow-firewall.nix     ← NixOS module (imported by /etc/nixos/configuration.nix)
├── config.yaml                    ← runtime config — only references searx tools
├── .env                           ← real secrets, .gitignored
├── .env.example                   ← template
├── HARDENING.md
├── RUN.md
└── CLAUDE.md                      ← you are here
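The two hardened copies in this tree are supposed to be byte-identical. A minimal drift check could look like the sketch below, assuming it is run against the repo root and that only the security/ and community/searx/ subtrees need comparing. The helper names (_files, tree_drift, check_all) are hypothetical and not part of the repo.

```python
from pathlib import Path

# Both copies, relative to the repo root, as drawn in the layout above.
RUNTIME = Path("deer-flow/backend/packages/harness/deerflow")
OVERLAY = Path("backend/packages/harness/deerflow")
SUBTREES = ["security", "community/searx"]


def _files(root: Path) -> dict[str, bytes]:
    """Map each file's path (relative to root) to its bytes, skipping __pycache__."""
    result: dict[str, bytes] = {}
    if not root.is_dir():
        return result
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if path.is_file() and "__pycache__" not in rel.parts:
            result[rel.as_posix()] = path.read_bytes()
    return result


def tree_drift(left: Path, right: Path) -> list[str]:
    """Describe every file that differs or exists on only one side."""
    a, b = _files(left), _files(right)
    problems: list[str] = []
    for rel in sorted(a.keys() | b.keys()):
        if rel not in b:
            problems.append(f"only in {left}: {rel}")
        elif rel not in a:
            problems.append(f"only in {right}: {rel}")
        elif a[rel] != b[rel]:
            problems.append(f"differs: {rel} ({left} vs {right})")
    return problems


def check_all(repo_root: Path) -> list[str]:
    """Compare the hardened subtrees of both copies; [] means they are identical."""
    problems: list[str] = []
    for sub in SUBTREES:
        left, right = repo_root / RUNTIME / sub, repo_root / OVERLAY / sub
        for side in (left, right):
            if not side.is_dir():
                problems.append(f"missing directory: {side}")
        problems += tree_drift(left, right)
    return problems
```

From the repo root, check_all(Path(".")) returning an empty list means the mirror is intact. Any entry means a fix landed on only one side. The comparison is by content rather than by timestamps, so a plain copy with a fresh mtime does not count as drift.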

Hard rules (do not violate without explicit user approval)

  1. Native web tools stay disabled. The legacy providers (ddg_search, tavily, exa, firecrawl, jina_ai, infoquest, image_search) and the matching helper clients (jina_client.py, infoquest_client.py) are intentionally replaced with import-time RuntimeError stubs. Re-enabling any of them requires:

    • hardening the call site (sanitize → wrap_untrusted_content)
    • moving the matching test out of tests/_disabled_native/
    • updating HARDENING.md sections 2.3, 2.4, and 6
    • explicit user sign-off in the same conversation
  2. All web output must be sanitized and delimited. Any new code path that returns external data to the model must route through deerflow.security.sanitizer.sanitize() and deerflow.security.content_delimiter.wrap_untrusted_content(). The whole point of this build is that the LLM never sees raw web bytes.

  3. No secrets in git. .env is .gitignored. After staging and before committing, check the staged changes with git diff --cached | grep -iE 'api_key|secret|token|password' (git diff --cached only shows what is already staged). Use .env.example for templates only — placeholders, never live keys.

  4. Two source trees, one truth. The hardened code lives in both deer-flow/backend/packages/harness/deerflow/{security,community/searx}/ (the runtime path) and backend/packages/harness/deerflow/... (the factory overlay used by the standalone tests). They must stay identical. If you fix a bug in one, mirror it to the other in the same commit, or delete one of the two trees and pick a single source of truth.

  5. The egress firewall is part of the threat model. Do not change scripts/deerflow-firewall.sh allow/block lists without updating HARDENING.md section 2.7. Specifically:

    • allow: 10.67.67.1 (Searx), 10.67.67.2 (XTTS/Whisper/Ollama-local)
    • block: 192.168.3.0/24 (home LAN), 10.0.0.0/8, 172.16.0.0/12
  6. deer-flow is vendored, not a submodule. The upstream .git was removed and is parked at /tmp/deer-flow-upstream.git.bak on data-nuc. If you need to pull upstream changes, do it in a separate working copy and rebase manually — do not re-introduce a nested git into this repo.
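The allow and block lists in rule 5 overlap: both allowed hosts (10.67.67.1 and 10.67.67.2) sit inside the blocked 10.0.0.0/8. The allow entries therefore have to win, which in a first-match rule chain means checking them before the block entries. The sketch below models only that intended decision. It is not how scripts/deerflow-firewall.sh is implemented, and it assumes that destinations on neither list are not filtered by these rules.

```python
import ipaddress

# Lists copied from rule 5. Evaluation order is an assumption that mirrors
# first-match-wins rule chains: allow entries are checked before block entries.
ALLOW = [ipaddress.ip_network(n) for n in ("10.67.67.1/32", "10.67.67.2/32")]
BLOCK = [
    ipaddress.ip_network(n)
    for n in ("192.168.3.0/24", "10.0.0.0/8", "172.16.0.0/12")
]


def egress_verdict(dest: str) -> str:
    """Return 'allow' or 'block' for a destination address leaving br-deerflow."""
    addr = ipaddress.ip_address(dest)
    if any(addr in net for net in ALLOW):
        return "allow"  # Searx or XTTS/Whisper/Ollama-local
    if any(addr in net for net in BLOCK):
        return "block"  # home LAN or other private ranges
    return "allow"  # assumed default: destinations not listed are not filtered


for ip in ("10.67.67.1", "10.67.67.3", "192.168.3.20", "192.168.1.20", "1.1.1.1"):
    print(ip, egress_verdict(ip))
```

Under these assumptions, 10.67.67.3 is blocked even though it is next to Searx. 192.168.1.20 is allowed, because only 192.168.3.0/24, not all of 192.168.0.0/16, is on the block list.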

Where things run

  • Host: data-nuc (NixOS 25.11, kernel 6.12.x). The data user is in the docker group and can use docker compose directly.
  • Repo path: /home/data/deerflow-factory
  • Gitea remote: https://git.beerbandit.de/DATA/deerflow-factory (credentials in ~/.git-credentials for user data)
  • Egress firewall: systemctl status deerflow-firewall
    • active = rules in DOCKER-USER, applied to br-deerflow
    • inactive = rules removed (no firewall)
  • DeerFlow stack: not yet running as of the initial commit of this CLAUDE.md. For the first start, see RUN.md.

Commit / push style

  • Imperative subject, present-tense body. Reference HARDENING.md sections by number when you change something they describe.
  • Do not amend or force-push without asking. Add a follow-up commit.
  • Pre-commit secret check:
    git diff --cached --name-only | xargs -I{} grep -lE \
      'api_key|secret_key|sk-[a-zA-Z0-9]{20,}|ghp_|tvly-' {} 2>/dev/null
    
    Only .env.example should appear. If anything else does, abort.
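The same check could also live in a Python pre-commit hook. The sketch below is an assumption, not existing repo code. It reuses the grep pattern (case-sensitive, like grep -E without -i) and, like the pipeline, scans the working-tree copy of each staged file. It also assumes it runs from the repo root, which is where git runs hooks in a non-bare repository.

```python
import re
import subprocess
from pathlib import Path

# Same pattern as the grep pipeline above (case-sensitive, like grep -E).
SECRET_RE = re.compile(r"api_key|secret_key|sk-[a-zA-Z0-9]{20,}|ghp_|tvly-")
# The one file that is expected to match: it holds placeholders only.
ALLOWED = {".env.example"}


def staged_files() -> list[str]:
    """Staged paths, relative to the repo root (-z keeps odd names unquoted)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "-z"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [p for p in out.split("\0") if p]


def offending_files(paths: list[str]) -> list[str]:
    """Paths outside ALLOWED whose working-tree contents match SECRET_RE."""
    hits: list[str] = []
    for path in paths:
        file = Path(path)
        if not file.is_file():  # e.g. staged deletions; grep's 2>/dev/null case
            continue
        if path not in ALLOWED and SECRET_RE.search(file.read_text(errors="replace")):
            hits.append(path)
    return hits
```

In a hook, printing offending_files(staged_files()) and exiting non-zero when the list is non-empty aborts the commit. This matches "if anything else does, abort".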

Quick verification (run before declaring "it works")

From /home/data/deerflow-factory:

PYTHONPATH=deer-flow/backend/packages/harness python3 -c "
# 1. hardened modules import
from deerflow.security.content_delimiter import wrap_untrusted_content
from deerflow.security.sanitizer import sanitizer
from deerflow.security.html_cleaner import extract_secure_text
import importlib.util
assert importlib.util.find_spec('deerflow.community.searx.tools') is not None

# 2. native modules fail closed
for prov in ['ddg_search','tavily','exa','firecrawl','jina_ai','infoquest','image_search']:
    try:
        __import__(f'deerflow.community.{prov}.tools')
        raise SystemExit(f'FAIL: {prov} imported')
    except RuntimeError as e:
        assert 'disabled in this hardened DeerFlow build' in str(e)
print('OK')
"

# 3. security tests
PYTHONPATH=deer-flow/backend/packages/harness pytest \
    backend/tests/test_security_sanitizer.py \
    backend/tests/test_security_html_cleaner.py -q
# expected: 8 passed

# 4. firewall service
systemctl is-active deerflow-firewall
sudo scripts/deerflow-firewall.sh status

If any of these fail and you cannot fix them in the same session, stop and report — do not paper over the failure.
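Step 2 of the verification block passes only if each native provider's tools module raises RuntimeError at import time with a message containing "disabled in this hardened DeerFlow build". The real stubs are not reproduced here. The hypothetical sketch below shows a minimal stub of that shape and a stricter version of the same check. The stub wording beyond the marker string and the package name fake_tavily are invented.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The string step 2 asserts on; the rest of the stub's wording is hypothetical.
MARKER = "disabled in this hardened DeerFlow build"
STUB = f"raise RuntimeError('This web provider is {MARKER}.')\n"


def write_stub(root: Path, package: str) -> None:
    """Create <root>/<package>/tools.py as an import-time RuntimeError stub."""
    pkg = root / package
    pkg.mkdir(parents=True, exist_ok=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "tools.py").write_text(STUB)


def fails_closed(module_name: str) -> bool:
    """True if importing module_name raises RuntimeError carrying MARKER.

    Returns False if the import succeeds (the provider is live). Other errors,
    such as ModuleNotFoundError, propagate: a missing module is not a stub.
    """
    try:
        importlib.import_module(module_name)
    except RuntimeError as exc:
        return MARKER in str(exc)
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_stub(Path(tmp), "fake_tavily")  # invented package name
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            print(fails_closed("fake_tavily.tools"))  # prints True
        finally:
            sys.path.remove(tmp)
```

Letting ModuleNotFoundError propagate is deliberate. A provider directory that was deleted outright would otherwise look like a working stub.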

Common footguns

  • pip is not installed system-wide on NixOS. If you need a Python dep for a one-off script, use nix-shell -p python3Packages.<name> or run inside a deer-flow .venv once it exists. Do not try pip install --user — it will fail.
  • sudo is passwordless for data. Be careful: any sudo you run succeeds without a prompt. Double-check destructive commands.
  • NixOS rewrites /etc/systemd/system/. Do not drop unit files in there directly; they will be wiped on nixos-rebuild switch. Add a systemd.services.<name> block to a Nix module instead (see scripts/deerflow-firewall.nix for the pattern).
  • The factory overlay (backend/) is currently a mirror, not the runtime. When you import from Python at runtime, the path that matters is deer-flow/backend/packages/harness. The overlay only matters for the standalone factory tests. Keep them in sync until we pick one as canonical.
  • docker compose down does not remove the firewall rules. That is by design. Only systemctl stop deerflow-firewall removes them.
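To confirm which of the two trees Python actually loads, you can ask the import system where it found a module. The sketch below uses only importlib. The helper names resolved_origin and is_runtime_copy are made up for illustration.

```python
import importlib.util
from pathlib import Path
from typing import Optional

# The runtime import root named above; the overlay is backend/packages/harness.
RUNTIME_ROOT = Path("deer-flow/backend/packages/harness")


def resolved_origin(module_name: str) -> Optional[Path]:
    """File the module would be loaded from, or None if the module is not found.

    find_spec does not execute the module itself, but it does import its
    parent packages (e.g. deerflow and deerflow.security); a missing parent
    raises ModuleNotFoundError instead of returning None.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


def is_runtime_copy(origin: Path, repo_root: Path) -> bool:
    """True if origin lies under the vendored runtime tree of repo_root."""
    return origin.is_relative_to((repo_root / RUNTIME_ROOT).resolve())
```

With PYTHONPATH set as in the verification block, resolved_origin("deerflow.security.sanitizer") should point into deer-flow/backend/packages/harness. If it points into backend/ instead, the overlay is shadowing the runtime tree.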

What this repo is NOT

  • Not a fork on GitHub. The vendored upstream .git was deleted on purpose. If you need to compare against upstream, clone it fresh into /tmp/.
  • Not a Python package (yet). There is no pyproject.toml at the factory root; the Python entry point is the deer-flow tree's own backend/pyproject.toml. We only put files into its harness package.
  • Not multi-tenant. There is exactly one DeerFlow instance, one Searx, one set of credentials. Keep it that way unless the user explicitly asks for tenancy.