CLAUDE.md — orientation for future sessions
You are in deerflow-factory, a hardened deployment of bytedance/deer-flow on data-nuc. Read this file first before touching anything in this repo.
Read these in order
- HARDENING.md — threat model, what was changed, why, and how to verify. The full audit trail. Update it whenever you change anything listed in section 6 ("Files touched").
- RUN.md — how to start/stop/inspect the stack, smoke-test commands.
- DEERFLOW_PROMPT_INJECTION_PROTECTION_PLAN.md — the original plan that the hardening implements. Historical context.
Layout
deerflow-factory/
├── deer-flow/                        ← vendored upstream (no nested .git!)
│   └── backend/packages/harness/deerflow/
│       ├── security/                 ← hardened content sanitizer (lives here!)
│       ├── community/searx/          ← hardened web tools (lives here!)
│       └── community/<native>/       ← stubbed, raise on import
├── backend/                          ← factory overlay (mirror, kept in sync)
│   └── packages/harness/deerflow/
│       ├── security/                 ← duplicated source-of-truth
│       └── community/searx/          ← duplicated source-of-truth
├── docker/
│   └── docker-compose.override.yaml  ← named bridge br-deerflow
├── scripts/
│   ├── deerflow-firewall.sh          ← egress firewall up/down/status
│   └── deerflow-firewall.nix         ← NixOS module (imported by /etc/nixos/configuration.nix)
├── config.yaml                       ← runtime config, only references searx tools
├── .env                              ← real secrets, .gitignored
├── .env.example                      ← template
├── HARDENING.md
├── RUN.md
└── CLAUDE.md                         ← you are here
Hard rules (do not violate without explicit user approval)
- Native web tools stay disabled. The legacy providers (ddg_search, tavily, exa, firecrawl, jina_ai, infoquest, image_search) and the matching helper clients (jina_client.py, infoquest_client.py) are intentionally replaced with import-time RuntimeError stubs. Re-enabling any of them requires:
  - hardening the call site (sanitize → wrap_untrusted_content)
  - moving the matching test out of tests/_disabled_native/
  - updating HARDENING.md sections 2.3, 2.4, and 6
  - explicit user sign-off in the same conversation
- All web output must be sanitized and delimited. Any new code path that returns external data to the model must route through deerflow.security.sanitizer.sanitize() and deerflow.security.content_delimiter.wrap_untrusted_content(). The whole point of this build is that the LLM never sees raw web bytes.
- No secrets in git. .env is .gitignored. After staging and before committing, verify with git diff --cached | grep -iE 'api_key|secret|token|password'. Use .env.example for templates only: placeholders, never live keys.
- Two source trees, one truth. The hardened code lives in both deer-flow/backend/packages/harness/deerflow/{security,community/searx}/ (the runtime path) and backend/packages/harness/deerflow/... (the factory overlay used by the standalone tests). They must stay identical. If you fix a bug in one, mirror it to the other in the same commit, or delete one of the two trees and pick a single source of truth.
- The egress firewall is part of the threat model. Do not change the allow/block lists in scripts/deerflow-firewall.sh without updating HARDENING.md section 2.7. Specifically:
  - allow: 10.67.67.1 (Searx), 10.67.67.2 (XTTS/Whisper/Ollama-local)
  - block: 192.168.3.0/24 (home LAN), 10.0.0.0/8, 172.16.0.0/12
- deer-flow is vendored, not a submodule. The upstream .git was removed and is parked at /tmp/deer-flow-upstream.git.bak on data-nuc. If you need to pull upstream changes, do it in a separate working copy and rebase manually. Do not re-introduce a nested git into this repo.
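One detail of the firewall lists is easy to miss: both allowed hosts sit inside the blocked 10.0.0.0/8 range, so the allow entries can only work as exceptions that are matched before the block entries. The sketch below illustrates that with Python's standard ipaddress module. It assumes first-match evaluation with allows checked first; it is not a copy of scripts/deerflow-firewall.sh, and the names verdict and shadowed_allows are hypothetical.

```python
import ipaddress

# Values copied from the firewall hard rule above.
ALLOW = ["10.67.67.1", "10.67.67.2"]
BLOCK = ["192.168.3.0/24", "10.0.0.0/8", "172.16.0.0/12"]


def verdict(dest: str) -> str:
    """Hypothetical first-match evaluation: allow entries are checked before block entries."""
    addr = ipaddress.ip_address(dest)
    if any(addr == ipaddress.ip_address(a) for a in ALLOW):
        return "allow"
    if any(addr in ipaddress.ip_network(n) for n in BLOCK):
        return "block"
    # Neither list covers this destination; what happens next is up to the real script.
    return "unmatched"


def shadowed_allows() -> list[str]:
    """Allow entries that a block range would swallow if the order were reversed."""
    return [
        a for a in ALLOW
        if any(ipaddress.ip_address(a) in ipaddress.ip_network(n) for n in BLOCK)
    ]
```

Calling verdict("10.67.67.3") shows that a neighbour of the Searx host is still blocked, and shadowed_allows() lists the allow entries that depend on rule order. If you reorder rules in the real script, confirm the actual behaviour with sudo scripts/deerflow-firewall.sh status.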
Where things run
- Host: data-nuc (NixOS 25.11, kernel 6.12.x). The data user is in the docker group and can use docker compose directly.
- Repo path: /home/data/deerflow-factory
- Gitea remote: https://git.beerbandit.de/DATA/deerflow-factory (credentials in ~/.git-credentials for user data)
- Egress firewall: systemctl status deerflow-firewall
  - active = rules in DOCKER-USER, applied to br-deerflow
  - inactive = rules removed (no firewall)
- DeerFlow stack: not running yet at the time of this CLAUDE.md initial commit. First start: see RUN.md.
Commit / push style
- Imperative subject, present-tense body. Reference HARDENING.md sections by number when you change something they describe.
- Do not amend or force-push without asking. Add a follow-up commit.
- Pre-commit secret check:
  git diff --cached --name-only | xargs -I{} grep -lE \
    'api_key|secret_key|sk-[a-zA-Z0-9]{20,}|ghp_|tvly-' {} 2>/dev/null
  Only .env.example should appear. If anything else does, abort.
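The same pre-commit check can be sketched in Python, which avoids xargs problems with unusual file names. This is a minimal sketch that reuses the regex from the command above. The helper names has_secret, staged_files and files_with_secrets are hypothetical and not part of this repo. Like the grep version, it reads the working-tree copy of each staged path.

```python
import re
import subprocess
from pathlib import Path

# Same pattern as the grep command above (case-sensitive, like grep -E without -i).
SECRET_RE = re.compile(r"api_key|secret_key|sk-[a-zA-Z0-9]{20,}|ghp_|tvly-")


def has_secret(text: str) -> bool:
    """True if the text contains anything that looks like a key or token."""
    return SECRET_RE.search(text) is not None


def staged_files(repo: Path) -> list[Path]:
    """Paths currently staged in the index, NUL-separated to survive odd names."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "-z"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return [repo / name for name in out.split("\0") if name]


def files_with_secrets(paths) -> list[str]:
    """Names of files whose content matches the secret pattern."""
    hits = []
    for path in paths:
        try:
            text = Path(path).read_text(errors="ignore")
        except OSError:
            continue  # deleted or unreadable path, mirrors grep's 2>/dev/null
        if has_secret(text):
            hits.append(Path(path).name)
    return hits
```

From the repo root, files_with_secrets(staged_files(Path.cwd())) should return at most [".env.example"]. Treat any other entry as a reason to abort the commit.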
Quick verification (run before declaring "it works")
From /home/data/deerflow-factory:
PYTHONPATH=deer-flow/backend/packages/harness python3 -c "
# 1. hardened modules import
from deerflow.security.content_delimiter import wrap_untrusted_content
from deerflow.security.sanitizer import sanitizer
from deerflow.security.html_cleaner import extract_secure_text
import importlib.util
assert importlib.util.find_spec('deerflow.community.searx.tools') is not None
# 2. native modules fail closed
for prov in ['ddg_search','tavily','exa','firecrawl','jina_ai','infoquest','image_search']:
    try:
        __import__(f'deerflow.community.{prov}.tools')
        raise SystemExit(f'FAIL: {prov} imported')
    except RuntimeError as e:
        assert 'disabled in this hardened DeerFlow build' in str(e)
print('OK')
"
# 3. security tests
PYTHONPATH=deer-flow/backend/packages/harness pytest \
backend/tests/test_security_sanitizer.py \
backend/tests/test_security_html_cleaner.py -q
# expected: 8 passed
# 4. firewall service
systemctl is-active deerflow-firewall
sudo scripts/deerflow-firewall.sh status
If any of these fail and you cannot fix them in the same session, stop and report — do not paper over the failure.
Common footguns
- pip is not installed system-wide on NixOS. If you need a Python dep for a one-off script, use nix-shell -p python3Packages.<name> or run inside a deer-flow .venv once it exists. Do not try pip install --user; it will fail.
- sudo is passwordless for data. Be careful: any sudo you run succeeds without a prompt. Double-check destructive commands.
- NixOS rewrites /etc/systemd/system/. Do not drop unit files in there directly; they will be wiped on nixos-rebuild switch. Add a systemd.services.<name> block to a Nix module instead (see scripts/deerflow-firewall.nix for the pattern).
- The factory overlay (backend/) is currently a mirror, not the runtime. When you import from Python at runtime, the path that matters is deer-flow/backend/packages/harness. The overlay only matters for the standalone factory tests. Keep them in sync until we pick one as canonical.
- docker compose down does not remove the firewall rules. That is by design. Only systemctl stop deerflow-firewall removes them.
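The "keep them in sync" requirement for the two trees can be checked mechanically. This is a minimal sketch using the standard filecmp module, assuming the directory layout described in this file. The names tree_diff and mirror_problems are hypothetical. Because filecmp.dircmp only compares file signatures, common files are rechecked byte for byte with filecmp.cmpfiles(shallow=False).

```python
import filecmp
from pathlib import Path

# Paths taken from the layout section; both are relative to the repo root.
RUNTIME = Path("deer-flow/backend/packages/harness/deerflow")
OVERLAY = Path("backend/packages/harness/deerflow")
MIRRORED = ["security", "community/searx"]


def tree_diff(left: Path, right: Path) -> list[str]:
    """Describe every file that differs or exists on only one side."""
    problems = []
    cmp = filecmp.dircmp(left, right, ignore=["__pycache__"])
    problems += [f"only in {left}: {name}" for name in cmp.left_only]
    problems += [f"only in {right}: {name}" for name in cmp.right_only]
    # Byte-wise recheck of files present on both sides.
    _, mismatch, errors = filecmp.cmpfiles(left, right, cmp.common_files, shallow=False)
    problems += [f"differs: {left / name}" for name in mismatch + errors]
    for sub in cmp.common_dirs:
        problems += tree_diff(left / sub, right / sub)
    return problems


def mirror_problems(root: Path = Path(".")) -> list[str]:
    """Compare every mirrored subtree; an empty list means the trees match."""
    filecmp.clear_cache()  # avoid stale results if files changed since the last call
    problems = []
    for rel in MIRRORED:
        problems += tree_diff(root / RUNTIME / rel, root / OVERLAY / rel)
    return problems
```

Running mirror_problems() from /home/data/deerflow-factory before a commit should return an empty list. A missing directory raises FileNotFoundError, which is itself a sign that the trees have diverged.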
What this repo is NOT
- Not a fork on GitHub. The vendored upstream .git was deleted on purpose. If you need to compare against upstream, clone it fresh into /tmp/.
- Not a Python package (yet). There is no pyproject.toml at the factory root; the Python entry point is the deer-flow tree's own backend/pyproject.toml. We only put files into its harness package.
- Not multi-tenant. There is exactly one DeerFlow instance, one Searx, one set of credentials. Keep it that way unless the user explicitly asks for tenancy.