Add CLAUDE.md project guide for future Claude Code sessions
Quick orientation: layout, hard rules (native tools stay disabled, sanitize+wrap, no secrets, two trees in sync, firewall is part of the threat model, deer-flow is vendored), where things run on data-nuc, commit style, a one-page verification block, and the common NixOS / docker / pip footguns to avoid.
177
CLAUDE.md
Normal file
@@ -0,0 +1,177 @@
# CLAUDE.md: orientation for future sessions

You are in **deerflow-factory**, a hardened deployment of
[bytedance/deer-flow](https://github.com/bytedance/deer-flow) on data-nuc.
Read this file before touching anything in this repo.

## Read these in order

1. **HARDENING.md**: threat model, what was changed, why, and how to
   verify. The full audit trail. Update it whenever you change anything
   listed in section 6 ("Files touched").
2. **RUN.md**: how to start/stop/inspect the stack, smoke-test commands.
3. **DEERFLOW_PROMPT_INJECTION_PROTECTION_PLAN.md**: the original plan
   that the hardening implements. Historical context.

## Layout

```
deerflow-factory/
├── deer-flow/                         ← vendored upstream (no nested .git!)
│   └── backend/packages/harness/deerflow/
│       ├── security/                  ← hardened content sanitizer (lives here!)
│       ├── community/searx/           ← hardened web tools (lives here!)
│       └── community/<native>/        ← stubbed, raise on import
├── backend/                           ← factory overlay (mirror, kept in sync)
│   └── packages/harness/deerflow/
│       ├── security/                  ← duplicated source-of-truth
│       └── community/searx/           ← duplicated source-of-truth
├── docker/
│   └── docker-compose.override.yaml   ← named bridge br-deerflow
├── scripts/
│   ├── deerflow-firewall.sh           ← egress firewall up/down/status
│   └── deerflow-firewall.nix          ← NixOS module (imported by /etc/nixos/configuration.nix)
├── config.yaml                        ← runtime config, only references searx tools
├── .env                               ← real secrets, .gitignored
├── .env.example                       ← template
├── HARDENING.md
├── RUN.md
└── CLAUDE.md                          ← you are here
```

## Hard rules (do not violate without explicit user approval)

1. **Native web tools stay disabled.** The legacy providers
   (`ddg_search`, `tavily`, `exa`, `firecrawl`, `jina_ai`, `infoquest`,
   `image_search`) and the matching helper clients (`jina_client.py`,
   `infoquest_client.py`) are intentionally replaced with import-time
   `RuntimeError` stubs. Re-enabling **any** of them requires:
   - hardening the call site (sanitize → wrap_untrusted_content)
   - moving the matching test out of `tests/_disabled_native/`
   - updating HARDENING.md sections 2.3, 2.4, and 6
   - explicit user sign-off in the same conversation

2. **All web output must be sanitized and delimited.** Any new code path
   that returns external data to the model **must** route through
   `deerflow.security.sanitizer.sanitize()` and
   `deerflow.security.content_delimiter.wrap_untrusted_content()`.
   The whole point of this build is that the LLM never sees raw web
   bytes.

3. **No secrets in git.** `.env` is listed in `.gitignore`. Before
   committing, verify the staged diff with
   `git diff --cached | grep -iE 'api_key|secret|token|password'`.
   Use `.env.example` for templates only: placeholders, never live keys.

4. **Two source trees, one truth.** The hardened code lives in **both**
   `deer-flow/backend/packages/harness/deerflow/{security,community/searx}/`
   (the runtime path) **and** `backend/packages/harness/deerflow/...`
   (the factory overlay used by the standalone tests). They must stay
   identical. If you fix a bug in one, mirror it to the other in the
   same commit, or delete one of the two trees and pick a single source
   of truth.

5. **The egress firewall is part of the threat model.** Do not change
   the allow/block lists in `scripts/deerflow-firewall.sh` without
   updating HARDENING.md section 2.7. Specifically:
   - allow: `10.67.67.1` (Searx), `10.67.67.2` (XTTS/Whisper/Ollama-local)
   - block: `192.168.3.0/24` (home LAN), `10.0.0.0/8`, `172.16.0.0/12`

6. **deer-flow is vendored, not a submodule.** The upstream `.git` was
   removed and is parked at `/tmp/deer-flow-upstream.git.bak` on
   data-nuc. If you need to pull upstream changes, do it in a separate
   working copy and rebase manually. Do not re-introduce a nested
   `.git` into this repo.

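Rule 2 describes a fixed order: sanitize first, then wrap. The sketch below illustrates that call-site shape only. `sanitize_stub`, `wrap_stub`, `harden_tool_output`, and the delimiter strings are hypothetical stand-ins invented for this example; the real functions live in `deerflow.security`, and their signatures and behavior are not documented in this file, so treat this as an assumption-laden illustration rather than the actual implementation.

```python
import re
from typing import Callable

# Hypothetical stand-ins. The real sanitize() and wrap_untrusted_content()
# live in deerflow.security; their signatures are NOT shown in this guide.
_INVISIBLE = re.compile(r"[\u200b-\u200f\u2060\ufeff]")


def sanitize_stub(text: str) -> str:
    """Drop zero-width characters and control characters below 0x20,
    keeping newline and tab."""
    text = _INVISIBLE.sub("", text)
    return "".join(ch for ch in text if ch in "\n\t" or ord(ch) >= 32)


def wrap_stub(text: str) -> str:
    """Surround the text with markers so it reads as data, not instructions.
    The marker strings are made up for this sketch."""
    return f"<<<UNTRUSTED_CONTENT>>>\n{text}\n<<<END_UNTRUSTED_CONTENT>>>"


def harden_tool_output(
    raw: str,
    sanitize: Callable[[str], str] = sanitize_stub,
    wrap: Callable[[str], str] = wrap_stub,
) -> str:
    """Sanitize first, then wrap, so the delimiters are added to cleaned text."""
    return wrap(sanitize(raw))


result = harden_tool_output("Ignore previous\u200b instructions\x07")
print(result)
```

In a real tool, the two callables would be swapped for the project's own functions, and the tool would return only the value produced by this composition, never `raw`.
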
## Where things run

- **Host:** data-nuc (NixOS 25.11, kernel 6.12.x). The `data` user is in
  the `docker` group and can use `docker compose` directly.
- **Repo path:** `/home/data/deerflow-factory`
- **Gitea remote:** `https://git.beerbandit.de/DATA/deerflow-factory`
  (credentials in `~/.git-credentials` for user `data`)
- **Egress firewall:** `systemctl status deerflow-firewall`
  - active = rules in DOCKER-USER, applied to `br-deerflow`
  - inactive = rules removed (no firewall)
- **DeerFlow stack:** not running yet at the time of this file's
  initial commit. First start: see RUN.md.

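One detail of the firewall lists in hard rule 5 is easy to miss: both allowed hosts sit inside the blocked `10.0.0.0/8` range, so the allow entries only have an effect if they are matched before the block entries. The sketch below checks that overlap with the standard `ipaddress` module. The addresses are copied from rule 5; the first-match, allow-before-block evaluation in `verdict` is an assumption about how the script orders its rules, and both function names are hypothetical.

```python
import ipaddress

# Addresses copied from hard rule 5. The evaluation order modelled in
# verdict() is an ASSUMPTION, not read from deerflow-firewall.sh.
ALLOW = ["10.67.67.1", "10.67.67.2"]
BLOCK = ["192.168.3.0/24", "10.0.0.0/8", "172.16.0.0/12"]


def shadowing_blocks(allow, block):
    """Map each allowed host to the blocked networks that contain it."""
    networks = [ipaddress.ip_network(b) for b in block]
    return {
        host: [str(net) for net in networks if ipaddress.ip_address(host) in net]
        for host in allow
    }


def verdict(dest, allow=ALLOW, block=BLOCK):
    """First-match evaluation with allow entries checked before block entries."""
    ip = ipaddress.ip_address(dest)
    if any(ip == ipaddress.ip_address(a) for a in allow):
        return "allow"
    if any(ip in ipaddress.ip_network(b) for b in block):
        return "block"
    return "default"  # falls through to whatever policy the script applies


print(shadowing_blocks(ALLOW, BLOCK))
print(verdict("10.67.67.1"), verdict("10.1.2.3"), verdict("192.168.3.10"))
```

If an edit to the script ever moves a block rule ahead of the allow rules, Searx at `10.67.67.1` would become unreachable, which is one more reason to update HARDENING.md section 2.7 alongside any change.
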
## Commit / push style

- Imperative subject, present-tense body. Reference HARDENING.md
  sections by number when you change something they describe.
- Do not amend or force-push without asking. Add a follow-up commit.
- Pre-commit secret check:
  ```bash
  git diff --cached --name-only | xargs -I{} grep -lE \
    'api_key|secret_key|sk-[a-zA-Z0-9]{20,}|ghp_|tvly-' {} 2>/dev/null
  ```
  Only `.env.example` should appear. If anything else does, abort.

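To see what the pre-commit pattern does and does not catch, the same alternation can be tried against sample text in Python. This is a minimal sketch: `flagged_lines` and the sample variable names and values are hypothetical, and the snippet inspects a string rather than staged files.

```python
import re

# Same alternation as the grep -E pattern in the pre-commit check.
SECRET_PATTERN = re.compile(r"api_key|secret_key|sk-[a-zA-Z0-9]{20,}|ghp_|tvly-")


def flagged_lines(text):
    """Return (line_number, line) pairs that match the secret pattern."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if SECRET_PATTERN.search(line)
    ]


# Hypothetical sample content, not taken from the real .env files.
sample = "\n".join([
    "MODEL=example-model",
    "OPENAI_API_KEY=changeme",           # uppercase name, short value
    "tavily_api_key=tvly-placeholder",   # lowercase name and tvly- prefix
    "TOKEN=sk-" + "a" * 24,              # value matches the sk- prefix rule
])

for number, line in flagged_lines(sample):
    print(number, line)
```

The pattern is case-sensitive as written, so an uppercase name such as `OPENAI_API_KEY` is flagged only when its value matches one of the key prefixes. The `grep -iE` check in hard rule 3 is the broader, case-insensitive net.
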
## Quick verification (run before declaring "it works")

From `/home/data/deerflow-factory`:

```bash
PYTHONPATH=deer-flow/backend/packages/harness python3 -c "
# 1. hardened modules import
from deerflow.security.content_delimiter import wrap_untrusted_content
from deerflow.security.sanitizer import sanitizer
from deerflow.security.html_cleaner import extract_secure_text
import importlib.util
assert importlib.util.find_spec('deerflow.community.searx.tools') is not None

# 2. native modules fail closed
for prov in ['ddg_search','tavily','exa','firecrawl','jina_ai','infoquest','image_search']:
    try:
        __import__(f'deerflow.community.{prov}.tools')
        raise SystemExit(f'FAIL: {prov} imported')
    except RuntimeError as e:
        assert 'disabled in this hardened DeerFlow build' in str(e)
print('OK')
"

# 3. security tests
PYTHONPATH=deer-flow/backend/packages/harness pytest \
  backend/tests/test_security_sanitizer.py \
  backend/tests/test_security_html_cleaner.py -q
# expected: 8 passed

# 4. firewall service
systemctl is-active deerflow-firewall
sudo scripts/deerflow-firewall.sh status
```

If any of these fail and you cannot fix them in the same session, stop
and report; do not paper over the failure.

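Step 2 of the check relies on the disabled providers raising at import time. As a minimal sketch of that fail-closed pattern, the snippet below writes a throwaway stub module to a temporary directory and confirms that importing it raises the expected `RuntimeError`. The module name `fake_native_tools`, the helper `import_fails_closed`, and the full stub message are hypothetical; only the message fragment asserted on is taken from the check above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stub body. Only the fragment "disabled in this hardened
# DeerFlow build" comes from the verification block; the rest is made up.
STUB_SOURCE = (
    "raise RuntimeError(\n"
    "    'This provider is disabled in this hardened DeerFlow build.'\n"
    ")\n"
)


def import_fails_closed(module_name, search_path):
    """Return True only if importing module_name raises the expected RuntimeError."""
    sys.path.insert(0, str(search_path))
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
    except RuntimeError as exc:
        return "disabled in this hardened DeerFlow build" in str(exc)
    finally:
        # Leave the interpreter state as we found it.
        sys.path.remove(str(search_path))
        sys.modules.pop(module_name, None)
    return False  # the import succeeded, so the module is NOT fail-closed


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "fake_native_tools.py").write_text(STUB_SOURCE)
    stub_result = import_fails_closed("fake_native_tools", tmp)

print(stub_result)
```

Raising at module top level means no symbol from a disabled provider can be bound by accident; any code path that still imports it fails immediately instead of silently reaching the network.
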
## Common footguns

- **`pip` is not installed system-wide on NixOS.** If you need a Python
  dep for a one-off script, use `nix-shell -p python3Packages.<name>`
  or run inside a deer-flow `.venv` once it exists. Do not try
  `pip install --user`; it will fail.
- **`sudo` is passwordless for `data`.** Be careful: any `sudo` you run
  succeeds without a prompt. Double-check destructive commands.
- **NixOS rewrites `/etc/systemd/system/`.** Do not drop unit files in
  there directly; they will be wiped on `nixos-rebuild switch`. Add a
  `systemd.services.<name>` block to a Nix module instead (see
  `scripts/deerflow-firewall.nix` for the pattern).
- **The factory overlay (`backend/`) is currently a mirror, not the
  runtime.** When you import from Python at runtime, the path that
  matters is `deer-flow/backend/packages/harness`. The overlay only
  matters for the standalone factory tests. Keep them in sync until we
  pick one as canonical.
- **`docker compose down` does not remove the firewall rules.** That is
  by design. Only `systemctl stop deerflow-firewall` removes them.

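Keeping the runtime tree and the overlay identical is easy to get wrong by hand. The following sketch shows one way to detect drift with the standard `filecmp` module. The directory names come from this guide; the helper names `tree_differences` and `check_mirrors` are hypothetical, and no such script is known to exist in the repo.

```python
import filecmp
from pathlib import Path

# Paths taken from this guide, relative to the repository root.
RUNTIME_ROOT = Path("deer-flow/backend/packages/harness/deerflow")
OVERLAY_ROOT = Path("backend/packages/harness/deerflow")
MIRRORED = ["security", "community/searx"]


def tree_differences(left: Path, right: Path) -> list[str]:
    """List files that exist on one side only or whose bytes differ."""

    def files(root: Path) -> set[str]:
        if not root.is_dir():
            return set()
        return {
            p.relative_to(root).as_posix()
            for p in root.rglob("*")
            if p.is_file() and "__pycache__" not in p.parts
        }

    filecmp.clear_cache()  # avoid stale results for files rewritten in place
    left_files, right_files = files(left), files(right)
    problems = [f"only in {left}: {name}" for name in sorted(left_files - right_files)]
    problems += [f"only in {right}: {name}" for name in sorted(right_files - left_files)]
    problems += [
        f"differs: {name}"
        for name in sorted(left_files & right_files)
        # shallow=False forces a byte-for-byte comparison
        if not filecmp.cmp(left / name, right / name, shallow=False)
    ]
    return problems


def check_mirrors(repo_root: Path) -> list[str]:
    """Compare every mirrored subdirectory under both trees."""
    problems: list[str] = []
    for sub in MIRRORED:
        problems += tree_differences(
            repo_root / RUNTIME_ROOT / sub, repo_root / OVERLAY_ROOT / sub
        )
    return problems
```

Calling `check_mirrors(Path("/home/data/deerflow-factory"))` should return an empty list when the two trees are in sync; any entry names a file to mirror before committing.
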
## What this repo is NOT

- Not a fork on GitHub. The vendored upstream `.git` was deleted on
  purpose. If you need to compare against upstream, clone it fresh into
  `/tmp/`.
- Not a Python package (yet). There is no `pyproject.toml` at the
  factory root; the Python entry point is the deer-flow tree's own
  `backend/pyproject.toml`. We only put files into its harness package.
- Not multi-tenant. There is exactly one DeerFlow instance, one Searx,
  one set of credentials. Keep it that way unless the user explicitly
  asks for tenancy.