Files
deerflow-factory/HARDENING.md
DATA 75315d958e Network isolation: egress firewall + named bridge
Adds the host-level egress firewall recommended by the upstream
DeerFlow team's "run in a VLAN" guidance, adapted to a Fritzbox-only
home network where LAN VLANs are not available.

- docker/docker-compose.override.yaml: pins the upstream deer-flow
  Docker network to a stable Linux bridge name br-deerflow so the
  firewall can address it without guessing Docker's auto-generated
  br-<hash>. Used as a -f overlay on top of the upstream compose file.

- scripts/deerflow-firewall.sh: idempotent up/down/status wrapper that
  installs DOCKER-USER iptables rules. Allowlist for 10.67.67.1 (Searx)
  and 10.67.67.2 (XTTS/Whisper/Ollama-local), hard block for
  192.168.3.0/24 (home LAN), 10.0.0.0/8, 172.16.0.0/12. Stateful return
  rule keeps inbound LAN access to published ports working.

- scripts/deerflow-firewall.nix: NixOS module snippet defining a
  systemd unit ordered After=docker.service so the rules survive
  dockerd restarts and follow its lifecycle. Copy into
  configuration.nix and nixos-rebuild switch.

- HARDENING.md: new section 2.7 "Network isolation (egress firewall)"
  with allow/block tables, bring-up steps, and smoke-test commands.

Guarantees: rules match on -i br-deerflow, so if the bridge does not
exist, the rules are no-ops and do not affect any other container
(paperclip, telebrowser, openclaw-gateway, ...). Stopping the
container leaves the rules in place but inert; stopping the systemd
unit removes them.
2026-04-12 14:56:26 +02:00

14 KiB

DeerFlow Hardening Notes

This repository is a hardened deployment of bytedance/deer-flow with the only goal of preventing prompt-injection attacks via the agent's web access surface.

The upstream tree lives in deer-flow/ and is checked in directly (no submodule, no nested git). All hardening changes are kept inside that tree so that python -m deerflow.community.searx.tools resolves out of the box once deer-flow/backend/packages/harness is on PYTHONPATH.

This document is a defense-in-depth audit trail. If you change any of the files listed here, please update this document in the same commit.

1. Threat model

Prompt-injection via untrusted web content. An attacker controls the body of an HTML page (or a search-result snippet) and tries to make the model:

  1. Treat externally fetched text as system instructions (delimiter confusion).
  2. Smuggle hidden tokens via invisible Unicode (zero-width spaces, BOM, PUA, tag characters).
  3. Inject executable HTML (<script>, <iframe>, <form>, ...) that the model would summarise verbatim.

The hardening below is a port of the OpenClaw approach (searx-scripts/, fetch-scripts/) to DeerFlow's adapter contract.

2. What was changed

2.1 New: deerflow.security

deer-flow/backend/packages/harness/deerflow/security/

File Purpose
__init__.py Public re-exports
content_delimiter.py Wraps untrusted content in <<<EXTERNAL_UNTRUSTED_CONTENT>>> ... <<<END_EXTERNAL_UNTRUSTED_CONTENT>>> so the LLM has a semantic boundary between system instructions and external data
html_cleaner.py SecureTextExtractor strips script, style, noscript, header, footer, nav, aside, iframe, object, embed, form
sanitizer.py PromptInjectionSanitizer: 8 layers — invisible chars, control chars, symbols (So/Sk), NFC normalize, PUA, tag chars, horizontal-whitespace collapse (newlines/tabs preserved), length cap

2.2 New: deerflow.community.searx

deer-flow/backend/packages/harness/deerflow/community/searx/tools.py

LangChain @tool exports:

  • web_search_tool(query, max_results=10) — calls a private SearX instance, sanitizes title + content, wraps results in security delimiters
  • web_fetch_tool(url, max_chars=10000) — fetches URL, runs extract_secure_text then sanitizer.sanitize, wraps result
  • image_search_tool(query, max_results=5) — SearX categories=images, sanitized title/url/thumbnail, wrapped

Reads its config from get_app_config().get_tool_config(<name>).model_extra: searx_url, max_results, max_chars.

2.3 Disabled: native community web tools

Every legacy provider's tools.py was replaced with a hard-fail stub that raises NativeWebToolDisabledError at module import time. Importing the module aborts with a clear message pointing at the searx replacement, so a misconfigured tool.use: path in config.yaml fails loud, not silent.

Provider Status Reason
community/ddg_search/tools.py stub unhardened DuckDuckGo HTML scrape
community/tavily/tools.py stub external API, no sanitization
community/exa/tools.py stub external API, no sanitization
community/firecrawl/tools.py stub external API, no sanitization
community/jina_ai/tools.py stub unhardened Jina Reader
community/jina_ai/jina_client.py stub back-door client, also disabled
community/infoquest/tools.py stub external API, no sanitization
community/infoquest/infoquest_client.py stub back-door client, also disabled
community/image_search/tools.py stub unhardened DDG image fallback

Central reject helper: community/_disabled_native.pyreject_native_provider(name) raises NativeWebToolDisabledError.

2.4 Quarantined tests

Tests that expected the native modules to be importable are moved to deer-flow/backend/tests/_disabled_native/. A conftest.py in that directory sets collect_ignore_glob = ["*.py"] so pytest skips them without erroring.

Test Reason
test_exa_tools.py imports deerflow.community.exa.tools
test_firecrawl_tools.py imports deerflow.community.firecrawl.tools
test_jina_client.py imports deerflow.community.jina_ai.jina_client
test_infoquest_client.py imports deerflow.community.infoquest.infoquest_client

test_doctor.py and test_setup_wizard.py reference the native paths only as strings in test configs (not as imports), so they continue to run unchanged.

2.5 Sanitizer bug fix

PromptInjectionSanitizer.sanitize() Layer 7 used to do re.sub(r'\s+', ' ', text) which collapsed \n and \t into single spaces — destroying list/table structure from web pages. Replaced with horizontal-whitespace-only collapse plus \n{3,} -> \n\n. Verified by test_security_sanitizer.py::test_preserves_newlines_and_tabs.

2.6 Hardened runtime config

config.yaml (top-level, not deer-flow/config.example.yaml) is the runtime config and references only the searx-backed tools:

tools:
  - name: web_search
    group: web
    use: deerflow.community.searx.tools:web_search_tool
    searx_url: http://10.67.67.1:8888
    max_results: 10
  - name: web_fetch
    group: web
    use: deerflow.community.searx.tools:web_fetch_tool
    max_chars: 10000
  - name: image_search
    group: web
    use: deerflow.community.searx.tools:image_search_tool
    max_results: 5

The guardrail layer is intentionally not used as the primary block: DeerFlow guardrails see only tool.name (e.g. web_search), and both the hardened and the native version export the same name. The real block is the import-time stub above.

2.7 Network isolation (egress firewall)

The DeerFlow team recommends running the agent in a dedicated VLAN. Our Fritzbox cannot do LAN VLANs, so instead we put the container behind an egress firewall on the Docker host. The container can reach the Internet plus a small whitelist of Wireguard hosts (Searx, local model servers), but cannot scan or attack any device on the home LAN. Inbound traffic from the LAN to the container's published ports is unaffected because the rules are stateful.

Allow (egress from container):

Destination Purpose
1.0.0.0/8 ... 223.0.0.0/8 (public Internet) Ollama Cloud, search backends
10.67.67.1 Searx (Wireguard)
10.67.67.2 XTTS / Whisper / Ollama-local (Wireguard)

Block (egress from container):

Destination Reason
192.168.3.0/24 home LAN — no lateral movement
10.0.0.0/8 (except whitelisted /32) other Wireguard subnets, RFC1918
172.16.0.0/12 other Docker bridges

Implementation:

File Role
docker/docker-compose.override.yaml Pins the upstream deer-flow Docker network to a stable Linux bridge name br-deerflow, so the firewall can address it without guessing Docker's auto-generated br-<hash>. Used as a -f overlay on top of deer-flow/docker/docker-compose.yaml.
scripts/deerflow-firewall.sh Idempotent up/down/status wrapper that installs the iptables rules in the DOCKER-USER chain. Inserted in reverse order so the final chain order is: stateful return, allow Searx, allow Ollama-local, block LAN, block /8, block /12.
scripts/deerflow-firewall.nix NixOS module snippet defining systemd.services.deerflow-firewall. Ordered After=docker.service, Requires=docker.service, PartOf=docker.service so the rules survive dockerd restarts and follow its lifecycle. Copy into configuration.nix and nixos-rebuild switch.

Important guarantees:

  • The rules match on -i br-deerflow. If the bridge does not exist (e.g. DeerFlow has never been started), the rules are no-ops and do not affect any other container (paperclip, telebrowser, openclaw-gateway, ...). They activate automatically the moment docker compose ... up -d creates the bridge.
  • Stopping or removing the DeerFlow container leaves the rules in place but inert. Stopping the systemd unit removes them.
  • The script is idempotent: up will never duplicate a rule, down removes all copies.

Bring up:

cd /home/data/deerflow-factory
docker compose \
    -f deer-flow/docker/docker-compose.yaml \
    -f docker/docker-compose.override.yaml \
    up -d

# Then either run the script directly:
sudo scripts/deerflow-firewall.sh up

# ...or, on NixOS, copy scripts/deerflow-firewall.nix into configuration.nix
# and:
sudo nixos-rebuild switch
systemctl status deerflow-firewall

Smoke tests (run from inside the container, e.g. docker exec -it <id> sh):

# allowed
curl -s -o /dev/null -w "%{http_code}\n" --max-time 5 http://10.67.67.1:8888/    # Searx -> 200
curl -s -o /dev/null -w "%{http_code}\n" --max-time 5 https://api.cloudflare.com/  # Internet -> 200/4xx

# blocked (should fail with "no route" / "host prohibited" / timeout)
curl -s -o /dev/null -w "%{http_code}\n" --max-time 5 http://192.168.3.1/         # FAIL
curl -s -o /dev/null -w "%{http_code}\n" --max-time 5 http://10.67.67.16/         # FAIL (blocked by 10/8 reject; .16 is not whitelisted)

3. Verification

All checks below assume PYTHONPATH=deer-flow/backend/packages/harness.

3.1 Hardened modules import

python3 -c "
from deerflow.security.content_delimiter import wrap_untrusted_content
from deerflow.security.sanitizer import sanitizer
from deerflow.security.html_cleaner import extract_secure_text
import importlib.util
assert importlib.util.find_spec('deerflow.community.searx.tools') is not None
print('OK')
"

3.2 Native modules fail closed

python3 -c "
for prov in ['ddg_search','tavily','exa','firecrawl','jina_ai','infoquest','image_search']:
    try:
        __import__(f'deerflow.community.{prov}.tools')
        raise SystemExit(f'FAIL: {prov} imported')
    except RuntimeError as e:
        assert 'disabled in this hardened DeerFlow build' in str(e)
print('OK — all native providers blocked')
"

3.3 Security tests

PYTHONPATH=deer-flow/backend/packages/harness pytest \
    backend/tests/test_security_sanitizer.py \
    backend/tests/test_security_html_cleaner.py -q

Expected: 8 passed.

4. Adding a new web tool

  1. Implement it in deer-flow/backend/packages/harness/deerflow/community/<name>/tools.py.
  2. Always sanitize external strings via deerflow.security.sanitizer.
  3. Always wrap the response with wrap_untrusted_content().
  4. For HTML input, use extract_secure_text() first.
  5. Add a test to backend/tests/ that asserts the security delimiters are present in the tool output.
  6. Update this document.

5. Re-enabling a native provider (don't)

If you really must:

  1. Replace the stub in community/<provider>/tools.py with a hardened wrapper (sanitize → delimiter, just like searx).
  2. Move the matching test out of tests/_disabled_native/.
  3. Update this document and explain the threat-model change in your commit message.

6. Files touched (audit trail)

deer-flow/backend/packages/harness/deerflow/security/__init__.py          (new)
deer-flow/backend/packages/harness/deerflow/security/content_delimiter.py (new)
deer-flow/backend/packages/harness/deerflow/security/html_cleaner.py      (new)
deer-flow/backend/packages/harness/deerflow/security/sanitizer.py         (new, with newline-preserving fix)

deer-flow/backend/packages/harness/deerflow/community/searx/__init__.py   (new)
deer-flow/backend/packages/harness/deerflow/community/searx/tools.py      (new)

deer-flow/backend/packages/harness/deerflow/community/_disabled_native.py        (new)
deer-flow/backend/packages/harness/deerflow/community/ddg_search/tools.py        (replaced with stub)
deer-flow/backend/packages/harness/deerflow/community/tavily/tools.py            (replaced with stub)
deer-flow/backend/packages/harness/deerflow/community/exa/tools.py               (replaced with stub)
deer-flow/backend/packages/harness/deerflow/community/firecrawl/tools.py         (replaced with stub)
deer-flow/backend/packages/harness/deerflow/community/jina_ai/tools.py           (replaced with stub)
deer-flow/backend/packages/harness/deerflow/community/jina_ai/jina_client.py     (replaced with stub)
deer-flow/backend/packages/harness/deerflow/community/infoquest/tools.py         (replaced with stub)
deer-flow/backend/packages/harness/deerflow/community/infoquest/infoquest_client.py (replaced with stub)
deer-flow/backend/packages/harness/deerflow/community/image_search/tools.py      (replaced with stub)

deer-flow/backend/tests/_disabled_native/conftest.py                      (new — collect_ignore_glob)
deer-flow/backend/tests/_disabled_native/test_exa_tools.py                (moved)
deer-flow/backend/tests/_disabled_native/test_firecrawl_tools.py          (moved)
deer-flow/backend/tests/_disabled_native/test_jina_client.py              (moved)
deer-flow/backend/tests/_disabled_native/test_infoquest_client.py         (moved)

backend/packages/harness/deerflow/security/                               (factory overlay, kept in sync)
backend/packages/harness/deerflow/community/searx/                        (factory overlay, kept in sync)
backend/tests/test_security_sanitizer.py                                  (factory tests)
backend/tests/test_security_html_cleaner.py                               (factory tests)
backend/tests/test_searx_tools.py                                         (factory tests)

config.yaml                                                                (hardened runtime config, references only searx tools)
.env.example                                                               (template, no secrets)
HARDENING.md                                                               (this file)

docker/docker-compose.override.yaml                                        (named bridge br-deerflow)
scripts/deerflow-firewall.sh                                               (egress firewall up/down/status)
scripts/deerflow-firewall.nix                                              (NixOS systemd unit snippet)