Files
deerflow-factory/config.yaml
DATA e510f975f6 No-images policy: refuse non-text fetches, drop image_search_tool
Agents in this build are text-only researchers. Image, audio, video,
and binary content have no role in the pipeline and only widen the
attack surface (server-side image fetches, exfiltration via rendered
img tags, etc.). The cleanest answer is never to load such content in
the first place, rather than maintain a domain allowlist that nobody
can keep up to date.

- web_fetch_tool now uses httpx.AsyncClient.stream and inspects the
  Content-Type header BEFORE the body is read into memory. Only
  text/*, application/json, application/xml, application/xhtml+xml,
  application/ld+json, application/atom+xml, application/rss+xml are
  accepted; everything else (image/*, audio/*, video/*, octet-stream,
  pdf, font, missing header, ...) is refused with a wrap_untrusted
  error reply. The body bytes never enter the process for refused
  responses. Read budget is bounded to ~4x max_chars regardless.

- image_search_tool removed from deerflow.community.searx.tools
  (both the deer-flow runtime tree and the factory overlay). The
  function is gone, not stubbed — any tool.use referencing it will
  raise AttributeError at tool-loading time.

- config.yaml: image_search tool entry removed; the example
  allowed_tools list updated to drop image_search.

- HARDENING.md: new section 2.8 explains the policy and the frontend
  caveat (the LLM can still emit ![](url) markdown which the user's
  browser would render — that requires a separate frontend patch
  that is not yet implemented). Section 3.4 adds a verification
  snippet for the policy. The web_fetch entry in section 2.2 is
  updated to mention the streaming Content-Type gate.
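The streaming Content-Type gate described above can be sketched roughly as follows. This is an illustrative sketch, not the actual `web_fetch_tool` code: the helper names, the exact allowlist handling, and the `follow_redirects` choice are assumptions; only the accepted MIME types and the ~4x read budget come from the policy above.

```python
from typing import Optional

# MIME types accepted by the policy above; everything else is refused.
ALLOWED_EXACT = {
    "application/json", "application/xml", "application/xhtml+xml",
    "application/ld+json", "application/atom+xml", "application/rss+xml",
}

def is_text_content_type(content_type: Optional[str]) -> bool:
    """A missing Content-Type header counts as refused, per the policy."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime.startswith("text/") or mime in ALLOWED_EXACT

async def fetch_text(url: str, max_chars: int = 10000) -> str:
    """Inspect the Content-Type header before any body bytes are read."""
    import httpx  # imported lazily so the pure check above has no dependency

    budget = 4 * max_chars  # bounded read budget, ~4x max_chars
    async with httpx.AsyncClient(follow_redirects=True) as client:
        async with client.stream("GET", url) as resp:
            if not is_text_content_type(resp.headers.get("content-type")):
                raise ValueError("refused: non-text Content-Type")
            chunks, seen = [], 0
            async for chunk in resp.aiter_bytes():
                chunks.append(chunk)
                seen += len(chunk)
                if seen >= budget:
                    break  # stop reading once the budget is spent
    return b"".join(chunks).decode("utf-8", errors="replace")[:max_chars]
```

Because `httpx.AsyncClient.stream` only reads headers until the body is iterated, refused responses never pull body bytes into the process, which is the point of the gate.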

Both source trees stay in sync.
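A verification check in the spirit of the HARDENING.md 3.4 snippet can be written like this. The helper is illustrative (the real snippet is in HARDENING.md); the module path is the one used in the tool entries below.

```python
import importlib

def tool_is_gone(module_path: str, attr: str) -> bool:
    """True when the tool cannot be loaded: module absent or attribute removed."""
    try:
        mod = importlib.import_module(module_path)
    except ImportError:
        return True  # tree not installed here; nothing can load the tool
    return not hasattr(mod, attr)

# Expected on a hardened tree: image_search_tool is gone, the text tools remain.
# tool_is_gone("deerflow.community.searx.tools", "image_search_tool") should be
# True, while the same check for web_fetch_tool should be False.
```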
2026-04-12 15:59:55 +02:00

# ============================================================================
# DeerFlow Configuration - Hardened with Prompt Injection Protection
# ============================================================================
# This config uses OpenClaw-style hardened web search/fetch with SearX
# and Ollama Cloud for LLM inference.
config_version: 6
# ============================================================================
# Logging
# ============================================================================
log_level: info
# ============================================================================
# Token Usage Tracking
# ============================================================================
token_usage:
  enabled: true
# ============================================================================
# Models Configuration - Ollama Cloud
# ============================================================================
models:
  # Primary model: Ollama Cloud (Kimi K2.5)
  - name: kimi-k2.5
    display_name: Kimi K2.5 (Ollama Cloud)
    use: langchain_ollama:ChatOllama
    model: ollama-cloud/kimi-k2.5
    base_url: https://api.ollama.cloud/v1
    api_key: $OLLAMA_CLOUD_API_KEY
    num_predict: 8192
    temperature: 0.7
    reasoning: true
    supports_thinking: true
    supports_vision: true
  # Fallback: Lightweight model for summarization/titles
  - name: qwen2.5
    display_name: Qwen 2.5 (Ollama Cloud)
    use: langchain_ollama:ChatOllama
    model: ollama-cloud/qwen2.5
    base_url: https://api.ollama.cloud/v1
    api_key: $OLLAMA_CLOUD_API_KEY
    num_predict: 4096
    temperature: 0.7
    supports_thinking: false
    supports_vision: false
# ============================================================================
# Tool Groups
# ============================================================================
tool_groups:
  - name: web
  - name: file:read
  - name: file:write
  - name: bash
# ============================================================================
# Tools Configuration - Hardened SearX
# ============================================================================
# NOTE: These use OpenClaw-style hardening with prompt injection protection.
# The searx_url points to the private SearX instance.
tools:
  # Hardened web search with prompt injection protection
  - name: web_search
    group: web
    use: deerflow.community.searx.tools:web_search_tool
    searx_url: http://10.67.67.1:8888
    max_results: 10
  # Hardened web fetch with HTML sanitization
  - name: web_fetch
    group: web
    use: deerflow.community.searx.tools:web_fetch_tool
    max_chars: 10000
  # NOTE: image_search is intentionally NOT registered in this build.
  # Agents are text-only researchers. See HARDENING.md sec. 2.8.
  # File operations (standard)
  - name: ls
    group: file:read
    use: deerflow.sandbox.tools:ls_tool
  - name: read_file
    group: file:read
    use: deerflow.sandbox.tools:read_file_tool
  - name: glob
    group: file:read
    use: deerflow.sandbox.tools:glob_tool
    max_results: 200
  - name: grep
    group: file:read
    use: deerflow.sandbox.tools:grep_tool
    max_results: 100
  - name: write_file
    group: file:write
    use: deerflow.sandbox.tools:write_file_tool
  - name: str_replace
    group: file:write
    use: deerflow.sandbox.tools:str_replace_tool
  # Bash execution (disabled by default for security)
  # Uncomment only if using Docker sandbox or trusted environment
  # - name: bash
  #   group: bash
  #   use: deerflow.sandbox.tools:bash_tool
# ============================================================================
# Guardrails Configuration (Additional Security Layer)
# ============================================================================
# Blocks dangerous tool calls before execution.
# See: backend/docs/GUARDRAILS.md
guardrails:
  enabled: true
  provider:
    use: deerflow.guardrails.builtin:AllowlistProvider
    config:
      # Deny potentially dangerous tools
      denied_tools: []
      # Or use allowlist approach (only these allowed):
      # allowed_tools: ["web_search", "web_fetch", "read_file", "write_file", "ls", "glob", "grep"]
# ============================================================================
# Sandbox Configuration
# ============================================================================
# For production, use Docker sandbox. For local dev, local sandbox is fine.
sandbox:
  use: deerflow.sandbox.local:LocalSandboxProvider
  # Host bash is disabled by default for security
  allow_host_bash: false
  # Optional: Mount additional directories
  # mounts:
  #   - host_path: /home/user/projects
  #     container_path: /mnt/projects
  #     read_only: false
  # Tool output truncation limits
  bash_output_max_chars: 20000
  read_file_output_max_chars: 50000
  ls_output_max_chars: 20000
# ============================================================================
# Skills Configuration
# ============================================================================
skills:
  container_path: /mnt/skills
# ============================================================================
# Title Generation
# ============================================================================
title:
  enabled: true
  max_words: 6
  max_chars: 60
  model_name: qwen2.5  # Use lightweight model
# ============================================================================
# Summarization
# ============================================================================
summarization:
  enabled: true
  model_name: qwen2.5  # Use lightweight model
  trigger:
    - type: tokens
      value: 15564
  keep:
    type: messages
    value: 10
  trim_tokens_to_summarize: 15564
# ============================================================================
# Memory Configuration
# ============================================================================
memory:
  enabled: true
  storage_path: memory.json
  debounce_seconds: 30
  model_name: qwen2.5
  max_facts: 100
  fact_confidence_threshold: 0.7
  injection_enabled: true
  max_injection_tokens: 2000
# ============================================================================
# Skill Self-Evolution (Disabled for security)
# ============================================================================
skill_evolution:
  enabled: false
# ============================================================================
# Checkpointer Configuration
# ============================================================================
checkpointer:
  type: sqlite
  connection_string: checkpoints.db
# ============================================================================
# IM Channels (Disabled by default)
# ============================================================================
# Uncomment and configure if needed
# channels:
#   langgraph_url: http://localhost:2024
#   gateway_url: http://localhost:8001
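
The guardrails section above supports either a denylist (`denied_tools`) or a commented-out allowlist (`allowed_tools`). AllowlistProvider's real API is not shown in this config; the decision logic those two keys imply can be sketched as:

```python
from typing import Iterable, Optional

def tool_call_permitted(
    name: str,
    denied_tools: Iterable[str] = (),
    allowed_tools: Optional[Iterable[str]] = None,
) -> bool:
    """Deny wins; when an allowlist is set, only listed tools pass."""
    if name in set(denied_tools):
        return False
    if allowed_tools is not None:
        return name in set(allowed_tools)
    return True  # empty denylist and no allowlist: everything is permitted
```

With the shipped config (`denied_tools: []`, no allowlist) every registered tool passes, which is why dropping `image_search_tool` at registration time matters: the guardrail layer never sees a tool that was never loaded.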