Initial commit: hardened DeerFlow factory

Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection
hardening:

- New deerflow.security package: content_delimiter, html_cleaner,
  sanitizer (8 layers: invisible chars, control chars, symbols, NFC,
  PUA, tag chars, horizontal whitespace collapse with newline/tab
  preservation, length cap)
- New deerflow.community.searx package: web_search, web_fetch,
  image_search backed by a private SearX instance, every external
  string sanitized and wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>>
  delimiters
- All native community web providers (ddg_search, tavily, exa,
  firecrawl, jina_ai, infoquest, image_search) replaced with hard-fail
  stubs that raise NativeWebToolDisabledError at import time, so a
  misconfigured tool.use path fails loudly rather than silently
  falling back to unsanitized output
- Native client back-doors (jina_client.py, infoquest_client.py)
  stubbed too
- Native-tool tests quarantined under tests/_disabled_native/
  (collect_ignore_glob via local conftest.py)
- Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserve
  newlines and tabs so list/table structure survives
- Hardened runtime config.yaml references only the searx-backed tools
- Factory overlay (backend/) kept in sync with deer-flow tree as a
  reference / source
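
Illustration of the Layer 7 behaviour and the delimiter wrapping listed
above, as a minimal Python sketch. The function names, the closing
marker string, and the 4000-character cap are assumptions made for this
example and are not taken from deerflow.security. Only three of the 8
layers (NFC, horizontal whitespace collapse, length cap) are shown.

```python
import re
import unicodedata

# The opening marker is named in this commit; the closing marker is a
# hypothetical counterpart chosen for this sketch.
OPEN_MARK = "<<<EXTERNAL_UNTRUSTED_CONTENT>>>"
CLOSE_MARK = "<<<END_EXTERNAL_UNTRUSTED_CONTENT>>>"

# Any run of whitespace that is NOT a tab, newline, or carriage return.
# This covers plain spaces as well as Unicode spaces such as NBSP.
_HSPACE_RUN = re.compile(r"[^\S\t\n\r]+")


def collapse_horizontal_whitespace(text: str) -> str:
    """Collapse runs of horizontal whitespace to one space.

    Tabs and line breaks are left alone so list and table structure
    in fetched content survives.
    """
    return _HSPACE_RUN.sub(" ", text)


def sanitize(text: str, max_len: int = 4000) -> str:
    """Apply a reduced pipeline: NFC, whitespace collapse, length cap."""
    text = unicodedata.normalize("NFC", text)
    text = collapse_horizontal_whitespace(text)
    return text[:max_len]


def wrap_untrusted(text: str) -> str:
    """Sanitize an external string and fence it with delimiters."""
    return f"{OPEN_MARK}\n{sanitize(text)}\n{CLOSE_MARK}"


if __name__ == "__main__":
    print(wrap_untrusted("- item   one\n\t- nested    item"))
```

Collapsing only horizontal runs keeps the line and indentation layout
that a downstream model needs to read lists and tables, while still
removing padding that could be used to hide text.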

See HARDENING.md for the full audit trail and verification steps.
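
Illustration of the hard-fail stub pattern described in the list above,
as a minimal Python sketch. Only the exception name
NativeWebToolDisabledError comes from this commit. Its base class, the
error message, and the module name fake_native_provider are assumptions
made for this example. The sketch writes a throwaway stub module to a
temporary directory and shows that importing it raises immediately.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Body of a hypothetical stub module. The raise sits at module level,
# so the failure happens on import, before any tool can be called.
STUB_SOURCE = '''
class NativeWebToolDisabledError(ImportError):
    """Raised when a disabled native web provider is imported."""


raise NativeWebToolDisabledError(
    "native web provider is disabled in this hardened build; "
    "use the searx-backed tools instead"
)
'''


def import_fails_loudly(module_name: str = "fake_native_provider") -> str:
    """Import the stub and report which exception type was raised."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{module_name}.py").write_text(STUB_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module(module_name)
        except ImportError as exc:
            return type(exc).__name__
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
    return "imported"


if __name__ == "__main__":
    print(import_fails_loudly())
```

Raising at import time means a config entry that still points at a
native provider stops the process during startup instead of running
with unsanitized output.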
2026-04-12 14:23:57 +02:00
commit 6de0bf9f5b
889 changed files with 173052 additions and 0 deletions

.env.example

@@ -0,0 +1,72 @@
# ============================================================================
# DeerFlow Environment Variables
# ============================================================================
# ----------------------------------------------------------------------------
# API Keys
# ----------------------------------------------------------------------------
# Ollama Cloud API Key (REQUIRED)
OLLAMA_CLOUD_API_KEY=your-ollama-cloud-key-here
# Optional: Other API Keys (uncomment if needed)
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
# TAVILY_API_KEY=tvly-...
# ----------------------------------------------------------------------------
# DeerFlow Paths
# ----------------------------------------------------------------------------
# Config file path (optional - defaults to config.yaml in CWD)
# DEER_FLOW_CONFIG_PATH=/app/config.yaml
# Data home directory (where threads, uploads, etc. are stored)
# DEER_FLOW_HOME=/app/backend/.deer-flow
# ----------------------------------------------------------------------------
# Security & Authentication
# ----------------------------------------------------------------------------
# Better Auth Secret (required for production)
# Generate with: openssl rand -base64 32
# BETTER_AUTH_SECRET=your-secret-here
# JWT Secret (if using custom auth)
# JWT_SECRET=your-jwt-secret
# ----------------------------------------------------------------------------
# Optional: External Service Configuration
# ----------------------------------------------------------------------------
# GitHub Token (for GitHub research skills)
# GITHUB_TOKEN=ghp_...
# Jina AI API Key (for higher rate limits)
# JINA_API_KEY=jina_...
# Exa API Key (if using Exa search)
# EXA_API_KEY=...
# Firecrawl API Key (if using Firecrawl)
# FIRECRAWL_API_KEY=fc-...
# ----------------------------------------------------------------------------
# Docker / Deployment
# ----------------------------------------------------------------------------
# Docker Compose environment
COMPOSE_PROJECT_NAME=deerflow
# Optional: LangGraph Cloud (if using hosted LangGraph)
# LANGSMITH_API_KEY=ls-...
# ----------------------------------------------------------------------------
# Development (only for local dev)
# ----------------------------------------------------------------------------
# Enable debug mode (more verbose logging)
# DEBUG=true
# Hot reload for development
# RELOAD=true
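
The template above marks OLLAMA_CLOUD_API_KEY as required and ships a
placeholder value for it. A startup check along the following lines
could catch an unedited copy. This is a minimal Python sketch: the
function name, the REQUIRED tuple, and the "your-" placeholder
convention are assumptions made for this example, not part of
DeerFlow.

```python
import os
from typing import Mapping, Optional

# Variables the template marks as REQUIRED.
REQUIRED = ("OLLAMA_CLOUD_API_KEY",)

# Placeholder values in the template start with this prefix
# (for example "your-ollama-cloud-key-here").
PLACEHOLDER_PREFIX = "your-"


def missing_required(env: Optional[Mapping[str, str]] = None) -> list:
    """Return required variables that are unset, empty, or placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED:
        value = env.get(name, "")
        if not value or value.startswith(PLACEHOLDER_PREFIX):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_required()
    if problems:
        raise SystemExit("missing or placeholder values: " + ", ".join(problems))
```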