Initial commit: hardened DeerFlow factory

Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection
hardening:

- New deerflow.security package: content_delimiter, html_cleaner,
  sanitizer (8 layers — invisible chars, control chars, symbols, NFC,
  PUA, tag chars, horizontal whitespace collapse with newline/tab
  preservation, length cap)
- New deerflow.community.searx package: web_search, web_fetch,
  image_search backed by a private SearX instance, every external
  string sanitized and wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>>
  delimiters
- All native community web providers (ddg_search, tavily, exa,
  firecrawl, jina_ai, infoquest, image_search) replaced with hard-fail
  stubs that raise NativeWebToolDisabledError at import time, so a
  misconfigured tool.use path fails loud rather than silently falling
  back to unsanitized output
- Native client back-doors (jina_client.py, infoquest_client.py)
  stubbed too
- Native-tool tests quarantined under tests/_disabled_native/
  (collect_ignore_glob via local conftest.py)
- Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserve
  newlines and tabs so list/table structure survives
- Hardened runtime config.yaml references only the searx-backed tools
- Factory overlay (backend/) kept in sync with deer-flow tree as a
  reference / source

See HARDENING.md for the full audit trail and verification steps.
This commit is contained in:
2026-04-12 14:23:57 +02:00
commit 6de0bf9f5b
889 changed files with 173052 additions and 0 deletions

38
deer-flow/.env.example Normal file
View File

@@ -0,0 +1,38 @@
# TAVILY API Key
TAVILY_API_KEY=your-tavily-api-key
# Jina API Key
JINA_API_KEY=your-jina-api-key
# InfoQuest API Key
INFOQUEST_API_KEY=your-infoquest-api-key
# CORS Origins (comma-separated) - e.g., http://localhost:3000,http://localhost:3001
# CORS_ORIGINS=http://localhost:3000
# Optional:
# FIRECRAWL_API_KEY=your-firecrawl-api-key
# VOLCENGINE_API_KEY=your-volcengine-api-key
# OPENAI_API_KEY=your-openai-api-key
# GEMINI_API_KEY=your-gemini-api-key
# DEEPSEEK_API_KEY=your-deepseek-api-key
# NOVITA_API_KEY=your-novita-api-key # OpenAI-compatible, see https://novita.ai
# MINIMAX_API_KEY=your-minimax-api-key # OpenAI-compatible, see https://platform.minimax.io
# VLLM_API_KEY=your-vllm-api-key # OpenAI-compatible
# FEISHU_APP_ID=your-feishu-app-id
# FEISHU_APP_SECRET=your-feishu-app-secret
# SLACK_BOT_TOKEN=your-slack-bot-token
# SLACK_APP_TOKEN=your-slack-app-token
# TELEGRAM_BOT_TOKEN=your-telegram-bot-token
# DISCORD_BOT_TOKEN=your-discord-bot-token
# Enable LangSmith to monitor and debug your LLM calls, agent runs, and tool executions.
# LANGSMITH_TRACING=true
# LANGSMITH_ENDPOINT=https://api.smith.langchain.com
# LANGSMITH_API_KEY=your-langsmith-api-key
# LANGSMITH_PROJECT=your-langsmith-project
# GitHub API Token
# GITHUB_TOKEN=your-github-token
# WECOM_BOT_ID=your-wecom-bot-id
# WECOM_BOT_SECRET=your-wecom-bot-secret