DATA 6de0bf9f5b Initial commit: hardened DeerFlow factory
Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection
hardening:

- New deerflow.security package: content_delimiter, html_cleaner,
  sanitizer (8 layers — invisible chars, control chars, symbols, NFC,
  PUA, tag chars, horizontal whitespace collapse with newline/tab
  preservation, length cap)
- New deerflow.community.searx package: web_search, web_fetch,
  image_search backed by a private SearX instance, every external
  string sanitized and wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>>
  delimiters
- All native community web providers (ddg_search, tavily, exa,
  firecrawl, jina_ai, infoquest, image_search) replaced with hard-fail
  stubs that raise NativeWebToolDisabledError at import time, so a
  misconfigured tool.use path fails loud rather than silently falling
  back to unsanitized output
- Native client back-doors (jina_client.py, infoquest_client.py)
  stubbed too
- Native-tool tests quarantined under tests/_disabled_native/
  (collect_ignore_glob via local conftest.py)
- Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserve
  newlines and tabs so list/table structure survives
- Hardened runtime config.yaml references only the searx-backed tools
- Factory overlay (backend/) kept in sync with deer-flow tree as a
  reference / source
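
The sanitizer and delimiter bullets above can be pictured with a small sketch. This is not the deerflow.security code. It is a simplified illustration, assuming that invisible, control, tag, and private-use characters can be dropped by Unicode category, that the length cap is 2000 characters, and that a closing delimiter named <<<END_EXTERNAL_UNTRUSTED_CONTENT>>> exists (the commit message names only the opening marker). The function names sanitize and wrap_untrusted are hypothetical.

```python
import re
import unicodedata

OPEN_DELIM = "<<<EXTERNAL_UNTRUSTED_CONTENT>>>"
# Assumed closing marker; the commit message only names the opening one.
CLOSE_DELIM = "<<<END_EXTERNAL_UNTRUSTED_CONTENT>>>"

# Any whitespace character except newline and tab ("horizontal" whitespace).
_HORIZONTAL_WS = re.compile(r"[^\S\n\t]+")

# Cc = control, Cf = format (zero-width and tag characters), Co = private use.
_DROPPED_CATEGORIES = {"Cc", "Cf", "Co"}


def sanitize(text: str, max_len: int = 2000) -> str:
    """Simplified stand-in for the multi-layer sanitizer (not all 8 layers)."""
    text = unicodedata.normalize("NFC", text)
    kept = []
    for ch in text:
        if ch in "\n\t":
            kept.append(ch)  # structure-bearing whitespace is preserved
        elif unicodedata.category(ch) not in _DROPPED_CATEGORIES:
            kept.append(ch)
    # Collapse runs of horizontal whitespace only, so lists and tables survive.
    collapsed = _HORIZONTAL_WS.sub(" ", "".join(kept))
    return collapsed[:max_len]  # length cap (the value is an assumption)


def wrap_untrusted(text: str) -> str:
    """Sanitize an external string and wrap it in untrusted-content delimiters."""
    return f"{OPEN_DELIM}\n{sanitize(text)}\n{CLOSE_DELIM}"
```

Collapsing only horizontal whitespace is the point of the Layer 7 fix: a search result containing a bulleted list keeps its line breaks, while padding made of repeated spaces is reduced to a single space.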

See HARDENING.md for the full audit trail and verification steps.
2026-04-12 14:23:57 +02:00
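
The hard-fail stubs described in the commit message can be illustrated with a self-contained demo. This is a sketch, not the repository's stub code: the module names demo_errors and demo_ddg_search are hypothetical, and making NativeWebToolDisabledError a subclass of ImportError is an assumption. The demo writes a stub module to a temporary directory and shows that importing it fails immediately.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical shared error module; the ImportError base class is an assumption.
ERRORS_SRC = '''
class NativeWebToolDisabledError(ImportError):
    """Raised when a disabled native web tool module is imported."""
'''

# Hypothetical stub: the module body raises, so the import itself fails.
STUB_SRC = '''
from demo_errors import NativeWebToolDisabledError

raise NativeWebToolDisabledError(
    "native web tool 'demo_ddg_search' is disabled; use the searx-backed tools"
)
'''


def try_import_stub() -> str:
    """Import the stub from a temp dir and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_errors.py").write_text(ERRORS_SRC, encoding="utf-8")
        Path(tmp, "demo_ddg_search.py").write_text(STUB_SRC, encoding="utf-8")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module("demo_ddg_search")
        except ImportError as exc:
            return f"{type(exc).__name__}: {exc}"
        finally:
            # Leave the interpreter state as it was before the demo.
            sys.path.remove(tmp)
            sys.modules.pop("demo_ddg_search", None)
            sys.modules.pop("demo_errors", None)
    return "imported"
```

Because the exception is raised in the module body, a configuration that still points at a native provider fails when the tool is loaded, instead of later returning unsanitized output.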


TODO List

Completed Features

  • Launch the sandbox only after the first file system or bash tool is called
  • Add a clarification process that covers the whole workflow
  • Implement Context Summarization Mechanism to avoid context explosion
  • Integrate MCP (Model Context Protocol) for extensible tools
  • Add file upload support with automatic document conversion
  • Implement automatic thread title generation
  • Add Plan Mode with TodoList middleware
  • Add vision model support with ViewImageMiddleware
  • Skills system with SKILL.md format

Planned Features

  • Pool sandbox resources to reduce the number of sandbox containers
  • Add authentication/authorization layer
  • Implement rate limiting
  • Add metrics and monitoring
  • Support more document formats in upload
  • Skill marketplace / remote skill installation
  • Optimize async concurrency in the agent hot path (multi-task scenarios on IM channels)
    • Replace time.sleep(5) with asyncio.sleep() in packages/harness/deerflow/tools/builtins/task_tool.py (subagent polling)
    • Replace subprocess.run() with asyncio.create_subprocess_shell() in packages/harness/deerflow/sandbox/local/local_sandbox.py
    • Replace sync requests with httpx.AsyncClient in community tools (tavily, jina_ai, firecrawl, infoquest, image_search)
    • Replace sync model.invoke() with async model.ainvoke() in title_middleware and memory updater
    • Consider asyncio.to_thread() wrapper for remaining blocking file I/O
    • For production: use langgraph up (multi-worker) instead of langgraph dev (single-worker)
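
The async sub-items above can be sketched with the standard library alone. This is an illustration of the three patterns (non-blocking polling, async subprocesses, and offloading blocking file I/O), not the code in task_tool.py or local_sandbox.py. The function names poll_until_done, run_command, and read_text, and their parameters, are hypothetical.

```python
import asyncio
from pathlib import Path


async def poll_until_done(is_done, interval: float = 0.01, max_polls: int = 100) -> int:
    """Poll is_done() without blocking the event loop; return the polls used."""
    for attempt in range(1, max_polls + 1):
        if is_done():
            return attempt
        # Unlike time.sleep(), this yields control so other tasks keep running.
        await asyncio.sleep(interval)
    raise TimeoutError("task did not finish in time")


async def run_command(command: str) -> tuple[int, str]:
    """Run a shell command asynchronously and return (exit code, combined output)."""
    proc = await asyncio.create_subprocess_shell(
        command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    stdout, _ = await proc.communicate()
    return proc.returncode, stdout.decode()


async def read_text(path: str) -> str:
    """Run a blocking file read in a worker thread (requires Python 3.9+)."""
    return await asyncio.to_thread(Path(path).read_text, encoding="utf-8")
```

A call such as asyncio.run(run_command("echo hello")) drives one of these coroutines from synchronous code. Inside an agent they would be awaited directly, so one slow subagent or shell command does not stall other tasks on the same event loop.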

Resolved Issues

  • Ensure there are no duplicate files in state.artifacts
  • Long thinking output with empty content (the answer was placed inside the thinking process)