Initial commit: hardened DeerFlow factory

Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection
hardening:

- New deerflow.security package: content_delimiter, html_cleaner,
  sanitizer (8 layers — invisible chars, control chars, symbols, NFC,
  PUA, tag chars, horizontal whitespace collapse with newline/tab
  preservation, length cap)
- New deerflow.community.searx package: web_search, web_fetch,
  image_search backed by a private SearX instance, every external
  string sanitized and wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>>
  delimiters
- All native community web providers (ddg_search, tavily, exa,
  firecrawl, jina_ai, infoquest, image_search) replaced with hard-fail
  stubs that raise NativeWebToolDisabledError at import time, so a
  misconfigured tool.use path fails loud rather than silently falling
  back to unsanitized output
- Native client back-doors (jina_client.py, infoquest_client.py)
  stubbed too
- Native-tool tests quarantined under tests/_disabled_native/
  (collect_ignore_glob via local conftest.py)
- Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserve
  newlines and tabs so list/table structure survives
- Hardened runtime config.yaml references only the searx-backed tools
- Factory overlay (backend/) kept in sync with deer-flow tree as a
  reference / source
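
The Layer 7 behaviour described in the list above can be sketched as follows. This is a minimal illustration, not the code in deerflow.security: the function name `collapse_horizontal_whitespace` and the exact regex are assumptions, and the real layer may treat other characters differently.

```python
import re

# Matches runs of whitespace EXCEPT tab, newline and carriage return.
# Assumption: spaces, no-break spaces and similar are what gets collapsed.
_HORIZONTAL_WS = re.compile(r"[^\S\t\n\r]+")


def collapse_horizontal_whitespace(text: str) -> str:
    """Collapse each run of horizontal whitespace to one space.

    Newlines and tabs are left untouched so list and table layout survives.
    """
    return _HORIZONTAL_WS.sub(" ", text)


if __name__ == "__main__":
    sample = "- item    one\n\t- nested \u00a0 item"
    print(repr(collapse_horizontal_whitespace(sample)))
```

Excluding `\n` and `\t` from the character class is what keeps line-oriented structure intact; a plain `\s+` substitution would flatten it into one line.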

See HARDENING.md for the full audit trail and verification steps.
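
For illustration, the hard-fail stub idea from the message above can be sketched like this. Only the name NativeWebToolDisabledError comes from this commit; its base class, the stub body, the error text and the helpers `import_stub` and `try_import` are assumptions, and the import is simulated with `exec` so the snippet runs on its own.

```python
import types


class NativeWebToolDisabledError(ImportError):
    """Raised when a disabled native web provider is imported.

    Assumption: the real class may derive from a different base.
    """


# Hypothetical body of a stubbed provider module: it raises at module level,
# so merely importing the module fails.
STUB_SOURCE = (
    "raise NativeWebToolDisabledError(\n"
    "    'native web provider is disabled; use the searx-backed tools'\n"
    ")\n"
)


def import_stub(module_name: str) -> types.ModuleType:
    """Run the stub body as module-level code, as an import would."""
    module = types.ModuleType(module_name)
    module.NativeWebToolDisabledError = NativeWebToolDisabledError
    exec(compile(STUB_SOURCE, module_name, "exec"), module.__dict__)
    return module


def try_import(module_name: str) -> str:
    """Report whether the stub blocked the import."""
    try:
        import_stub(module_name)
    except NativeWebToolDisabledError as exc:
        return f"blocked: {exc}"
    return "loaded"


if __name__ == "__main__":
    print(try_import("ddg_search"))
```

Raising at module level means a stale config entry fails as soon as the tool is resolved, before any unsanitized request can be made.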
2026-04-12 14:23:57 +02:00
commit 6de0bf9f5b
889 changed files with 173052 additions and 0 deletions


@@ -0,0 +1,96 @@
from abc import ABC, abstractmethod

from deerflow.config import get_app_config
from deerflow.reflection import resolve_class
from deerflow.sandbox.sandbox import Sandbox


class SandboxProvider(ABC):
    """Abstract base class for sandbox providers"""

    @abstractmethod
    def acquire(self, thread_id: str | None = None) -> str:
        """Acquire a sandbox environment and return its ID.

        Returns:
            The ID of the acquired sandbox environment.
        """
        pass

    @abstractmethod
    def get(self, sandbox_id: str) -> Sandbox | None:
        """Get a sandbox environment by ID.

        Args:
            sandbox_id: The ID of the sandbox environment to retrieve.
        """
        pass

    @abstractmethod
    def release(self, sandbox_id: str) -> None:
        """Release a sandbox environment.

        Args:
            sandbox_id: The ID of the sandbox environment to release.
        """
        pass


_default_sandbox_provider: SandboxProvider | None = None


def get_sandbox_provider(**kwargs) -> SandboxProvider:
    """Get the sandbox provider singleton.

    Returns a cached singleton instance. Use `reset_sandbox_provider()` to clear
    the cache, or `shutdown_sandbox_provider()` to properly shutdown and clear.

    Returns:
        A sandbox provider instance.
    """
    global _default_sandbox_provider
    if _default_sandbox_provider is None:
        config = get_app_config()
        cls = resolve_class(config.sandbox.use, SandboxProvider)
        _default_sandbox_provider = cls(**kwargs)
    return _default_sandbox_provider


def reset_sandbox_provider() -> None:
    """Reset the sandbox provider singleton.

    This clears the cached instance without calling shutdown.
    The next call to `get_sandbox_provider()` will create a new instance.
    Useful for testing or when switching configurations.

    Note: If the provider has active sandboxes, they will be orphaned.
    Use `shutdown_sandbox_provider()` for proper cleanup.
    """
    global _default_sandbox_provider
    _default_sandbox_provider = None


def shutdown_sandbox_provider() -> None:
    """Shutdown and reset the sandbox provider.

    This properly shuts down the provider (releasing all sandboxes)
    before clearing the singleton. Call this when the application
    is shutting down or when you need to completely reset the sandbox system.
    """
    global _default_sandbox_provider
    if _default_sandbox_provider is not None:
        if hasattr(_default_sandbox_provider, "shutdown"):
            _default_sandbox_provider.shutdown()
        _default_sandbox_provider = None


def set_sandbox_provider(provider: SandboxProvider) -> None:
    """Set a custom sandbox provider instance.

    This allows injecting a custom or mock provider for testing purposes.

    Args:
        provider: The SandboxProvider instance to use.
    """
    global _default_sandbox_provider
    _default_sandbox_provider = provider
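
A test double for the interface above might look like the sketch below. The name `InMemorySandboxProvider` is hypothetical, and `Sandbox` and `SandboxProvider` are re-declared here as minimal stand-ins so the snippet runs without the deerflow package; the real `Sandbox` constructor is not shown in this file and may differ.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from uuid import uuid4


class Sandbox:
    """Stand-in for deerflow.sandbox.sandbox.Sandbox (assumed shape)."""

    def __init__(self, sandbox_id: str) -> None:
        self.id = sandbox_id


class SandboxProvider(ABC):
    """Minimal copy of the abstract interface shown above."""

    @abstractmethod
    def acquire(self, thread_id: str | None = None) -> str: ...

    @abstractmethod
    def get(self, sandbox_id: str) -> Sandbox | None: ...

    @abstractmethod
    def release(self, sandbox_id: str) -> None: ...


class InMemorySandboxProvider(SandboxProvider):
    """Hypothetical provider that keeps sandboxes in a dict, for tests."""

    def __init__(self) -> None:
        self._sandboxes: dict[str, Sandbox] = {}

    def acquire(self, thread_id: str | None = None) -> str:
        # Reuse the thread ID as sandbox ID when given, else generate one.
        sandbox_id = thread_id or uuid4().hex
        self._sandboxes.setdefault(sandbox_id, Sandbox(sandbox_id))
        return sandbox_id

    def get(self, sandbox_id: str) -> Sandbox | None:
        return self._sandboxes.get(sandbox_id)

    def release(self, sandbox_id: str) -> None:
        self._sandboxes.pop(sandbox_id, None)

    def shutdown(self) -> None:
        # Picked up by the hasattr(..., "shutdown") check in
        # shutdown_sandbox_provider().
        self._sandboxes.clear()
```

In the real tree, a test would presumably subclass the actual `SandboxProvider`, pass an instance to `set_sandbox_provider(...)`, and call `reset_sandbox_provider()` in teardown so the fake does not leak into other tests.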