Initial commit: hardened DeerFlow factory
Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection hardening: - New deerflow.security package: content_delimiter, html_cleaner, sanitizer (8 layers — invisible chars, control chars, symbols, NFC, PUA, tag chars, horizontal whitespace collapse with newline/tab preservation, length cap) - New deerflow.community.searx package: web_search, web_fetch, image_search backed by a private SearX instance, every external string sanitized and wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>> delimiters - All native community web providers (ddg_search, tavily, exa, firecrawl, jina_ai, infoquest, image_search) replaced with hard-fail stubs that raise NativeWebToolDisabledError at import time, so a misconfigured tool.use path fails loud rather than silently falling back to unsanitized output - Native client back-doors (jina_client.py, infoquest_client.py) stubbed too - Native-tool tests quarantined under tests/_disabled_native/ (collect_ignore_glob via local conftest.py) - Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserve newlines and tabs so list/table structure survives - Hardened runtime config.yaml references only the searx-backed tools - Factory overlay (backend/) kept in sync with deer-flow tree as a reference / source See HARDENING.md for the full audit trail and verification steps.
This commit is contained in:
@@ -0,0 +1,83 @@
|
||||
from pydantic import BaseModel, ConfigDict, Field
|
||||
|
||||
|
||||
class VolumeMountConfig(BaseModel):
|
||||
"""Configuration for a volume mount."""
|
||||
|
||||
host_path: str = Field(..., description="Path on the host machine")
|
||||
container_path: str = Field(..., description="Path inside the container")
|
||||
read_only: bool = Field(default=False, description="Whether the mount is read-only")
|
||||
|
||||
|
||||
class SandboxConfig(BaseModel):
|
||||
"""Config section for a sandbox.
|
||||
|
||||
Common options:
|
||||
use: Class path of the sandbox provider (required)
|
||||
allow_host_bash: Enable host-side bash execution for LocalSandboxProvider.
|
||||
Dangerous and intended only for fully trusted local workflows.
|
||||
|
||||
AioSandboxProvider specific options:
|
||||
image: Docker image to use (default: enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest)
|
||||
port: Base port for sandbox containers (default: 8080)
|
||||
replicas: Maximum number of concurrent sandbox containers (default: 3). When the limit is reached the least-recently-used sandbox is evicted to make room.
|
||||
container_prefix: Prefix for container names (default: deer-flow-sandbox)
|
||||
idle_timeout: Idle timeout in seconds before sandbox is released (default: 600 = 10 minutes). Set to 0 to disable.
|
||||
mounts: List of volume mounts to share directories with the container
|
||||
environment: Environment variables to inject into the container (values starting with $ are resolved from host env)
|
||||
"""
|
||||
|
||||
use: str = Field(
|
||||
...,
|
||||
description="Class path of the sandbox provider (e.g. deerflow.sandbox.local:LocalSandboxProvider)",
|
||||
)
|
||||
allow_host_bash: bool = Field(
|
||||
default=False,
|
||||
description="Allow the bash tool to execute directly on the host when using LocalSandboxProvider. Dangerous; intended only for fully trusted local environments.",
|
||||
)
|
||||
image: str | None = Field(
|
||||
default=None,
|
||||
description="Docker image to use for the sandbox container",
|
||||
)
|
||||
port: int | None = Field(
|
||||
default=None,
|
||||
description="Base port for sandbox containers",
|
||||
)
|
||||
replicas: int | None = Field(
|
||||
default=None,
|
||||
description="Maximum number of concurrent sandbox containers (default: 3). When the limit is reached the least-recently-used sandbox is evicted to make room.",
|
||||
)
|
||||
container_prefix: str | None = Field(
|
||||
default=None,
|
||||
description="Prefix for container names",
|
||||
)
|
||||
idle_timeout: int | None = Field(
|
||||
default=None,
|
||||
description="Idle timeout in seconds before sandbox is released (default: 600 = 10 minutes). Set to 0 to disable.",
|
||||
)
|
||||
mounts: list[VolumeMountConfig] = Field(
|
||||
default_factory=list,
|
||||
description="List of volume mounts to share directories between host and container",
|
||||
)
|
||||
environment: dict[str, str] = Field(
|
||||
default_factory=dict,
|
||||
description="Environment variables to inject into the sandbox container. Values starting with $ will be resolved from host environment variables.",
|
||||
)
|
||||
|
||||
bash_output_max_chars: int = Field(
|
||||
default=20000,
|
||||
ge=0,
|
||||
description="Maximum characters to keep from bash tool output. Output exceeding this limit is middle-truncated (head + tail), preserving the first and last half. Set to 0 to disable truncation.",
|
||||
)
|
||||
read_file_output_max_chars: int = Field(
|
||||
default=50000,
|
||||
ge=0,
|
||||
description="Maximum characters to keep from read_file tool output. Output exceeding this limit is head-truncated. Set to 0 to disable truncation.",
|
||||
)
|
||||
ls_output_max_chars: int = Field(
|
||||
default=20000,
|
||||
ge=0,
|
||||
description="Maximum characters to keep from ls tool output. Output exceeding this limit is head-truncated. Set to 0 to disable truncation.",
|
||||
)
|
||||
|
||||
model_config = ConfigDict(extra="allow")
|
||||
Reference in New Issue
Block a user