Initial commit: hardened DeerFlow factory

Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection
hardening:

- New deerflow.security package: content_delimiter, html_cleaner,
  sanitizer (8 layers — invisible chars, control chars, symbols, NFC,
  PUA, tag chars, horizontal whitespace collapse with newline/tab
  preservation, length cap)
- New deerflow.community.searx package: web_search, web_fetch,
  image_search backed by a private SearX instance, every external
  string sanitized and wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>>
  delimiters
- All native community web providers (ddg_search, tavily, exa,
  firecrawl, jina_ai, infoquest, image_search) replaced with hard-fail
  stubs that raise NativeWebToolDisabledError at import time, so a
  misconfigured tool.use path fails loud rather than silently falling
  back to unsanitized output
- Native client back-doors (jina_client.py, infoquest_client.py)
  stubbed too
- Native-tool tests quarantined under tests/_disabled_native/
  (collect_ignore_glob via local conftest.py)
- Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserve
  newlines and tabs so list/table structure survives
- Hardened runtime config.yaml references only the searx-backed tools
- Factory overlay (backend/) kept in sync with deer-flow tree as a
  reference / source

See HARDENING.md for the full audit trail and verification steps.
2026-04-12 14:23:57 +02:00
commit 6de0bf9f5b
889 changed files with 173052 additions and 0 deletions

# API Reference
This document provides a complete reference for the DeerFlow backend APIs.
## Overview
DeerFlow backend exposes two sets of APIs:
1. **LangGraph API** - Agent interactions, threads, and streaming (`/api/langgraph/*`)
2. **Gateway API** - Models, MCP, skills, uploads, and artifacts (`/api/*`)
All APIs are accessed through the Nginx reverse proxy at port 2026.
## LangGraph API
Base URL: `/api/langgraph`
The LangGraph API is provided by the LangGraph server and follows the LangGraph SDK conventions.
### Threads
#### Create Thread
```http
POST /api/langgraph/threads
Content-Type: application/json
```
**Request Body:**
```json
{
  "metadata": {}
}
```
**Response:**
```json
{
  "thread_id": "abc123",
  "created_at": "2024-01-15T10:30:00Z",
  "metadata": {}
}
```
#### Get Thread State
```http
GET /api/langgraph/threads/{thread_id}/state
```
**Response:**
```json
{
  "values": {
    "messages": [...],
    "sandbox": {...},
    "artifacts": [...],
    "thread_data": {...},
    "title": "Conversation Title"
  },
  "next": [],
  "config": {...}
}
```
### Runs
#### Create Run
Execute the agent with input.
```http
POST /api/langgraph/threads/{thread_id}/runs
Content-Type: application/json
```
**Request Body:**
```json
{
  "input": {
    "messages": [
      {
        "role": "user",
        "content": "Hello, can you help me?"
      }
    ]
  },
  "config": {
    "recursion_limit": 100,
    "configurable": {
      "model_name": "gpt-4",
      "thinking_enabled": false,
      "is_plan_mode": false
    }
  },
  "stream_mode": ["values", "messages-tuple", "custom"]
}
```
**Stream Mode Compatibility:**
- Use: `values`, `messages-tuple`, `custom`, `updates`, `events`, `debug`, `tasks`, `checkpoints`
- Do not use: `tools` (deprecated/invalid in current `langgraph-api` and will trigger schema validation errors)
**Recursion Limit:**
`config.recursion_limit` caps the number of graph steps LangGraph will execute
in a single run. The `/api/langgraph/*` endpoints go straight to the LangGraph
server and therefore inherit LangGraph's native default of **25**, which is
too low for plan-mode or subagent-heavy runs — the agent typically errors out
with `GraphRecursionError` after the first round of subagent results comes
back, before the lead agent can synthesize the final answer.
DeerFlow's own Gateway and IM-channel paths mitigate this by defaulting to
`100` in `build_run_config` (see `backend/app/gateway/services.py`), but
clients calling the LangGraph API directly must set `recursion_limit`
explicitly in the request body. `100` matches the Gateway default and is a
safe starting point; increase it if you run deeply nested subagent graphs.
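For clients that call `/api/langgraph/*` directly, it can help to centralize the request body so the limit is never forgotten. A minimal sketch (the helper name and the `gpt-4` default are illustrative, not part of the API):

```python
def build_run_payload(user_text: str, recursion_limit: int = 100,
                      model_name: str = "gpt-4") -> dict:
    """Build a body for POST /api/langgraph/threads/{thread_id}/runs.

    recursion_limit defaults to 100 to match the Gateway default; omitting
    it would leave LangGraph's native default of 25 in effect.
    """
    return {
        "input": {"messages": [{"role": "user", "content": user_text}]},
        "config": {
            "recursion_limit": recursion_limit,
            "configurable": {"model_name": model_name},
        },
        "stream_mode": ["values", "messages-tuple", "custom"],
    }
```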
**Configurable Options:**
- `model_name` (string): Override the default model
- `thinking_enabled` (boolean): Enable extended thinking for supported models
- `is_plan_mode` (boolean): Enable TodoList middleware for task tracking
**Response:** Server-Sent Events (SSE) stream
```
event: values
data: {"messages": [...], "title": "..."}

event: messages
data: {"content": "Hello! I'd be happy to help.", "role": "assistant"}

event: end
data: {}
```
#### Get Run History
```http
GET /api/langgraph/threads/{thread_id}/runs
```
**Response:**
```json
{
  "runs": [
    {
      "run_id": "run123",
      "status": "success",
      "created_at": "2024-01-15T10:30:00Z"
    }
  ]
}
```
#### Stream Run
Stream responses in real-time.
```http
POST /api/langgraph/threads/{thread_id}/runs/stream
Content-Type: application/json
```
Same request body as Create Run. Returns SSE stream.
---
## Gateway API
Base URL: `/api`
### Models
#### List Models
Get all available LLM models from configuration.
```http
GET /api/models
```
**Response:**
```json
{
  "models": [
    {
      "name": "gpt-4",
      "display_name": "GPT-4",
      "supports_thinking": false,
      "supports_vision": true
    },
    {
      "name": "claude-3-opus",
      "display_name": "Claude 3 Opus",
      "supports_thinking": false,
      "supports_vision": true
    },
    {
      "name": "deepseek-v3",
      "display_name": "DeepSeek V3",
      "supports_thinking": true,
      "supports_vision": false
    }
  ]
}
```
#### Get Model Details
```http
GET /api/models/{model_name}
```
**Response:**
```json
{
  "name": "gpt-4",
  "display_name": "GPT-4",
  "model": "gpt-4",
  "max_tokens": 4096,
  "supports_thinking": false,
  "supports_vision": true
}
```
### MCP Configuration
#### Get MCP Config
Get current MCP server configurations.
```http
GET /api/mcp/config
```
**Response:**
```json
{
  "mcpServers": {
    "github": {
      "enabled": true,
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_TOKEN": "***"
      },
      "description": "GitHub operations"
    },
    "filesystem": {
      "enabled": false,
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem"],
      "description": "File system access"
    }
  }
}
```
#### Update MCP Config
Update MCP server configurations.
```http
PUT /api/mcp/config
Content-Type: application/json
```
**Request Body:**
```json
{
  "mcpServers": {
    "github": {
      "enabled": true,
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_TOKEN": "$GITHUB_TOKEN"
      },
      "description": "GitHub operations"
    }
  }
}
```
**Response:**
```json
{
  "success": true,
  "message": "MCP configuration updated"
}
```
### Skills
#### List Skills
Get all available skills.
```http
GET /api/skills
```
**Response:**
```json
{
  "skills": [
    {
      "name": "pdf-processing",
      "display_name": "PDF Processing",
      "description": "Handle PDF documents efficiently",
      "enabled": true,
      "license": "MIT",
      "path": "public/pdf-processing"
    },
    {
      "name": "frontend-design",
      "display_name": "Frontend Design",
      "description": "Design and build frontend interfaces",
      "enabled": false,
      "license": "MIT",
      "path": "public/frontend-design"
    }
  ]
}
```
#### Get Skill Details
```http
GET /api/skills/{skill_name}
```
**Response:**
```json
{
  "name": "pdf-processing",
  "display_name": "PDF Processing",
  "description": "Handle PDF documents efficiently",
  "enabled": true,
  "license": "MIT",
  "path": "public/pdf-processing",
  "allowed_tools": ["read_file", "write_file", "bash"],
  "content": "# PDF Processing\n\nInstructions for the agent..."
}
```
#### Enable Skill
```http
POST /api/skills/{skill_name}/enable
```
**Response:**
```json
{
  "success": true,
  "message": "Skill 'pdf-processing' enabled"
}
```
#### Disable Skill
```http
POST /api/skills/{skill_name}/disable
```
**Response:**
```json
{
  "success": true,
  "message": "Skill 'pdf-processing' disabled"
}
```
#### Install Skill
Install a skill from a `.skill` file.
```http
POST /api/skills/install
Content-Type: multipart/form-data
```
**Request Body:**
- `file`: The `.skill` file to install
**Response:**
```json
{
  "success": true,
  "message": "Skill 'my-skill' installed successfully",
  "skill": {
    "name": "my-skill",
    "display_name": "My Skill",
    "path": "custom/my-skill"
  }
}
```
### File Uploads
#### Upload Files
Upload one or more files to a thread.
```http
POST /api/threads/{thread_id}/uploads
Content-Type: multipart/form-data
```
**Request Body:**
- `files`: One or more files to upload
**Response:**
```json
{
  "success": true,
  "files": [
    {
      "filename": "document.pdf",
      "size": 1234567,
      "path": ".deer-flow/threads/abc123/user-data/uploads/document.pdf",
      "virtual_path": "/mnt/user-data/uploads/document.pdf",
      "artifact_url": "/api/threads/abc123/artifacts/mnt/user-data/uploads/document.pdf",
      "markdown_file": "document.md",
      "markdown_path": ".deer-flow/threads/abc123/user-data/uploads/document.md",
      "markdown_virtual_path": "/mnt/user-data/uploads/document.md",
      "markdown_artifact_url": "/api/threads/abc123/artifacts/mnt/user-data/uploads/document.md"
    }
  ],
  "message": "Successfully uploaded 1 file(s)"
}
```
**Supported Document Formats** (auto-converted to Markdown):
- PDF (`.pdf`)
- PowerPoint (`.ppt`, `.pptx`)
- Excel (`.xls`, `.xlsx`)
- Word (`.doc`, `.docx`)
#### List Uploaded Files
```http
GET /api/threads/{thread_id}/uploads/list
```
**Response:**
```json
{
  "files": [
    {
      "filename": "document.pdf",
      "size": 1234567,
      "path": ".deer-flow/threads/abc123/user-data/uploads/document.pdf",
      "virtual_path": "/mnt/user-data/uploads/document.pdf",
      "artifact_url": "/api/threads/abc123/artifacts/mnt/user-data/uploads/document.pdf",
      "extension": ".pdf",
      "modified": 1705997600.0
    }
  ],
  "count": 1
}
```
#### Delete File
```http
DELETE /api/threads/{thread_id}/uploads/{filename}
```
**Response:**
```json
{
  "success": true,
  "message": "Deleted document.pdf"
}
```
### Thread Cleanup
Remove DeerFlow-managed local thread files under `.deer-flow/threads/{thread_id}` after the LangGraph thread itself has been deleted.
```http
DELETE /api/threads/{thread_id}
```
**Response:**
```json
{
  "success": true,
  "message": "Deleted local thread data for abc123"
}
```
**Error behavior:**
- `422` for invalid thread IDs
- `500` returns a generic `{"detail": "Failed to delete local thread data."}` response while full exception details stay in server logs
### Artifacts
#### Get Artifact
Download or view an artifact generated by the agent.
```http
GET /api/threads/{thread_id}/artifacts/{path}
```
**Path Examples:**
- `/api/threads/abc123/artifacts/mnt/user-data/outputs/result.txt`
- `/api/threads/abc123/artifacts/mnt/user-data/uploads/document.pdf`
**Query Parameters:**
- `download` (boolean): If `true`, force download with Content-Disposition header
**Response:** File content with appropriate Content-Type
---
## Error Responses
All APIs return errors in a consistent format:
```json
{
  "detail": "Error message describing what went wrong"
}
```
**HTTP Status Codes:**
- `400` - Bad Request: Invalid input
- `404` - Not Found: Resource not found
- `422` - Validation Error: Request validation failed
- `500` - Internal Server Error: Server-side error
---
## Authentication
Currently, DeerFlow does not implement authentication. All APIs are accessible without credentials.
Note: this applies only to authentication for DeerFlow's own APIs. Outbound MCP connections can still use OAuth for configured HTTP/SSE MCP servers.
For production deployments, it is recommended to:
1. Use Nginx for basic auth or OAuth integration
2. Deploy behind a VPN or private network
3. Implement custom authentication middleware
---
## Rate Limiting
No rate limiting is implemented by default. For production deployments, configure rate limiting in Nginx:
```nginx
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

location /api/ {
    limit_req zone=api burst=20 nodelay;
    proxy_pass http://backend;
}
```
---
## WebSocket Support
The LangGraph server supports WebSocket connections for real-time streaming. Connect to:
```
ws://localhost:2026/api/langgraph/threads/{thread_id}/runs/stream
```
---
## SDK Usage
### Python (LangGraph SDK)
```python
from langgraph_sdk import get_client

client = get_client(url="http://localhost:2026/api/langgraph")

# Create thread
thread = await client.threads.create()

# Run agent
async for event in client.runs.stream(
    thread["thread_id"],
    "lead_agent",
    input={"messages": [{"role": "user", "content": "Hello"}]},
    config={"configurable": {"model_name": "gpt-4"}},
    stream_mode=["values", "messages-tuple", "custom"],
):
    print(event)
```
### JavaScript/TypeScript
```typescript
// Using fetch for Gateway API
const response = await fetch('/api/models');
const data = await response.json();
console.log(data.models);

// Using EventSource for streaming
const eventSource = new EventSource(
  `/api/langgraph/threads/${threadId}/runs/stream`
);
eventSource.onmessage = (event) => {
  console.log(JSON.parse(event.data));
};
```
### cURL Examples
```bash
# List models
curl http://localhost:2026/api/models

# Get MCP config
curl http://localhost:2026/api/mcp/config

# Upload file
curl -X POST http://localhost:2026/api/threads/abc123/uploads \
  -F "files=@document.pdf"

# Enable skill
curl -X POST http://localhost:2026/api/skills/pdf-processing/enable

# Create thread and run agent
curl -X POST http://localhost:2026/api/langgraph/threads \
  -H "Content-Type: application/json" \
  -d '{}'

curl -X POST http://localhost:2026/api/langgraph/threads/abc123/runs \
  -H "Content-Type: application/json" \
  -d '{
    "input": {"messages": [{"role": "user", "content": "Hello"}]},
    "config": {
      "recursion_limit": 100,
      "configurable": {"model_name": "gpt-4"}
    }
  }'
```
> The `/api/langgraph/*` endpoints bypass DeerFlow's Gateway and inherit
> LangGraph's native `recursion_limit` default of 25, which is too low for
> plan-mode or subagent runs. Set `config.recursion_limit` explicitly — see
> the [Create Run](#create-run) section for details.

# Apple Container Support
DeerFlow now supports Apple Container as the preferred container runtime on macOS, with automatic fallback to Docker.
## Overview
Starting with this version, DeerFlow automatically detects and uses Apple Container on macOS when available, falling back to Docker when:
- Apple Container is not installed
- The platform is not macOS
This provides better performance on Apple Silicon Macs while maintaining compatibility across all platforms.
## Benefits
### On Apple Silicon Macs with Apple Container:
- **Better Performance**: Native ARM64 execution without Rosetta 2 translation
- **Lower Resource Usage**: Lighter weight than Docker Desktop
- **Native Integration**: Uses macOS Virtualization.framework
### Fallback to Docker:
- Full backward compatibility
- Works on all platforms (macOS, Linux, Windows)
- No configuration changes needed
## Requirements
### For Apple Container (macOS only):
- macOS 15.0 or later
- Apple Silicon (M1/M2/M3/M4)
- Apple Container CLI installed
### Installation:
```bash
# Download from GitHub releases
# https://github.com/apple/container/releases
# Verify installation
container --version
# Start the service
container system start
```
### For Docker (all platforms):
- Docker Desktop or Docker Engine
## How It Works
### Automatic Detection
The `AioSandboxProvider` automatically detects the available container runtime:
1. On macOS: Try `container --version`
- Success → Use Apple Container
- Failure → Fall back to Docker
2. On other platforms: Use Docker directly
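The detection order above can be sketched as follows. This is an illustrative reimplementation, not the actual `_detect_container_runtime()` code, and the five-second probe timeout is an assumption:

```python
import platform
import subprocess


def detect_container_runtime() -> str:
    """Pick "container" (Apple Container) on macOS when available, else "docker"."""
    if platform.system() == "Darwin":
        try:
            # Step 1: probe Apple Container; any failure falls through to Docker.
            subprocess.run(
                ["container", "--version"],
                check=True,
                capture_output=True,
                timeout=5,
            )
            return "container"
        except (FileNotFoundError, subprocess.CalledProcessError,
                subprocess.TimeoutExpired):
            pass
    # Step 2: non-macOS platforms (or a failed probe) use Docker directly.
    return "docker"
```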
### Runtime Differences
Both runtimes use nearly identical command syntax:
**Container Startup:**
```bash
# Apple Container
container run --rm -d -p 8080:8080 -v /host:/container -e KEY=value image
# Docker
docker run --rm -d -p 8080:8080 -v /host:/container -e KEY=value image
```
**Container Cleanup:**
```bash
# Apple Container (with --rm flag)
container stop <id> # Auto-removes due to --rm
# Docker (with --rm flag)
docker stop <id> # Auto-removes due to --rm
```
### Implementation Details
The implementation is in `backend/packages/harness/deerflow/community/aio_sandbox/aio_sandbox_provider.py`:
- `_detect_container_runtime()`: Detects available runtime at startup
- `_start_container()`: Uses detected runtime, skips Docker-specific options for Apple Container
- `_stop_container()`: Uses appropriate stop command for the runtime
## Configuration
No configuration changes are needed! The system works automatically.
However, you can verify the runtime in use by checking the logs:
```
INFO:deerflow.community.aio_sandbox.aio_sandbox_provider:Detected Apple Container: container version 0.1.0
INFO:deerflow.community.aio_sandbox.aio_sandbox_provider:Starting sandbox container using container: ...
```
Or for Docker:
```
INFO:deerflow.community.aio_sandbox.aio_sandbox_provider:Apple Container not available, falling back to Docker
INFO:deerflow.community.aio_sandbox.aio_sandbox_provider:Starting sandbox container using docker: ...
```
## Container Images
Both runtimes use OCI-compatible images. The default image works with both:
```yaml
sandbox:
use: deerflow.community.aio_sandbox:AioSandboxProvider
image: enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest # Default image
```
Make sure your images are available for the appropriate architecture:
- ARM64 for Apple Container on Apple Silicon
- AMD64 for Docker on Intel Macs
- Multi-arch images work on both
### Pre-pulling Images (Recommended)
**Important**: Container images are typically large (500MB+) and are pulled on first use, which can cause a long wait time without clear feedback.
**Best Practice**: Pre-pull the image during setup:
```bash
# From project root
make setup-sandbox
```
This command will:
1. Read the configured image from `config.yaml` (or use default)
2. Detect available runtime (Apple Container or Docker)
3. Pull the image with progress indication
4. Verify the image is ready for use
**Manual pre-pull**:
```bash
# Using Apple Container
container image pull enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest
# Using Docker
docker pull enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest
```
If you skip pre-pulling, the image will be automatically pulled on first agent execution, which may take several minutes depending on your network speed.
## Cleanup Scripts
The project includes a unified cleanup script that handles both runtimes:
**Script:** `scripts/cleanup-containers.sh`
**Usage:**
```bash
# Clean up all DeerFlow sandbox containers
./scripts/cleanup-containers.sh deer-flow-sandbox
# Custom prefix
./scripts/cleanup-containers.sh my-prefix
```
**Makefile Integration:**
All cleanup commands in `Makefile` automatically handle both runtimes:
```bash
make stop # Stops all services and cleans up containers
make clean # Full cleanup including logs
```
## Testing
Test the container runtime detection:
```bash
cd backend
python test_container_runtime.py
```
This will:
1. Detect the available runtime
2. Optionally start a test container
3. Verify connectivity
4. Clean up
## Troubleshooting
### Apple Container not detected on macOS
1. Check if installed:
```bash
which container
container --version
```
2. Check if service is running:
```bash
container system start
```
3. Check logs for detection:
```bash
# Look for detection message in application logs
grep "container runtime" logs/*.log
```
### Containers not cleaning up
1. Manually check running containers:
```bash
# Apple Container
container list
# Docker
docker ps
```
2. Run cleanup script manually:
```bash
./scripts/cleanup-containers.sh deer-flow-sandbox
```
### Performance issues
- Apple Container should be faster on Apple Silicon
- If experiencing issues, you can force Docker by temporarily renaming the `container` command:
```bash
# Temporary workaround - not recommended for permanent use
sudo mv /opt/homebrew/bin/container /opt/homebrew/bin/container.bak
```
## References
- [Apple Container GitHub](https://github.com/apple/container)
- [Apple Container Documentation](https://github.com/apple/container/blob/main/docs/)
- [OCI Image Spec](https://github.com/opencontainers/image-spec)

# Architecture Overview
This document provides a comprehensive overview of the DeerFlow backend architecture.
## System Architecture
```
┌──────────────────────────────────────────────────────────────────────────┐
│ Client (Browser) │
└─────────────────────────────────┬────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────┐
│ Nginx (Port 2026) │
│ Unified Reverse Proxy Entry Point │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ /api/langgraph/* → LangGraph Server (2024) │ │
│ │ /api/* → Gateway API (8001) │ │
│ │ /* → Frontend (3000) │ │
│ └────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────┬────────────────────────────────────────┘
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│ LangGraph Server │ │ Gateway API │ │ Frontend │
│ (Port 2024) │ │ (Port 8001) │ │ (Port 3000) │
│ │ │ │ │ │
│ - Agent Runtime │ │ - Models API │ │ - Next.js App │
│ - Thread Mgmt │ │ - MCP Config │ │ - React UI │
│ - SSE Streaming │ │ - Skills Mgmt │ │ - Chat Interface │
│ - Checkpointing │ │ - File Uploads │ │ │
│ │ │ - Thread Cleanup │ │ │
│ │ │ - Artifacts │ │ │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘
│ │
│ ┌─────────────────┘
│ │
▼ ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ Shared Configuration │
│ ┌─────────────────────────┐ ┌────────────────────────────────────────┐ │
│ │ config.yaml │ │ extensions_config.json │ │
│ │ - Models │ │ - MCP Servers │ │
│ │ - Tools │ │ - Skills State │ │
│ │ - Sandbox │ │ │ │
│ │ - Summarization │ │ │ │
│ └─────────────────────────┘ └────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
```
## Component Details
### LangGraph Server
The LangGraph server is the core agent runtime, built on LangGraph for robust multi-agent workflow orchestration.
**Entry Point**: `packages/harness/deerflow/agents/lead_agent/agent.py:make_lead_agent`
**Key Responsibilities**:
- Agent creation and configuration
- Thread state management
- Middleware chain execution
- Tool execution orchestration
- SSE streaming for real-time responses
**Configuration**: `langgraph.json`
```json
{
  "agent": {
    "type": "agent",
    "path": "deerflow.agents:make_lead_agent"
  }
}
```
### Gateway API
FastAPI application providing REST endpoints for non-agent operations.
**Entry Point**: `app/gateway/app.py`
**Routers**:
- `models.py` - `/api/models` - Model listing and details
- `mcp.py` - `/api/mcp` - MCP server configuration
- `skills.py` - `/api/skills` - Skills management
- `uploads.py` - `/api/threads/{id}/uploads` - File upload
- `threads.py` - `/api/threads/{id}` - Local DeerFlow thread data cleanup after LangGraph deletion
- `artifacts.py` - `/api/threads/{id}/artifacts` - Artifact serving
- `suggestions.py` - `/api/threads/{id}/suggestions` - Follow-up suggestion generation
The web conversation delete flow is now split across both backend surfaces: LangGraph handles `DELETE /api/langgraph/threads/{thread_id}` for thread state, then the Gateway `threads.py` router removes DeerFlow-managed filesystem data via `Paths.delete_thread_dir()`.
### Agent Architecture
```
┌─────────────────────────────────────────────────────────────────────────┐
│ make_lead_agent(config) │
└────────────────────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Middleware Chain │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ 1. ThreadDataMiddleware - Initialize workspace/uploads/outputs │ │
│ │ 2. UploadsMiddleware - Process uploaded files │ │
│ │ 3. SandboxMiddleware - Acquire sandbox environment │ │
│ │ 4. SummarizationMiddleware - Context reduction (if enabled) │ │
│ │ 5. TitleMiddleware - Auto-generate titles │ │
│ │ 6. TodoListMiddleware - Task tracking (if plan_mode) │ │
│ │ 7. ViewImageMiddleware - Vision model support │ │
│ │ 8. ClarificationMiddleware - Handle clarifications │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Agent Core │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────┐ │
│ │ Model │ │ Tools │ │ System Prompt │ │
│ │ (from factory) │ │ (configured + │ │ (with skills) │ │
│ │ │ │ MCP + builtin) │ │ │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
```
### Thread State
The `ThreadState` extends LangGraph's `AgentState` with additional fields:
```python
class ThreadState(AgentState):
    # Core state from AgentState
    messages: list[BaseMessage]

    # DeerFlow extensions
    sandbox: dict          # Sandbox environment info
    artifacts: list[str]   # Generated file paths
    thread_data: dict      # {workspace, uploads, outputs} paths
    title: str | None      # Auto-generated conversation title
    todos: list[dict]      # Task tracking (plan mode)
    viewed_images: dict    # Vision model image data
```
### Sandbox System
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Sandbox Architecture │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────┐
│ SandboxProvider │ (Abstract)
│ - acquire() │
│ - get() │
│ - release() │
└────────────┬────────────┘
┌────────────────────┼────────────────────┐
│ │
▼ ▼
┌─────────────────────────┐ ┌─────────────────────────┐
│ LocalSandboxProvider │ │ AioSandboxProvider │
│ (packages/harness/deerflow/sandbox/local.py) │ │ (packages/harness/deerflow/community/) │
│ │ │ │
│ - Singleton instance │ │ - Docker-based │
│ - Direct execution │ │ - Isolated containers │
│ - Development use │ │ - Production use │
└─────────────────────────┘ └─────────────────────────┘
┌─────────────────────────┐
│ Sandbox │ (Abstract)
│ - execute_command() │
│ - read_file() │
│ - write_file() │
│ - list_dir() │
└─────────────────────────┘
```
**Virtual Path Mapping**:
| Virtual Path | Physical Path |
|-------------|---------------|
| `/mnt/user-data/workspace` | `backend/.deer-flow/threads/{thread_id}/user-data/workspace` |
| `/mnt/user-data/uploads` | `backend/.deer-flow/threads/{thread_id}/user-data/uploads` |
| `/mnt/user-data/outputs` | `backend/.deer-flow/threads/{thread_id}/user-data/outputs` |
| `/mnt/skills` | `deer-flow/skills/` |
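A hypothetical helper showing how the table's mapping can be applied; the paths come from the table, but the function itself is a sketch, not project code:

```python
from pathlib import PurePosixPath

# Virtual roots from the mapping table above (thread-scoped and shared).
VIRTUAL_ROOTS = {
    "/mnt/user-data": "backend/.deer-flow/threads/{thread_id}/user-data",
    "/mnt/skills": "deer-flow/skills",
}


def resolve_virtual_path(virtual: str, thread_id: str) -> str:
    """Translate a sandbox virtual path into its physical location."""
    path = PurePosixPath(virtual)
    for prefix, physical in VIRTUAL_ROOTS.items():
        try:
            rel = path.relative_to(prefix)
        except ValueError:
            continue  # not under this root, try the next one
        return str(PurePosixPath(physical.format(thread_id=thread_id)) / rel)
    raise ValueError(f"unknown virtual path: {virtual}")
```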
### Tool System
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Tool Sources │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│ Built-in Tools │ │ Configured Tools │ │ MCP Tools │
│ (packages/harness/deerflow/tools/) │ │ (config.yaml) │ │ (extensions.json) │
├─────────────────────┤ ├─────────────────────┤ ├─────────────────────┤
│ - present_file │ │ - web_search │ │ - github │
│ - ask_clarification │ │ - web_fetch │ │ - filesystem │
│ - view_image │ │ - bash │ │ - postgres │
│ │ │ - read_file │ │ - brave-search │
│ │ │ - write_file │ │ - puppeteer │
│ │ │ - str_replace │ │ - ... │
│ │ │ - ls │ │ │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘
│ │ │
└───────────────────────┴───────────────────────┘
┌─────────────────────────┐
│ get_available_tools() │
│ (packages/harness/deerflow/tools/__init__) │
└─────────────────────────┘
```
### Model Factory
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Model Factory │
│ (packages/harness/deerflow/models/factory.py) │
└─────────────────────────────────────────────────────────────────────────┘
config.yaml:
┌─────────────────────────────────────────────────────────────────────────┐
│ models: │
│ - name: gpt-4 │
│ display_name: GPT-4 │
│ use: langchain_openai:ChatOpenAI │
│ model: gpt-4 │
│ api_key: $OPENAI_API_KEY │
│ max_tokens: 4096 │
│ supports_thinking: false │
│ supports_vision: true │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────┐
│ create_chat_model() │
│ - name: str │
│ - thinking_enabled │
└────────────┬────────────┘
┌─────────────────────────┐
│ resolve_class() │
│ (reflection system) │
└────────────┬────────────┘
┌─────────────────────────┐
│ BaseChatModel │
│ (LangChain instance) │
└─────────────────────────┘
```
**Supported Providers**:
- OpenAI (`langchain_openai:ChatOpenAI`)
- Anthropic (`langchain_anthropic:ChatAnthropic`)
- DeepSeek (`langchain_deepseek:ChatDeepSeek`)
- Custom via LangChain integrations
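The `use:` entries follow a `module:ClassName` convention, so the reflection step reduces to an import plus `getattr`. A minimal sketch of that step (the real `resolve_class()` may handle more cases):

```python
import importlib


def resolve_class(ref: str):
    """Resolve a "package.module:ClassName" reference to the class object.

    For example, "langchain_openai:ChatOpenAI" imports langchain_openai
    and returns its ChatOpenAI attribute.
    """
    module_name, _, class_name = ref.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```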
### MCP Integration
```
┌─────────────────────────────────────────────────────────────────────────┐
│ MCP Integration │
│ (packages/harness/deerflow/mcp/manager.py) │
└─────────────────────────────────────────────────────────────────────────┘
extensions_config.json:
┌─────────────────────────────────────────────────────────────────────────┐
│ { │
│ "mcpServers": { │
│ "github": { │
│ "enabled": true, │
│ "type": "stdio", │
│ "command": "npx", │
│ "args": ["-y", "@modelcontextprotocol/server-github"], │
│ "env": {"GITHUB_TOKEN": "$GITHUB_TOKEN"} │
│ } │
│ } │
│ } │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────┐
│ MultiServerMCPClient │
│ (langchain-mcp-adapters)│
└────────────┬────────────┘
┌────────────────────┼────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ stdio │ │ SSE │ │ HTTP │
│ transport │ │ transport │ │ transport │
└───────────┘ └───────────┘ └───────────┘
```
### Skills System
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Skills System │
│ (packages/harness/deerflow/skills/loader.py) │
└─────────────────────────────────────────────────────────────────────────┘
Directory Structure:
┌─────────────────────────────────────────────────────────────────────────┐
│ skills/ │
│ ├── public/ # Public skills (committed) │
│ │ ├── pdf-processing/ │
│ │ │ └── SKILL.md │
│ │ ├── frontend-design/ │
│ │ │ └── SKILL.md │
│ │ └── ... │
│ └── custom/ # Custom skills (gitignored) │
│ └── user-installed/ │
│ └── SKILL.md │
└─────────────────────────────────────────────────────────────────────────┘
SKILL.md Format:
┌─────────────────────────────────────────────────────────────────────────┐
│ --- │
│ name: PDF Processing │
│ description: Handle PDF documents efficiently │
│ license: MIT │
│ allowed-tools: │
│ - read_file │
│ - write_file │
│ - bash │
│ --- │
│ │
│ # Skill Instructions │
│ Content injected into system prompt... │
└─────────────────────────────────────────────────────────────────────────┘
```
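As a rough illustration, splitting the frontmatter from the instructions takes only a few lines of string handling. This deliberately minimal sketch understands just the flat `key: value` and `- item` lines shown above, not full YAML, and it is not the project's actual loader:

```python
def parse_skill_md(text: str) -> tuple[dict, str]:
    """Split a SKILL.md file into (frontmatter dict, instruction body)."""
    if not text.startswith("---\n"):
        raise ValueError("SKILL.md must start with a frontmatter block")
    header, _, body = text[4:].partition("\n---\n")
    meta: dict = {}
    current_key = None
    for raw in header.splitlines():
        line = raw.strip()
        if line.startswith("- ") and isinstance(meta.get(current_key), list):
            # Continuation of a list key such as allowed-tools:
            meta[current_key].append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            # An empty value introduces a list; otherwise store the string.
            meta[current_key] = value.strip() or []
    return meta, body.strip()
```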
### Request Flow
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Request Flow Example │
│ User sends message to agent │
└─────────────────────────────────────────────────────────────────────────┘
1. Client → Nginx
POST /api/langgraph/threads/{thread_id}/runs
{"input": {"messages": [{"role": "user", "content": "Hello"}]}}
2. Nginx → LangGraph Server (2024)
Proxied to LangGraph server
3. LangGraph Server
a. Load/create thread state
b. Execute middleware chain:
- ThreadDataMiddleware: Set up paths
- UploadsMiddleware: Inject file list
- SandboxMiddleware: Acquire sandbox
- SummarizationMiddleware: Check token limits
- TitleMiddleware: Generate title if needed
- TodoListMiddleware: Load todos (if plan mode)
- ViewImageMiddleware: Process images
- ClarificationMiddleware: Check for clarifications
c. Execute agent:
- Model processes messages
- May call tools (bash, web_search, etc.)
- Tools execute via sandbox
- Results added to messages
d. Stream response via SSE
4. Client receives streaming response
```
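Step 3d streams the response as Server-Sent Events. As a minimal, stdlib-only sketch of what the client consumes, an SSE payload can be split into `(event, data)` pairs like this (a real client would use the LangGraph SDK's streaming helpers instead):

```python
# Minimal SSE parser sketch: splits a raw event stream into (event, data) pairs.
# Illustrative only -- real clients should use the LangGraph SDK.
import json

def parse_sse(raw: str) -> list[tuple[str, dict]]:
    events = []
    event_name, data_lines = None, []
    for line in raw.splitlines() + [""]:   # trailing "" flushes the last event
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event_name:
            events.append((event_name, json.loads("\n".join(data_lines))))
            event_name, data_lines = None, []
    return events

raw = (
    "event: values\n"
    'data: {"title": "Greeting"}\n'
    "\n"
    "event: messages\n"
    'data: {"content": "Hello"}\n'
    "\n"
)
events = parse_sse(raw)
```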
## Data Flow
### File Upload Flow
```
1. Client uploads file
POST /api/threads/{thread_id}/uploads
Content-Type: multipart/form-data
2. Gateway receives file
- Validates file
- Stores in .deer-flow/threads/{thread_id}/user-data/uploads/
- If document: converts to Markdown via markitdown
3. Returns response
{
"files": [{
"filename": "doc.pdf",
"path": ".deer-flow/.../uploads/doc.pdf",
"virtual_path": "/mnt/user-data/uploads/doc.pdf",
"artifact_url": "/api/threads/.../artifacts/mnt/.../doc.pdf"
}]
}
4. Next agent run
- UploadsMiddleware lists files
- Injects file list into messages
- Agent can access via virtual_path
```
### Thread Cleanup Flow
```
1. Client deletes conversation via LangGraph
DELETE /api/langgraph/threads/{thread_id}
2. Web UI follows up with Gateway cleanup
DELETE /api/threads/{thread_id}
3. Gateway removes local DeerFlow-managed files
- Deletes .deer-flow/threads/{thread_id}/ recursively
- Missing directories are treated as a no-op
- Invalid thread IDs are rejected before filesystem access
```
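The cleanup semantics above can be sketched as follows. The thread-ID pattern and directory layout here are illustrative assumptions, not the Gateway's actual validation rules:

```python
# Sketch of Gateway cleanup: validate the thread ID, then remove the thread
# directory, treating a missing directory as a no-op. Illustrative only.
import re
import shutil
import tempfile
from pathlib import Path

THREAD_ID_RE = re.compile(r"^[A-Za-z0-9_-]+$")  # assumed ID pattern

def cleanup_thread(base_dir: str, thread_id: str) -> bool:
    if not THREAD_ID_RE.fullmatch(thread_id):
        raise ValueError(f"invalid thread id: {thread_id!r}")  # reject before touching the FS
    thread_dir = Path(base_dir) / "threads" / thread_id
    if not thread_dir.exists():
        return False                        # missing directory: no-op
    shutil.rmtree(thread_dir)
    return True

# Demo: create a fake thread directory, delete it twice.
base = tempfile.mkdtemp()
(Path(base) / "threads" / "t1" / "user-data").mkdir(parents=True)
removed = cleanup_thread(base, "t1")
removed_again = cleanup_thread(base, "t1")
```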
### Configuration Reload
```
1. Client updates MCP config
PUT /api/mcp/config
2. Gateway writes extensions_config.json
- Updates mcpServers section
- File mtime changes
3. MCP Manager detects change
- get_cached_mcp_tools() checks mtime
- If changed: reinitializes MCP client
- Loads updated server configurations
4. Next agent run uses new tools
```
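The reload in steps 2-3 hinges on mtime-based cache invalidation. A minimal sketch of that pattern, with hypothetical names (the real logic lives in the MCP manager's `get_cached_mcp_tools()`):

```python
# Sketch of mtime-based config cache invalidation. Illustrative only --
# class and method names here are hypothetical.
import json
import os
import tempfile
import time

class ConfigCache:
    def __init__(self, path: str):
        self.path = path
        self._mtime: float | None = None
        self._config: dict | None = None

    def get(self) -> dict:
        mtime = os.path.getmtime(self.path)
        if self._config is None or mtime != self._mtime:
            with open(self.path) as f:
                self._config = json.load(f)   # reload on change
            self._mtime = mtime
        return self._config

# Demo: read a config file, change it on disk, read again.
tmp = tempfile.NamedTemporaryFile("w", suffix=".json", delete=False)
json.dump({"mcpServers": {}}, tmp)
tmp.close()
cache = ConfigCache(tmp.name)
first = cache.get()
with open(tmp.name, "w") as f:
    json.dump({"mcpServers": {"searx": {}}}, f)
os.utime(tmp.name, (time.time() + 10, time.time() + 10))  # force a newer mtime
second = cache.get()
os.unlink(tmp.name)
```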
## Security Considerations
### Sandbox Isolation
- Agent code executes within sandbox boundaries
- Local sandbox: Direct execution (development only)
- Docker sandbox: Container isolation (production recommended)
- Path traversal prevention in file operations
### API Security
- Thread isolation: Each thread has separate data directories
- File validation: Uploads checked for path safety
- Environment variable resolution: Secrets not stored in config
### MCP Security
- Each MCP server runs in its own process
- Environment variables resolved at runtime
- Servers can be enabled/disabled independently
## Performance Considerations
### Caching
- MCP tools cached with file mtime invalidation
- Configuration loaded once, reloaded on file change
- Skills parsed once at startup, cached in memory
### Streaming
- SSE used for real-time response streaming
- Reduces time to first token
- Enables progress visibility for long operations
### Context Management
- Summarization middleware reduces context when limits approached
- Configurable triggers: tokens, messages, or fraction
- Preserves recent messages while summarizing older ones

# Automatic Thread Title Generation
## Overview
Automatically generates a title for each conversation thread, triggered after the user's first question receives a reply.
## Implementation
`TitleMiddleware` does this in its `after_agent` hook:
1. Detect the first exchange (exactly 1 user message + 1 assistant reply)
2. Check whether the state already has a title
3. Call the LLM to generate a concise title (at most 6 words by default)
4. Store the title in `ThreadState`, where the checkpointer persists it
TitleMiddleware first normalizes structured block/list content in LangChain message content to plain text before assembling the title prompt, so raw Python/JSON reprs do not leak into the title-generation model.
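That normalization step can be sketched as below; the function name and block handling are illustrative, not the actual TitleMiddleware code:

```python
# Sketch of flattening LangChain message content (str or list of blocks)
# into plain text before building the title prompt. Illustrative only.
def normalize_content(content) -> str:
    """Flatten str-or-list-of-blocks message content into plain text."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
        # non-text blocks (images, tool_use, ...) are dropped from the prompt
    return "\n".join(p for p in parts if p)
```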
## ⚠️ Important: Storage Mechanism
### Where the Title Is Stored
The title is stored in **`ThreadState.title`**, not in thread metadata:
```python
class ThreadState(AgentState):
sandbox: SandboxState | None = None
title: str | None = None # ✅ Title stored here
```
### Persistence
| Deployment | Persisted | Notes |
|---------|--------|------|
| **LangGraph Studio (local)** | ❌ No | In-memory only; lost on restart |
| **LangGraph Platform** | ✅ Yes | Automatically persisted to the database |
| **Custom + Checkpointer** | ✅ Yes | Requires a PostgreSQL/SQLite checkpointer |
### Enabling Persistence
To persist titles during local development as well, configure a checkpointer:
```python
# Create checkpointer.py next to langgraph.json
from langgraph.checkpoint.postgres import PostgresSaver
checkpointer = PostgresSaver.from_conn_string(
"postgresql://user:pass@localhost/dbname"
)
```
Then reference it in `langgraph.json`:
```json
{
"graphs": {
"lead_agent": "deerflow.agents:lead_agent"
},
"checkpointer": "checkpointer:checkpointer"
}
```
## Configuration
Add to `config.yaml` (optional):
```yaml
title:
enabled: true
max_words: 6
max_chars: 60
  model_name: null # Use the default model
```
Or configure it in code:
```python
from deerflow.config.title_config import TitleConfig, set_title_config
set_title_config(TitleConfig(
enabled=True,
max_words=8,
max_chars=80,
))
```
## Client Usage
### Reading the Thread Title
```typescript
// Option 1: read from the thread state
const state = await client.threads.getState(threadId);
const title = state.values.title || "New Conversation";
// Option 2: listen for stream events
for await (const chunk of client.runs.stream(threadId, assistantId, {
input: { messages: [{ role: "user", content: "Hello" }] }
})) {
if (chunk.event === "values" && chunk.data.title) {
console.log("Title:", chunk.data.title);
}
}
```
### Displaying the Title
```typescript
// Render titles in the conversation list
function ConversationList() {
const [threads, setThreads] = useState([]);
useEffect(() => {
async function loadThreads() {
const allThreads = await client.threads.list();
// Fetch each thread's state to read its title
const threadsWithTitles = await Promise.all(
allThreads.map(async (t) => {
const state = await client.threads.getState(t.thread_id);
return {
id: t.thread_id,
title: state.values.title || "New Conversation",
updatedAt: t.updated_at,
};
})
);
setThreads(threadsWithTitles);
}
loadThreads();
}, []);
return (
<ul>
{threads.map(thread => (
<li key={thread.id}>
<a href={`/chat/${thread.id}`}>{thread.title}</a>
</li>
))}
</ul>
);
}
```
## Workflow
```mermaid
sequenceDiagram
participant User
participant Client
participant LangGraph
participant TitleMiddleware
participant LLM
participant Checkpointer
User->>Client: Send first message
Client->>LangGraph: POST /threads/{id}/runs
LangGraph->>Agent: Process message
Agent-->>LangGraph: Return reply
LangGraph->>TitleMiddleware: after_agent()
TitleMiddleware->>TitleMiddleware: Check whether a title is needed
TitleMiddleware->>LLM: Generate title
LLM-->>TitleMiddleware: Return title
TitleMiddleware->>LangGraph: return {"title": "..."}
LangGraph->>Checkpointer: Save state (incl. title)
LangGraph-->>Client: Return response
Client->>Client: Read from state.values.title
```
## Benefits
✅ **Reliable persistence** - Uses LangGraph's state mechanism; persisted automatically
✅ **Fully backend-driven** - No extra client-side logic required
✅ **Automatic trigger** - Generated after the first exchange
✅ **Configurable** - Custom length, model, and more
✅ **Fault tolerant** - Falls back to a default strategy on failure
✅ **Architecturally consistent** - Matches the existing SandboxMiddleware pattern
## Caveats
1. **Different read location**: The title lives in `state.values.title`, not `thread.metadata.title`
2. **Performance**: Title generation adds roughly 0.5-1 s of latency; a faster model reduces this
3. **Concurrency safety**: The middleware runs after the agent finishes and does not block the main flow
4. **Fallback strategy**: If the LLM call fails, the first few words of the user message are used as the title
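The fallback strategy in point 4 can be sketched as follows (illustrative; the real middleware may truncate differently):

```python
# Sketch of the fallback title: first few words of the user message,
# clamped to the configured limits. Names are illustrative.
def fallback_title(user_message: str, max_words: int = 6, max_chars: int = 60) -> str:
    words = user_message.strip().split()
    title = " ".join(words[:max_words])
    if len(title) > max_chars:
        title = title[: max_chars - 1].rstrip() + "…"
    return title or "New Conversation"
```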
## Testing
```python
# Test title generation
import pytest
from deerflow.agents.title_middleware import TitleMiddleware
def test_title_generation():
    # TODO: add unit tests
    pass
```
## Troubleshooting
### Title is not generated
1. Check that the feature is enabled: `get_title_config().enabled == True`
2. Check the logs for "Generated thread title" or error messages
3. Confirm it is the first exchange: generation only triggers with exactly 1 user message and 1 assistant reply
### Title is generated but not visible to the client
1. Confirm the read location: read from `state.values.title`, not `thread.metadata.title`
2. Inspect the API response: confirm the state contains a title field
3. Re-fetch the state: `client.threads.getState(threadId)`
### Title is lost after restart
1. Check whether a checkpointer is configured (required for local development)
2. Confirm the deployment mode: LangGraph Platform persists automatically
3. Inspect the database: confirm the checkpointer is working
## Architecture
### Why State Instead of Metadata?
| Aspect | State | Metadata |
|------|-------|----------|
| **Persistence** | ✅ Automatic (via checkpointer) | ⚠️ Implementation-dependent |
| **Versioning** | ✅ Supports time travel | ❌ Not supported |
| **Type safety** | ✅ Defined as a TypedDict | ❌ Arbitrary dict |
| **Traceability** | ✅ Every update is recorded | ⚠️ Latest value only |
| **Standardization** | ✅ Core LangGraph mechanism | ⚠️ Extension feature |
### Implementation Details
```python
# Core TitleMiddleware logic
@override
def after_agent(self, state: TitleMiddlewareState, runtime: Runtime) -> dict | None:
    """Generate and set thread title after the first agent response."""
    if self._should_generate_title(state, runtime):
        title = self._generate_title(runtime)
        print(f"Generated thread title: {title}")
        # ✅ The returned state update is persisted automatically by the checkpointer
        return {"title": title}
    return None
```
## Related Files
- [`packages/harness/deerflow/agents/thread_state.py`](../packages/harness/deerflow/agents/thread_state.py) - ThreadState definition
- [`packages/harness/deerflow/agents/middlewares/title_middleware.py`](../packages/harness/deerflow/agents/middlewares/title_middleware.py) - TitleMiddleware implementation
- [`packages/harness/deerflow/config/title_config.py`](../packages/harness/deerflow/config/title_config.py) - Configuration management
- [`config.yaml`](../../config.example.yaml) - Configuration file
- [`packages/harness/deerflow/agents/lead_agent/agent.py`](../packages/harness/deerflow/agents/lead_agent/agent.py) - Middleware registration
## References
- [LangGraph checkpointer documentation](https://langchain-ai.github.io/langgraph/concepts/persistence/)
- [LangGraph state management](https://langchain-ai.github.io/langgraph/concepts/low_level/#state)
- [LangGraph middleware](https://langchain-ai.github.io/langgraph/concepts/middleware/)

# Configuration Guide
This guide explains how to configure DeerFlow for your environment.
## Config Versioning
`config.example.yaml` contains a `config_version` field that tracks schema changes. When the example version is higher than your local `config.yaml`, the application emits a startup warning:
```
WARNING - Your config.yaml (version 0) is outdated — the latest version is 1.
Run `make config-upgrade` to merge new fields into your config.
```
- **Missing `config_version`** in your config is treated as version 0.
- Run `make config-upgrade` to auto-merge missing fields (your existing values are preserved, a `.bak` backup is created).
- When changing the config schema, bump `config_version` in `config.example.yaml`.
## Configuration Sections
### Models
Configure the LLM models available to the agent:
```yaml
models:
- name: gpt-4 # Internal identifier
display_name: GPT-4 # Human-readable name
use: langchain_openai:ChatOpenAI # LangChain class path
model: gpt-4 # Model identifier for API
api_key: $OPENAI_API_KEY # API key (use env var)
max_tokens: 4096 # Max tokens per request
temperature: 0.7 # Sampling temperature
```
**Supported Providers**:
- OpenAI (`langchain_openai:ChatOpenAI`)
- Anthropic (`langchain_anthropic:ChatAnthropic`)
- DeepSeek (`langchain_deepseek:ChatDeepSeek`)
- Claude Code OAuth (`deerflow.models.claude_provider:ClaudeChatModel`)
- Codex CLI (`deerflow.models.openai_codex_provider:CodexChatModel`)
- Any LangChain-compatible provider
CLI-backed provider examples:
```yaml
models:
- name: gpt-5.4
display_name: GPT-5.4 (Codex CLI)
use: deerflow.models.openai_codex_provider:CodexChatModel
model: gpt-5.4
supports_thinking: true
supports_reasoning_effort: true
- name: claude-sonnet-4.6
display_name: Claude Sonnet 4.6 (Claude Code OAuth)
use: deerflow.models.claude_provider:ClaudeChatModel
model: claude-sonnet-4-6
max_tokens: 4096
supports_thinking: true
```
**Auth behavior for CLI-backed providers**:
- `CodexChatModel` loads Codex CLI auth from `~/.codex/auth.json`
- The Codex Responses endpoint currently rejects `max_tokens` and `max_output_tokens`, so `CodexChatModel` does not expose a request-level token cap
- `ClaudeChatModel` accepts `CLAUDE_CODE_OAUTH_TOKEN`, `ANTHROPIC_AUTH_TOKEN`, `CLAUDE_CODE_OAUTH_TOKEN_FILE_DESCRIPTOR`, `CLAUDE_CODE_CREDENTIALS_PATH`, or plaintext `~/.claude/.credentials.json`
- On macOS, DeerFlow does not probe Keychain automatically. Use `scripts/export_claude_code_oauth.py` to export Claude Code auth explicitly when needed
To use OpenAI's `/v1/responses` endpoint with LangChain, keep using `langchain_openai:ChatOpenAI` and set:
```yaml
models:
- name: gpt-5-responses
display_name: GPT-5 (Responses API)
use: langchain_openai:ChatOpenAI
model: gpt-5
api_key: $OPENAI_API_KEY
use_responses_api: true
output_version: responses/v1
```
For OpenAI-compatible gateways (for example Novita or OpenRouter), keep using `langchain_openai:ChatOpenAI` and set `base_url`:
```yaml
models:
- name: novita-deepseek-v3.2
display_name: Novita DeepSeek V3.2
use: langchain_openai:ChatOpenAI
model: deepseek/deepseek-v3.2
api_key: $NOVITA_API_KEY
base_url: https://api.novita.ai/openai
supports_thinking: true
when_thinking_enabled:
extra_body:
thinking:
type: enabled
- name: minimax-m2.5
display_name: MiniMax M2.5
use: langchain_openai:ChatOpenAI
model: MiniMax-M2.5
api_key: $MINIMAX_API_KEY
base_url: https://api.minimax.io/v1
max_tokens: 4096
temperature: 1.0 # MiniMax requires temperature in (0.0, 1.0]
supports_vision: true
- name: minimax-m2.5-highspeed
display_name: MiniMax M2.5 Highspeed
use: langchain_openai:ChatOpenAI
model: MiniMax-M2.5-highspeed
api_key: $MINIMAX_API_KEY
base_url: https://api.minimax.io/v1
max_tokens: 4096
temperature: 1.0 # MiniMax requires temperature in (0.0, 1.0]
supports_vision: true
- name: openrouter-gemini-2.5-flash
display_name: Gemini 2.5 Flash (OpenRouter)
use: langchain_openai:ChatOpenAI
model: google/gemini-2.5-flash-preview
api_key: $OPENAI_API_KEY
base_url: https://openrouter.ai/api/v1
```
If your OpenRouter key lives in a different environment variable name, point `api_key` at that variable explicitly (for example `api_key: $OPENROUTER_API_KEY`).
**Thinking Models**:
Some models support "thinking" mode for complex reasoning:
```yaml
models:
- name: deepseek-v3
supports_thinking: true
when_thinking_enabled:
extra_body:
thinking:
type: enabled
```
**Gemini with thinking via OpenAI-compatible gateway**:
When routing Gemini through an OpenAI-compatible proxy (Vertex AI OpenAI compat endpoint, AI Studio, or third-party gateways) with thinking enabled, the API attaches a `thought_signature` to each tool-call object returned in the response. Every subsequent request that replays those assistant messages **must** echo those signatures back on the tool-call entries or the API returns:
```
HTTP 400 INVALID_ARGUMENT: function call `<tool>` in the N. content block is
missing a `thought_signature`.
```
Standard `langchain_openai:ChatOpenAI` silently drops `thought_signature` when serialising messages. Use `deerflow.models.patched_openai:PatchedChatOpenAI` instead — it re-injects the tool-call signatures (sourced from `AIMessage.additional_kwargs["tool_calls"]`) into every outgoing payload:
```yaml
models:
- name: gemini-2.5-pro-thinking
display_name: Gemini 2.5 Pro (Thinking)
use: deerflow.models.patched_openai:PatchedChatOpenAI
model: google/gemini-2.5-pro-preview # model name as expected by your gateway
api_key: $GEMINI_API_KEY
base_url: https://<your-openai-compat-gateway>/v1
max_tokens: 16384
supports_thinking: true
supports_vision: true
when_thinking_enabled:
extra_body:
thinking:
type: enabled
```
For Gemini accessed **without** thinking (e.g. via OpenRouter where thinking is not activated), the plain `langchain_openai:ChatOpenAI` with `supports_thinking: false` is sufficient and no patch is needed.
### Tool Groups
Organize tools into logical groups:
```yaml
tool_groups:
- name: web # Web browsing and search
- name: file:read # Read-only file operations
- name: file:write # Write file operations
- name: bash # Shell command execution
```
### Tools
Configure specific tools available to the agent:
```yaml
tools:
- name: web_search
group: web
use: deerflow.community.tavily.tools:web_search_tool
max_results: 5
# api_key: $TAVILY_API_KEY # Optional
```
**Built-in Tools**:
- `web_search` - Search the web (DuckDuckGo, Tavily, Exa, InfoQuest, Firecrawl)
- `web_fetch` - Fetch web pages (Jina AI, Exa, InfoQuest, Firecrawl)
- `ls` - List directory contents
- `read_file` - Read file contents
- `write_file` - Write file contents
- `str_replace` - String replacement in files
- `bash` - Execute bash commands
### Sandbox
DeerFlow supports multiple sandbox execution modes. Configure your preferred mode in `config.yaml`:
**Local Execution** (runs sandbox code directly on the host machine):
```yaml
sandbox:
use: deerflow.sandbox.local:LocalSandboxProvider # Local execution
allow_host_bash: false # default; host bash is disabled unless explicitly re-enabled
```
**Docker Execution** (runs sandbox code in isolated Docker containers):
```yaml
sandbox:
use: deerflow.community.aio_sandbox:AioSandboxProvider # Docker-based sandbox
```
**Docker Execution with Kubernetes** (runs sandbox code in Kubernetes pods via provisioner service):
This mode runs each sandbox in an isolated Kubernetes Pod on your **host machine's cluster**. Requires Docker Desktop K8s, OrbStack, or similar local K8s setup.
```yaml
sandbox:
use: deerflow.community.aio_sandbox:AioSandboxProvider
provisioner_url: http://provisioner:8002
```
When using Docker development (`make docker-start`), DeerFlow starts the `provisioner` service only if this provisioner mode is configured. In local or plain Docker sandbox modes, `provisioner` is skipped.
See [Provisioner Setup Guide](../../docker/provisioner/README.md) for detailed configuration, prerequisites, and troubleshooting.
In more detail, choose between local execution and Docker-based isolation:
**Option 1: Local Sandbox** (default, simpler setup):
```yaml
sandbox:
use: deerflow.sandbox.local:LocalSandboxProvider
allow_host_bash: false
```
`allow_host_bash` is intentionally `false` by default. DeerFlow's local sandbox is a host-side convenience mode, not a secure shell isolation boundary. If you need `bash`, prefer `AioSandboxProvider`. Only set `allow_host_bash: true` for fully trusted single-user local workflows.
**Option 2: Docker Sandbox** (isolated, more secure):
```yaml
sandbox:
use: deerflow.community.aio_sandbox:AioSandboxProvider
port: 8080
auto_start: true
container_prefix: deer-flow-sandbox
# Optional: Additional mounts
mounts:
- host_path: /path/on/host
container_path: /path/in/container
read_only: false
```
When you configure `sandbox.mounts`, DeerFlow exposes those `container_path` values in the agent prompt so the agent can discover and operate on mounted directories directly instead of assuming everything must live under `/mnt/user-data`.
### Skills
Configure the skills directory for specialized workflows:
```yaml
skills:
# Host path (optional, default: ../skills)
path: /custom/path/to/skills
# Container mount path (default: /mnt/skills)
container_path: /mnt/skills
```
**How Skills Work**:
- Skills are stored in `deer-flow/skills/{public,custom}/`
- Each skill has a `SKILL.md` file with metadata
- Skills are automatically discovered and loaded
- Available in both local and Docker sandbox via path mapping
**Per-Agent Skill Filtering**:
Custom agents can restrict which skills they load by defining a `skills` field in their `config.yaml` (located at `workspace/agents/<agent_name>/config.yaml`):
- **Omitted or `null`**: Loads all globally enabled skills (default fallback).
- **`[]` (empty list)**: Disables all skills for this specific agent.
- **`["skill-name"]`**: Loads only the explicitly specified skills.
### Title Generation
Automatic conversation title generation:
```yaml
title:
enabled: true
max_words: 6
max_chars: 60
model_name: null # Use first model in list
```
### GitHub API Token (Optional for GitHub Deep Research Skill)
The default GitHub API rate limits are quite restrictive. For frequent project research, we recommend configuring a personal access token (PAT) with read-only permissions.
**Configuration Steps**:
1. Uncomment the `GITHUB_TOKEN` line in the `.env` file and add your personal access token
2. Restart the DeerFlow service to apply changes
## Environment Variables
DeerFlow supports environment variable substitution using the `$` prefix:
```yaml
models:
- api_key: $OPENAI_API_KEY # Reads from environment
```
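A minimal sketch of this `$`-prefix substitution, assuming a simple "whole value starts with `$`" rule (the real resolver may support richer syntax):

```python
# Sketch of `$`-prefixed environment-variable substitution in config values.
# Illustrative only -- the real resolver lives in DeerFlow's config loader.
import os

def resolve_env(value):
    if isinstance(value, str) and value.startswith("$"):
        return os.environ.get(value[1:], "")
    if isinstance(value, dict):
        return {k: resolve_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env(v) for v in value]
    return value

os.environ["OPENAI_API_KEY"] = "sk-test"
model = resolve_env({"name": "gpt-4", "api_key": "$OPENAI_API_KEY"})
```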
**Common Environment Variables**:
- `OPENAI_API_KEY` - OpenAI API key
- `ANTHROPIC_API_KEY` - Anthropic API key
- `DEEPSEEK_API_KEY` - DeepSeek API key
- `NOVITA_API_KEY` - Novita API key (OpenAI-compatible endpoint)
- `TAVILY_API_KEY` - Tavily search API key
- `DEER_FLOW_CONFIG_PATH` - Custom config file path
## Configuration Location
The configuration file should be placed in the **project root directory** (`deer-flow/config.yaml`), not in the backend directory.
## Configuration Priority
DeerFlow searches for configuration in this order:
1. Path specified in code via `config_path` argument
2. Path from `DEER_FLOW_CONFIG_PATH` environment variable
3. `config.yaml` in current working directory (typically `backend/` when running)
4. `config.yaml` in parent directory (project root: `deer-flow/`)
## Best Practices
1. **Place `config.yaml` in project root** - Not in `backend/` directory
2. **Never commit `config.yaml`** - It's already in `.gitignore`
3. **Use environment variables for secrets** - Don't hardcode API keys
4. **Keep `config.example.yaml` updated** - Document all new options
5. **Test configuration changes locally** - Before deploying
6. **Use Docker sandbox for production** - Better isolation and security
## Troubleshooting
### "Config file not found"
- Ensure `config.yaml` exists in the **project root** directory (`deer-flow/config.yaml`)
- The backend searches the parent directory by default, so the root location is preferred
- Alternatively, set `DEER_FLOW_CONFIG_PATH` environment variable to custom location
### "Invalid API key"
- Verify environment variables are set correctly
- Check that `$` prefix is used for env var references
### "Skills not loading"
- Check that `deer-flow/skills/` directory exists
- Verify skills have valid `SKILL.md` files
- Check `skills.path` configuration if using custom path
### "Docker sandbox fails to start"
- Ensure Docker is running
- Check port 8080 (or configured port) is available
- Verify Docker image is accessible
## Examples
See `config.example.yaml` for complete examples of all configuration options.

# File Uploads
## Overview
The DeerFlow backend provides full file-upload support: multiple files per request, with automatic conversion of Office documents and PDFs to Markdown.
## Features
- ✅ Multiple files per upload
- ✅ Automatic document-to-Markdown conversion (PDF, PPT, Excel, Word)
- ✅ Files stored in thread-isolated directories
- ✅ The agent is automatically aware of uploaded files
- ✅ File listing and deletion
## API Endpoints
### 1. Upload Files
```
POST /api/threads/{thread_id}/uploads
```
**Request body:** `multipart/form-data`
- `files`: one or more files
**Response:**
```json
{
"success": true,
"files": [
{
"filename": "document.pdf",
"size": 1234567,
"path": ".deer-flow/threads/{thread_id}/user-data/uploads/document.pdf",
"virtual_path": "/mnt/user-data/uploads/document.pdf",
"artifact_url": "/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/document.pdf",
"markdown_file": "document.md",
"markdown_path": ".deer-flow/threads/{thread_id}/user-data/uploads/document.md",
"markdown_virtual_path": "/mnt/user-data/uploads/document.md",
"markdown_artifact_url": "/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/document.md"
}
],
"message": "Successfully uploaded 1 file(s)"
}
```
**Path fields:**
- `path`: actual filesystem path (relative to the `backend/` directory)
- `virtual_path`: virtual path the agent uses inside the sandbox
- `artifact_url`: URL the frontend uses to fetch the file over HTTP
### 2. List Uploaded Files
```
GET /api/threads/{thread_id}/uploads/list
```
**Response:**
```json
{
"files": [
{
"filename": "document.pdf",
"size": 1234567,
"path": ".deer-flow/threads/{thread_id}/user-data/uploads/document.pdf",
"virtual_path": "/mnt/user-data/uploads/document.pdf",
"artifact_url": "/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/document.pdf",
"extension": ".pdf",
"modified": 1705997600.0
}
],
"count": 1
}
```
### 3. Delete a File
```
DELETE /api/threads/{thread_id}/uploads/{filename}
```
**Response:**
```json
{
"success": true,
"message": "Deleted document.pdf"
}
```
## Supported Document Formats
The following formats are automatically converted to Markdown:
- PDF (`.pdf`)
- PowerPoint (`.ppt`, `.pptx`)
- Excel (`.xls`, `.xlsx`)
- Word (`.doc`, `.docx`)
The converted Markdown file is saved in the same directory, named after the original file with a `.md` extension.
## Agent Integration
### Automatic File Listing
On every request, the agent automatically receives a list of uploaded files in the following format:
```xml
<uploaded_files>
The following files have been uploaded and are available for use:
- document.pdf (1.2 MB)
Path: /mnt/user-data/uploads/document.pdf
- document.md (45.3 KB)
Path: /mnt/user-data/uploads/document.md
You can read these files using the `read_file` tool with the paths shown above.
</uploaded_files>
```
### Using Uploaded Files
The agent runs in a sandbox and accesses files via virtual paths. It can read uploaded files directly with the `read_file` tool:
```python
# Read the original PDF (if supported)
read_file(path="/mnt/user-data/uploads/document.pdf")
# Read the converted Markdown (recommended)
read_file(path="/mnt/user-data/uploads/document.md")
```
**Path mapping:**
- Agent uses: `/mnt/user-data/uploads/document.pdf` (virtual path)
- Actual storage: `backend/.deer-flow/threads/{thread_id}/user-data/uploads/document.pdf`
- Frontend access: `/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/document.pdf` (HTTP URL)
The upload flow is "thread directory first":
- Files are first written to `backend/.deer-flow/threads/{thread_id}/user-data/uploads/` as the authoritative store
- The local sandbox (`sandbox_id=local`) uses the thread directory contents directly
- Non-local sandboxes additionally sync to `/mnt/user-data/uploads/*` so the files are visible at runtime
## Testing Examples
### Testing with curl
```bash
# 1. Upload a single file
curl -X POST http://localhost:2026/api/threads/test-thread/uploads \
  -F "files=@/path/to/document.pdf"
# 2. Upload multiple files
curl -X POST http://localhost:2026/api/threads/test-thread/uploads \
  -F "files=@/path/to/document.pdf" \
  -F "files=@/path/to/presentation.pptx" \
  -F "files=@/path/to/spreadsheet.xlsx"
# 3. List uploaded files
curl http://localhost:2026/api/threads/test-thread/uploads/list
# 4. Delete a file
curl -X DELETE http://localhost:2026/api/threads/test-thread/uploads/document.pdf
```
### Testing with Python
```python
import requests
thread_id = "test-thread"
base_url = "http://localhost:2026"
# Upload files
files = [
("files", open("document.pdf", "rb")),
("files", open("presentation.pptx", "rb")),
]
response = requests.post(
f"{base_url}/api/threads/{thread_id}/uploads",
files=files
)
print(response.json())
# List files
response = requests.get(f"{base_url}/api/threads/{thread_id}/uploads/list")
print(response.json())
# Delete a file
response = requests.delete(
f"{base_url}/api/threads/{thread_id}/uploads/document.pdf"
)
print(response.json())
```
## File Storage Layout
```
backend/.deer-flow/threads/
└── {thread_id}/
    └── user-data/
        └── uploads/
            ├── document.pdf       # Original file
            ├── document.md        # Converted Markdown
            ├── presentation.pptx
            ├── presentation.md
            └── ...
```
## Limits
- Maximum file size: 100 MB (configurable via `client_max_body_size` in nginx.conf)
- Filename safety: file paths are validated automatically to prevent directory traversal attacks
- Thread isolation: each thread's uploads are isolated and cannot be accessed across threads
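The traversal check can be sketched as follows (illustrative; the real validation lives in the uploads router and may be stricter):

```python
# Sketch of an upload filename check that blocks directory traversal:
# the resolved target must sit directly inside the uploads directory.
# Illustrative only.
import tempfile
from pathlib import Path

def safe_upload_path(uploads_dir: str, filename: str) -> Path:
    base = Path(uploads_dir).resolve()
    target = (base / filename).resolve()
    if target.parent != base or target.name != filename:
        raise ValueError(f"unsafe filename: {filename!r}")
    return target

# Demo against a temporary uploads directory.
uploads = tempfile.mkdtemp()
ok = safe_upload_path(uploads, "doc.pdf")
```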
## Implementation
### Components
1. **Upload Router** (`app/gateway/routers/uploads.py`)
   - Handles upload, list, and delete requests
   - Converts documents via markitdown
2. **Uploads Middleware** (`packages/harness/deerflow/agents/middlewares/uploads_middleware.py`)
   - Injects the file list before every agent request
   - Generates the formatted file-list message automatically
3. **Nginx configuration** (`nginx.conf`)
   - Routes upload requests to the Gateway API
   - Configures large-file upload support
### Dependencies
- `markitdown>=0.0.1a2` - document conversion
- `python-multipart>=0.0.20` - file upload handling
## Troubleshooting
### Uploads fail
1. Check whether the file exceeds the size limit
2. Check that the Gateway API is running
3. Check that there is enough disk space
4. Inspect the Gateway logs: `make gateway`
### Document conversion fails
1. Check that markitdown is installed: `uv run python -c "import markitdown"`
2. Look for the specific error in the logs
3. Some corrupted or encrypted documents cannot be converted; the original file is still saved
### The agent cannot see uploaded files
1. Confirm UploadsMiddleware is registered in agent.py
2. Check that the thread_id is correct
3. Confirm the files actually landed in `backend/.deer-flow/threads/{thread_id}/user-data/uploads/`
4. For non-local sandboxes, confirm the upload endpoint returned no errors (the sandbox sync must complete successfully)
## Development Notes
### Frontend Integration
```typescript
// Example: upload files
async function uploadFiles(threadId: string, files: File[]) {
const formData = new FormData();
files.forEach(file => {
formData.append('files', file);
});
const response = await fetch(
`/api/threads/${threadId}/uploads`,
{
method: 'POST',
body: formData,
}
);
return response.json();
}
// List files
async function listFiles(threadId: string) {
const response = await fetch(
`/api/threads/${threadId}/uploads/list`
);
return response.json();
}
```
### Possible Extensions
1. **File preview**: add a preview endpoint for viewing files directly in the browser
2. **Bulk delete**: delete multiple files in one request
3. **File search**: search by filename or type
4. **Versioning**: keep multiple versions of a file
5. **Archive support**: automatically extract zip files
6. **Image OCR**: run OCR on uploaded images

# Guardrails: Pre-Tool-Call Authorization
> **Context:** [Issue #1213](https://github.com/bytedance/deer-flow/issues/1213) — DeerFlow has Docker sandboxing and human approval via `ask_clarification`, but no deterministic, policy-driven authorization layer for tool calls. An agent running autonomous multi-step tasks can execute any loaded tool with any arguments. Guardrails add a middleware that evaluates every tool call against a policy **before** execution.
## Why Guardrails
```
Without guardrails: With guardrails:
Agent Agent
│ │
▼ ▼
┌──────────┐ ┌──────────┐
│ bash │──▶ executes immediately │ bash │──▶ GuardrailMiddleware
│ rm -rf / │ │ rm -rf / │ │
└──────────┘ └──────────┘ ▼
┌──────────────┐
│ Provider │
│ evaluates │
│ against │
│ policy │
└──────┬───────┘
┌─────┴─────┐
│ │
ALLOW DENY
│ │
▼ ▼
Tool runs Agent sees:
normally "Guardrail denied:
rm -rf blocked"
```
- **Sandboxing** provides process isolation but not semantic authorization. A sandboxed `bash` can still `curl` data out.
- **Human approval** (`ask_clarification`) requires a human in the loop for every action. Not viable for autonomous workflows.
- **Guardrails** provide deterministic, policy-driven authorization that works without human intervention.
## Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ Middleware Chain │
│ │
│ 1. ThreadDataMiddleware ─── per-thread dirs │
│ 2. UploadsMiddleware ─── file upload tracking │
│ 3. SandboxMiddleware ─── sandbox acquisition │
│ 4. DanglingToolCallMiddleware ── fix incomplete tool calls │
│ 5. GuardrailMiddleware ◄──── EVALUATES EVERY TOOL CALL │
│ 6. ToolErrorHandlingMiddleware ── convert exceptions to messages │
│ 7-12. (Summarization, Title, Memory, Vision, Subagent, Clarify) │
│ │
└─────────────────────────────────────────────────────────────────────┘
┌──────────────────────────┐
│ GuardrailProvider │ ◄── pluggable: any class
│ (configured in YAML) │ with evaluate/aevaluate
└────────────┬─────────────┘
┌─────────┼──────────────┐
│ │ │
▼ ▼ ▼
Built-in OAP Passport Custom
Allowlist Provider Provider
(zero dep) (open standard) (your code)
Any implementation
(e.g. APort, or
your own evaluator)
```
The `GuardrailMiddleware` implements `wrap_tool_call` / `awrap_tool_call` (the same `AgentMiddleware` pattern used by `ToolErrorHandlingMiddleware`). It:
1. Builds a `GuardrailRequest` with tool name, arguments, and passport reference
2. Calls `provider.evaluate(request)` on whatever provider is configured
3. If **deny**: returns `ToolMessage(status="error")` with the reason -- agent sees the denial and adapts
4. If **allow**: passes through to the actual tool handler
5. If **provider error** and `fail_closed=true` (default): blocks the call
6. `GraphBubbleUp` exceptions (LangGraph control signals) are always propagated, never caught
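Steps 1-6 can be sketched as follows; the class and field names here are approximations, not the real `GuardrailMiddleware` API:

```python
# Sketch of the guardrail wrap-tool-call flow: evaluate first, then deny,
# allow, or fail closed on provider errors. Illustrative only.
from dataclasses import dataclass

@dataclass
class Decision:
    allow: bool
    reason: str = ""

def wrap_tool_call(provider, tool_name, tool_args, handler, fail_closed=True):
    try:
        decision = provider.evaluate(tool_name, tool_args)
    except Exception as exc:                       # provider error
        if fail_closed:
            return {"status": "error", "content": f"Guardrail provider error: {exc}"}
        decision = Decision(allow=True)
    if not decision.allow:                         # deny: the agent sees the reason
        return {"status": "error", "content": f"Guardrail denied: {decision.reason}"}
    return handler(tool_args)                      # allow: run the tool normally

class DenyBash:
    """Toy provider that blocks bash and allows everything else."""
    def evaluate(self, tool_name, tool_args):
        if tool_name == "bash":
            return Decision(False, "tool 'bash' was blocked (oap.tool_not_allowed)")
        return Decision(True)

result = wrap_tool_call(DenyBash(), "bash", {"cmd": "rm -rf /"}, lambda args: {"status": "ok"})
allowed = wrap_tool_call(DenyBash(), "ls", {"path": "/"}, lambda args: {"status": "ok"})
```

Note that exceptions raised by `handler` itself (including LangGraph's `GraphBubbleUp` control signals) propagate unchanged, since only the provider call sits inside the `try`.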
## Three Provider Options
### Option 1: Built-in AllowlistProvider (Zero Dependencies)
The simplest option. Ships with DeerFlow. Block or allow tools by name. No external packages, no passport, no network.
**config.yaml:**
```yaml
guardrails:
enabled: true
provider:
use: deerflow.guardrails.builtin:AllowlistProvider
config:
denied_tools: ["bash", "write_file"]
```
This blocks `bash` and `write_file` for all requests. All other tools pass through.
You can also use an allowlist (only these tools are permitted):
```yaml
guardrails:
enabled: true
provider:
use: deerflow.guardrails.builtin:AllowlistProvider
config:
allowed_tools: ["web_search", "read_file", "ls"]
```
**Try it:**
1. Add the config above to your `config.yaml`
2. Start DeerFlow: `make dev`
3. Ask the agent: "Use bash to run echo hello"
4. The agent sees: `Guardrail denied: tool 'bash' was blocked (oap.tool_not_allowed)`
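The deny/allow semantics of the two configs above can be sketched as a single predicate (illustrative; the real `AllowlistProvider` also returns an OAP-style reason code such as `oap.tool_not_allowed`):

```python
# Sketch of AllowlistProvider semantics: denied_tools blocks by name;
# allowed_tools, when set, permits only the listed tools. Illustrative only.
def allowlist_decision(tool_name, denied_tools=(), allowed_tools=None):
    if tool_name in denied_tools:
        return False
    if allowed_tools is not None and tool_name not in allowed_tools:
        return False
    return True
```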
### Option 2: OAP Passport Provider (Policy-Based)
For policy enforcement based on the [Open Agent Passport (OAP)](https://github.com/aporthq/aport-spec) open standard. An OAP passport is a JSON document that declares an agent's identity, capabilities, and operational limits. Any provider that reads an OAP passport and returns OAP-compliant decisions works with DeerFlow.
```
┌─────────────────────────────────────────────────────────────┐
│ OAP Passport (JSON) │
│ (open standard, any provider) │
│ { │
│ "spec_version": "oap/1.0", │
│ "status": "active", │
│ "capabilities": [ │
│ {"id": "system.command.execute"}, │
│ {"id": "data.file.read"}, │
│ {"id": "data.file.write"}, │
│ {"id": "web.fetch"}, │
│ {"id": "mcp.tool.execute"} │
│ ], │
│ "limits": { │
│ "system.command.execute": { │
│ "allowed_commands": ["git", "npm", "node", "ls"], │
│ "blocked_patterns": ["rm -rf", "sudo", "chmod 777"] │
│ } │
│ } │
│ } │
└──────────────────────────┬──────────────────────────────────┘
Any OAP-compliant provider
┌────────────────┼────────────────┐
│ │ │
Your own APort (ref. Other future
evaluator implementation) implementations
```
**Creating a passport manually:**
An OAP passport is just a JSON file. You can create one by hand following the [OAP specification](https://github.com/aporthq/aport-spec/blob/main/oap/oap-spec.md) and validate it against the [JSON schema](https://github.com/aporthq/aport-spec/blob/main/oap/passport-schema.json). See the [examples](https://github.com/aporthq/aport-spec/tree/main/oap/examples) directory for templates.
**Using APort as a reference implementation:**
[APort Agent Guardrails](https://github.com/aporthq/aport-agent-guardrails) is one open-source (Apache 2.0) implementation of an OAP provider. It handles passport creation, local evaluation, and optional hosted API evaluation.
```bash
pip install aport-agent-guardrails
aport setup --framework deerflow
```
This creates:
- `~/.aport/deerflow/config.yaml` -- evaluator config (local or API mode)
- `~/.aport/deerflow/aport/passport.json` -- OAP passport with capabilities and limits
**config.yaml (using APort as the provider):**
```yaml
guardrails:
  enabled: true
  provider:
    use: aport_guardrails.providers.generic:OAPGuardrailProvider
```
**config.yaml (using your own OAP provider):**
```yaml
guardrails:
  enabled: true
  provider:
    use: my_oap_provider:MyOAPProvider
    config:
      passport_path: ./my-passport.json
```
Any provider that accepts `framework` as a kwarg and implements `evaluate`/`aevaluate` works. The OAP standard defines the passport format and decision codes; DeerFlow doesn't care which provider reads them.
**What the passport controls:**
| Passport field | What it does | Example |
|---|---|---|
| `capabilities[].id` | Which tool categories the agent can use | `system.command.execute`, `data.file.write` |
| `limits.*.allowed_commands` | Which commands are allowed | `["git", "npm", "node"]` or `["*"]` for all |
| `limits.*.blocked_patterns` | Patterns always denied | `["rm -rf", "sudo", "chmod 777"]` |
| `status` | Kill switch | `active`, `suspended`, `revoked` |
**Evaluation modes (provider-dependent):**
OAP providers may support different evaluation modes. For example, the APort reference implementation supports:
| Mode | How it works | Network | Latency |
|---|---|---|---|
| **Local** | Evaluates passport locally (bash script). | None | ~300ms |
| **API** | Sends passport + context to a hosted evaluator. Signed decisions. | Yes | ~65ms |
A custom OAP provider can implement any evaluation strategy -- the DeerFlow middleware doesn't care how the provider reaches its decision.
**Try it:**
1. Install and set up as above
2. Start DeerFlow and ask: "Create a file called test.txt with content hello"
3. Then ask: "Now delete it using bash rm -rf"
4. Guardrail blocks it: `oap.blocked_pattern: Command contains blocked pattern: rm -rf`
### Option 3: Custom Provider (Bring Your Own)
Any Python class with `evaluate(request)` and `aevaluate(request)` methods works. No base class or inheritance needed -- it's a structural protocol.
```python
# my_guardrail.py
class MyGuardrailProvider:
    name = "my-company"

    def evaluate(self, request):
        from deerflow.guardrails.provider import GuardrailDecision, GuardrailReason

        # Example: block any bash command containing "delete"
        if request.tool_name == "bash" and "delete" in str(request.tool_input):
            return GuardrailDecision(
                allow=False,
                reasons=[GuardrailReason(code="custom.blocked", message="delete not allowed")],
                policy_id="custom.v1",
            )
        return GuardrailDecision(allow=True, reasons=[GuardrailReason(code="oap.allowed")])

    async def aevaluate(self, request):
        return self.evaluate(request)
```
**config.yaml:**
```yaml
guardrails:
  enabled: true
  provider:
    use: my_guardrail:MyGuardrailProvider
```
Make sure `my_guardrail.py` is on the Python path (e.g. in the backend directory or installed as a package).
**Try it:**
1. Create `my_guardrail.py` in the backend directory
2. Add the config
3. Start DeerFlow and ask: "Use bash to delete test.txt"
4. Your provider blocks it
## Implementing a Provider
### Required Interface
```
┌──────────────────────────────────────────────────┐
│ GuardrailProvider Protocol │
│ │
│ name: str │
│ │
│ evaluate(request: GuardrailRequest) │
│ -> GuardrailDecision │
│ │
│ aevaluate(request: GuardrailRequest) (async) │
│ -> GuardrailDecision │
└──────────────────────────────────────────────────┘
┌──────────────────────────┐ ┌──────────────────────────┐
│ GuardrailRequest │ │ GuardrailDecision │
│ │ │ │
│ tool_name: str │ │ allow: bool │
│ tool_input: dict │ │ reasons: [GuardrailReason]│
│ agent_id: str | None │ │ policy_id: str | None │
│ thread_id: str | None │ │ metadata: dict │
│ is_subagent: bool │ │ │
│ timestamp: str │ │ GuardrailReason: │
│ │ │ code: str │
└──────────────────────────┘ │ message: str │
└──────────────────────────┘
```
### DeerFlow Tool Names
These are the tool names your provider will see in `request.tool_name`:
| Tool | What it does |
|---|---|
| `bash` | Shell command execution |
| `write_file` | Create/overwrite a file |
| `str_replace` | Edit a file (find and replace) |
| `read_file` | Read file content |
| `ls` | List directory |
| `web_search` | Web search query |
| `web_fetch` | Fetch URL content |
| `image_search` | Image search |
| `present_file` | Present file to user |
| `view_image` | Display image |
| `ask_clarification` | Ask user a question |
| `task` | Delegate to subagent |
| `mcp__*` | MCP tools (dynamic) |
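A provider typically branches on these names; because MCP tools arrive with a dynamic `mcp__` prefix, prefix matching is the usual pattern. A sketch (the category names are illustrative, not part of DeerFlow):

```python
def classify_tool(tool_name: str) -> str:
    """Map a DeerFlow tool name to a coarse policy category (illustrative)."""
    if tool_name.startswith("mcp__"):
        return "mcp"          # dynamic MCP tools
    if tool_name == "bash":
        return "execute"      # shell execution
    if tool_name in {"write_file", "str_replace"}:
        return "write"        # filesystem writes
    if tool_name in {"read_file", "ls"}:
        return "read"         # filesystem reads
    if tool_name in {"web_search", "web_fetch", "image_search"}:
        return "network"      # outbound network access
    return "ui"               # present_file, view_image, ask_clarification, task
```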
### OAP Reason Codes
Standard codes used by the [OAP specification](https://github.com/aporthq/aport-spec):
| Code | Meaning |
|---|---|
| `oap.allowed` | Tool call authorized |
| `oap.tool_not_allowed` | Tool not in allowlist |
| `oap.command_not_allowed` | Command not in allowed_commands |
| `oap.blocked_pattern` | Command matches a blocked pattern |
| `oap.limit_exceeded` | Operation exceeds a limit |
| `oap.passport_suspended` | Passport status is suspended/revoked |
| `oap.evaluator_error` | Provider crashed (fail-closed) |
### Provider Loading
DeerFlow loads providers via `resolve_variable()` -- the same mechanism used for models, tools, and sandbox providers. The `use:` field is a Python class path: `package.module:ClassName`.
The provider is instantiated with `**config` kwargs if `config:` is set, plus `framework="deerflow"` is always injected. Accept `**kwargs` to stay forward-compatible:
```python
class YourProvider:
    def __init__(self, framework: str = "generic", **kwargs):
        # framework="deerflow" tells you which config dir to use
        ...
```
## Configuration Reference
```yaml
guardrails:
  # Enable/disable guardrail middleware (default: false)
  enabled: true
  # Block tool calls if provider raises an exception (default: true)
  fail_closed: true
  # Passport reference -- passed as request.agent_id to the provider.
  # File path, hosted agent ID, or null (provider resolves from its config).
  passport: null
  # Provider: loaded by class path via resolve_variable
  provider:
    use: deerflow.guardrails.builtin:AllowlistProvider
    config:  # optional kwargs passed to provider.__init__
      denied_tools: ["bash"]
```
## Testing
```bash
cd backend
uv run python -m pytest tests/test_guardrail_middleware.py -v
```
25 tests covering:
- AllowlistProvider: allow, deny, both allowlist+denylist, async
- GuardrailMiddleware: allow passthrough, deny with OAP codes, fail-closed, fail-open, passport forwarding, empty reasons fallback, empty tool name, protocol isinstance check
- Async paths: awrap_tool_call for allow, deny, fail-closed, fail-open
- GraphBubbleUp: LangGraph control signals propagate through (not caught)
- Config: defaults, from_dict, singleton load/reset
## Files
```
packages/harness/deerflow/guardrails/
  __init__.py       # Public exports
  provider.py       # GuardrailProvider protocol, GuardrailRequest, GuardrailDecision
  middleware.py     # GuardrailMiddleware (AgentMiddleware subclass)
  builtin.py        # AllowlistProvider (zero deps)
packages/harness/deerflow/config/
  guardrails_config.py              # GuardrailsConfig Pydantic model + singleton
packages/harness/deerflow/agents/middlewares/
  tool_error_handling_middleware.py # Registers GuardrailMiddleware in chain
config.example.yaml                 # Three provider options documented
tests/test_guardrail_middleware.py  # 25 tests
docs/GUARDRAILS.md                  # This file
```
# DeerFlow Backend Split Design Doc (Harness + App)
> Status: Draft
> Author: DeerFlow Team
> Date: 2026-03-13
## 1. Background and Motivation
The DeerFlow backend is currently a single Python package (`src.*`) containing everything from low-level agent orchestration to the user-facing product. As the project grows, this structure causes several problems:
- **Hard to reuse**: other products (CLI tools, Slack bots, third-party integrations) that want the agent capabilities must depend on the entire backend, including FastAPI, IM SDKs, and other dependencies they do not need
- **Blurred responsibilities**: agent orchestration logic and user-product logic are mixed under the same `src/`, with no clear boundary
- **Dependency bloat**: the LangGraph Server runtime does not need FastAPI/uvicorn/Slack SDK, yet today all dependencies must be installed
This document proposes splitting the backend into two parts: **deerflow-harness** (a publishable agent framework package) and **app** (unpackaged user-product code).
## 2. Core Concepts
### 2.1 Harness (Framework Layer)
The harness is the agent construction and orchestration framework. It answers the question **"how to build and run agents"**:
- Agent factory and lifecycle management
- Middleware pipeline
- Tool system (built-in tools + MCP + community tools)
- Sandboxed execution environment
- Subagent delegation
- Memory system
- Skill loading and injection
- Model factory
- Configuration system
**The harness is a publishable Python package** (`deerflow-harness`) that can be installed and used independently.
**Harness design principle**: it is completely unaware of the application layer. It neither knows nor cares who is calling it, whether that is a web app, a CLI, a Slack bot, or a unit test.
### 2.2 App (Application Layer)
The app is the user-facing product code. It answers the question **"how to present agents to users"**:
- Gateway API (FastAPI REST interface)
- IM channels (Feishu, Slack, Telegram integrations)
- CRUD management for custom agents
- HTTP endpoints for file upload/download
**The app is not packaged or published**; it is internal application code of the DeerFlow project and runs directly.
**The app depends on the harness, but the harness never depends on the app.**
### 2.3 Boundary
| Module | Layer | Notes |
|------|------|------|
| `config/` | Harness | The configuration system is infrastructure |
| `reflection/` | Harness | Dynamic module loading utilities |
| `utils/` | Harness | General-purpose helpers |
| `agents/` | Harness | Agent factory, middlewares, state, memory |
| `subagents/` | Harness | Subagent delegation system |
| `sandbox/` | Harness | Sandboxed execution environment |
| `tools/` | Harness | Tool registration and discovery |
| `mcp/` | Harness | MCP protocol integration |
| `skills/` | Harness | Skill loading, parsing, schema definitions |
| `models/` | Harness | LLM model factory |
| `community/` | Harness | Community tools (tavily, jina, etc.) |
| `client.py` | Harness | Embedded Python client |
| `gateway/` | App | FastAPI REST API |
| `channels/` | App | IM platform integrations |
**About custom agents**: the agent definition format (the `config.yaml` + `SOUL.md` schema) is defined by `config/agents_config.py` in the harness layer, but file storage, CRUD, and discovery are handled by `gateway/routers/agents.py` in the app layer.
## 3. Target Architecture
### 3.1 Directory Layout
```
backend/
├── packages/
│   └── harness/
│       ├── pyproject.toml   # deerflow-harness package definition
│       └── deerflow/        # Python package root (import prefix: deerflow.*)
│           ├── __init__.py
│           ├── config/
│           ├── reflection/
│           ├── utils/
│           ├── agents/
│           │   ├── lead_agent/
│           │   ├── middlewares/
│           │   ├── memory/
│           │   ├── checkpointer/
│           │   └── thread_state.py
│           ├── subagents/
│           ├── sandbox/
│           ├── tools/
│           ├── mcp/
│           ├── skills/
│           ├── models/
│           ├── community/
│           └── client.py
├── app/                     # not packaged (import prefix: app.*)
│   ├── __init__.py
│   ├── gateway/
│   │   ├── __init__.py
│   │   ├── app.py
│   │   ├── config.py
│   │   ├── path_utils.py
│   │   └── routers/
│   └── channels/
│       ├── __init__.py
│       ├── base.py
│       ├── manager.py
│       ├── service.py
│       ├── store.py
│       ├── message_bus.py
│       ├── feishu.py
│       ├── slack.py
│       └── telegram.py
├── pyproject.toml           # uv workspace root
├── langgraph.json
├── tests/
├── docs/
└── Makefile
```
### 3.2 Import Rules
The two layers use different import prefixes, making the responsibility boundary obvious at a glance:
```python
# ---------------------------------------------------------------
# Harness-internal imports (deerflow.* prefix)
# ---------------------------------------------------------------
from deerflow.agents import make_lead_agent
from deerflow.models import create_chat_model
from deerflow.config import get_app_config
from deerflow.tools import get_available_tools
# ---------------------------------------------------------------
# App-internal imports (app.* prefix)
# ---------------------------------------------------------------
from app.gateway.app import app
from app.gateway.routers.uploads import upload_files
from app.channels.service import start_channel_service
# ---------------------------------------------------------------
# App calling harness (one-way dependency; harness never imports app)
# ---------------------------------------------------------------
from deerflow.agents import make_lead_agent
from deerflow.models import create_chat_model
from deerflow.skills import load_skills
from deerflow.config.extensions_config import get_extensions_config
```
**App calling harness: starting an agent from the gateway**
```python
# app/gateway/routers/chat.py
from deerflow.agents.lead_agent.agent import make_lead_agent
from deerflow.models import create_chat_model
from deerflow.config import get_app_config

async def create_chat_session(thread_id: str, model_name: str):
    config = get_app_config()
    model = create_chat_model(name=model_name)
    agent = make_lead_agent(config=...)
    # ... use the agent to handle the user's message
```
**App calling harness: querying skills from a channel**
```python
# app/channels/manager.py
from deerflow.skills import load_skills
from deerflow.agents.memory.updater import get_memory_data

def handle_status_command():
    skills = load_skills(enabled_only=True)
    memory = get_memory_data()
    return f"Skills: {len(skills)}, Memory facts: {len(memory.get('facts', []))}"
```
**Forbidden direction**: `from app.` / `import app.` must never appear in harness code.
### 3.3 Why the App Is Not Packaged
| Aspect | Packaged (under packages/) | Not packaged (under backend/app/) |
|------|------------------------|--------------------------|
| Namespace | Needs pkgutil `extend_path` merging, or a separate prefix | Naturally separate: `app.*` vs `deerflow.*` |
| Publishing need | None; the app is project-internal code | No pyproject.toml required |
| Complexity | Two packages' builds, versions, and dependency declarations to manage | Runs directly, zero extra configuration |
| How it runs | `pip install deerflow-app` | `PYTHONPATH=. uvicorn app.gateway.app:app` |
The app's only consumer is the DeerFlow project itself; there is no need to publish it independently. Keeping it under `backend/app/` as a plain Python package, made importable via `PYTHONPATH` or an editable install, is enough.
### 3.4 Dependency Graph
```
┌─────────────────────────────────────┐
│ app/ (not packaged, run directly)   │
│   ├── fastapi, uvicorn              │
│   ├── slack-sdk, lark-oapi, ...     │
│   └── import deerflow.*             │
└──────────────┬──────────────────────┘
               │
┌──────────────▼──────────────────────┐
│ deerflow-harness (publishable pkg)  │
│   ├── langgraph, langchain          │
│   ├── markitdown, pydantic, ...     │
│   └── zero app dependencies         │
└─────────────────────────────────────┘
```
**Dependency classification**
| Category | Packages |
|------|--------|
| Harness only | agent-sandbox, langchain*, langgraph*, markdownify, markitdown, pydantic, pyyaml, readabilipy, tavily-python, firecrawl-py, tiktoken, ddgs, duckdb, httpx, kubernetes, dotenv |
| App only | fastapi, uvicorn, sse-starlette, python-multipart, lark-oapi, slack-sdk, python-telegram-bot, markdown-to-mrkdwn |
| Shared | langgraph-sdk (channels use it as an HTTP client), pydantic, httpx |
### 3.5 Workspace Configuration
`backend/pyproject.toml` (workspace root):
```toml
[project]
name = "deer-flow"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["deerflow-harness"]

[dependency-groups]
dev = ["pytest>=8.0.0", "ruff>=0.14.11"]
# The app's extra dependencies (fastapi, etc.) are also declared at the
# workspace root, because the app is not packaged
app = ["fastapi", "uvicorn", "sse-starlette", "python-multipart"]
channels = ["lark-oapi", "slack-sdk", "python-telegram-bot"]

[tool.uv.workspace]
members = ["packages/harness"]

[tool.uv.sources]
deerflow-harness = { workspace = true }
```
## 4. Current Cross-Layer Dependency Problems
Before the split, two reverse dependencies from harness to app in `client.py` must be resolved:
### 4.1 `_validate_skill_frontmatter`
```python
# client.py -- harness importing app-layer code
from src.gateway.routers.skills import _validate_skill_frontmatter
```
**Solution**: extract the function into `deerflow/skills/validation.py`. It is a pure logic function (parses YAML frontmatter and validates fields) with no FastAPI dependency.
### 4.2 `CONVERTIBLE_EXTENSIONS` + `convert_file_to_markdown`
```python
# client.py -- harness importing app-layer code
from src.gateway.routers.uploads import CONVERTIBLE_EXTENSIONS, convert_file_to_markdown
```
**Solution**: extract them into `deerflow/utils/file_conversion.py`. They depend only on `markitdown` + `pathlib` and are general-purpose utilities.
## 5. Infrastructure Changes
### 5.1 LangGraph Server
The LangGraph Server only needs the harness package. Updated `langgraph.json`:
```json
{
  "dependencies": ["./packages/harness"],
  "graphs": {
    "lead_agent": "deerflow.agents:make_lead_agent"
  },
  "checkpointer": {
    "path": "./packages/harness/deerflow/agents/checkpointer/async_provider.py:make_checkpointer"
  }
}
```
### 5.2 Gateway API
```bash
# serve.sh / Makefile
# PYTHONPATH includes the backend/ root so both app.* and deerflow.* resolve
PYTHONPATH=. uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001
```
### 5.3 Nginx
No change needed (it only routes URLs and is unaware of Python module paths).
### 5.4 Docker
Module references in the Dockerfile change from `src.` to `deerflow.` / `app.`; `COPY` commands must cover the `packages/` and `app/` directories.
## 6. Implementation Plan
Executed incrementally across 3 PRs:
### PR 1: Extract Shared Utilities (Low Risk)
1. Create `src/skills/validation.py`, extracting `_validate_skill_frontmatter` from `gateway/routers/skills.py`
2. Create `src/utils/file_conversion.py`, extracting the file-conversion logic from `gateway/routers/uploads.py`
3. Update imports in `client.py`, `gateway/routers/skills.py`, and `gateway/routers/uploads.py`
4. Run the full test suite to confirm there are no regressions
### PR 2: Rename + Physical Split (High Risk, atomic)
1. Create the `packages/harness/` directory and its `pyproject.toml`
2. `git mv` harness modules from `src/` into `packages/harness/deerflow/`
3. `git mv` app modules from `src/` into `app/`
4. Global import replacement:
   - harness modules: `src.*` to `deerflow.*` (all `.py` files, `langgraph.json`, tests, docs)
   - app modules: `src.gateway.*` to `app.gateway.*`, `src.channels.*` to `app.channels.*`
5. Update the workspace-root `pyproject.toml`
6. Update `langgraph.json`, `Makefile`, `Dockerfile`
7. `uv sync` + full test suite + manually verify that services start
### PR 3: Boundary Checks + Docs (Low Risk)
1. Add a lint rule: verify that the harness does not import app modules
2. Update `CLAUDE.md` and `README.md`
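The boundary lint rule from PR 3 could be implemented as a small script. A sketch (the regex and function name here are assumptions, not the actual rule):

```python
import re
from pathlib import Path

# Match "from app." / "from app ..." / "import app" at line start,
# but not unrelated names like "application" or "apple".
FORBIDDEN = re.compile(r"^\s*(from\s+app[.\s]|import\s+app\b)", re.MULTILINE)


def find_violations(harness_root: Path) -> list[Path]:
    """Return harness files that import the app layer (the forbidden direction)."""
    return [
        path for path in harness_root.rglob("*.py")
        if FORBIDDEN.search(path.read_text(encoding="utf-8"))
    ]
```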
## 7. Risks and Mitigations
| Risk | Impact | Mitigation |
|------|------|----------|
| Global rename collateral damage | `src` inside strings replaced by mistake | Match precisely with the regex `\bsrc\.` and review the diff |
| LangGraph Server cannot find modules | Service fails to start | Point `dependencies` in `langgraph.json` at the correct harness package path |
| Missing `PYTHONPATH` for the app | Gateway/Channel import errors at startup | Set `PYTHONPATH=.` uniformly in the Makefile/Docker |
| `use` fields in `config.yaml` reference old paths | Module resolution fails at runtime | Update `use` fields in `config.yaml` to `deerflow.*` in the same change |
| `sys.path` confusion in tests | Test failures | Use an editable install (`uv sync`) so deerflow is importable, and add `app/` to `sys.path` in `conftest.py` |
## 8. Future Evolution
- **Independent publishing**: the harness can be published to an internal PyPI so other projects can simply `pip install deerflow-harness`
- **Pluggable apps**: different apps (web, CLI, bots) can evolve independently while depending on the same harness
- **Finer-grained splits**: if harness modules keep growing, further splits are possible (e.g. `deerflow-sandbox`, `deerflow-mcp`)
# MCP (Model Context Protocol) Configuration
DeerFlow supports configurable MCP servers and skills to extend its capabilities, which are loaded from a dedicated `extensions_config.json` file in the project root directory.
## Setup
1. Copy `extensions_config.example.json` to `extensions_config.json` in the project root directory.
```bash
# Copy example configuration
cp extensions_config.example.json extensions_config.json
```
2. Enable the desired MCP servers or skills by setting `"enabled": true`.
3. Configure each server's command, arguments, and environment variables as needed.
4. Restart the application to load and register MCP tools.
## OAuth Support (HTTP/SSE MCP Servers)
For `http` and `sse` MCP servers, DeerFlow supports OAuth token acquisition and automatic token refresh.
- Supported grants: `client_credentials`, `refresh_token`
- Configure per-server `oauth` block in `extensions_config.json`
- Secrets should be provided via environment variables (for example: `$MCP_OAUTH_CLIENT_SECRET`)
Example:
```json
{
  "mcpServers": {
    "secure-http-server": {
      "enabled": true,
      "type": "http",
      "url": "https://api.example.com/mcp",
      "oauth": {
        "enabled": true,
        "token_url": "https://auth.example.com/oauth/token",
        "grant_type": "client_credentials",
        "client_id": "$MCP_OAUTH_CLIENT_ID",
        "client_secret": "$MCP_OAUTH_CLIENT_SECRET",
        "scope": "mcp.read",
        "refresh_skew_seconds": 60
      }
    }
  }
}
```
## How It Works
MCP servers expose tools that are automatically discovered and integrated into DeerFlow's agent system at runtime. Once enabled, these tools become available to agents without additional code changes.
## Example Capabilities
MCP servers can provide access to:
- **File systems**
- **Databases** (e.g., PostgreSQL)
- **External APIs** (e.g., GitHub, Brave Search)
- **Browser automation** (e.g., Puppeteer)
- **Custom MCP server implementations**
## Learn More
For detailed documentation about the Model Context Protocol, visit:
https://modelcontextprotocol.io
# Memory System Improvements
This document tracks memory injection behavior and roadmap status.
## Status (As Of 2026-03-10)
Implemented in `main`:
- Accurate token counting via `tiktoken` in `format_memory_for_injection`.
- Facts are injected into prompt memory context.
- Facts are ranked by confidence (descending).
- Injection respects `max_injection_tokens` budget.
Planned / not yet merged:
- TF-IDF similarity-based fact retrieval.
- `current_context` input for context-aware scoring.
- Configurable similarity/confidence weights (`similarity_weight`, `confidence_weight`).
- Middleware/runtime wiring for context-aware retrieval before each model call.
## Current Behavior
Function today:
```python
def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2000) -> str:
```
Current injection format:
- `User Context` section from `user.*.summary`
- `History` section from `history.*.summary`
- `Facts` section from `facts[]`, sorted by confidence, appended until token budget is reached
Token counting:
- Uses `tiktoken` (`cl100k_base`) when available
- Falls back to `len(text) // 4` if tokenizer import fails
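The counting behavior above can be sketched as follows (a simplified sketch; the actual helper lives in `packages/harness/deerflow/agents/memory/prompt.py` and its exact name may differ):

```python
def count_tokens(text: str) -> int:
    """Count tokens with tiktoken when available, else estimate ~4 chars/token."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except Exception:
        # tiktoken missing, or the encoding failed to load: fall back to a
        # rough character-based estimate
        return len(text) // 4
```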
## Known Gap
Previous versions of this document described TF-IDF/context-aware retrieval as if it were already shipped.
That was not accurate for `main` and caused confusion.
Issue reference: `#1059`
## Roadmap (Planned)
Planned scoring strategy:
```text
final_score = (similarity * 0.6) + (confidence * 0.4)
```
Planned integration shape:
1. Extract recent conversational context from filtered user/final-assistant turns.
2. Compute TF-IDF cosine similarity between each fact and current context.
3. Rank by weighted score and inject under token budget.
4. Fall back to confidence-only ranking if context is unavailable.
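The planned fallback-aware ranking could look roughly like this (a sketch of unmerged, planned behavior; names and data shapes are assumptions):

```python
def rank_facts(facts, context_scores, similarity_weight=0.6, confidence_weight=0.4):
    """Rank facts by the planned weighted score (sketch of unmerged behavior).

    context_scores maps fact content to a TF-IDF cosine similarity in [0, 1].
    Falls back to confidence-only ranking when no context is available.
    """
    def score(fact):
        sim = context_scores.get(fact["content"]) if context_scores else None
        if sim is None:
            # No context available: confidence-only ranking
            return fact.get("confidence", 0.0)
        return sim * similarity_weight + fact.get("confidence", 0.0) * confidence_weight

    return sorted(facts, key=score, reverse=True)
```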
## Validation
Current regression coverage includes:
- facts inclusion in memory injection output
- confidence ordering
- token-budget-limited fact inclusion
Tests:
- `backend/tests/test_memory_prompt_injection.py`
# Memory System Improvements - Summary
## Sync Note (2026-03-10)
This summary is synchronized with the `main` branch implementation.
TF-IDF/context-aware retrieval is **planned**, not merged yet.
## Implemented
- Accurate token counting with `tiktoken` in memory injection.
- Facts are injected into `<memory>` prompt content.
- Facts are ordered by confidence and bounded by `max_injection_tokens`.
## Planned (Not Yet Merged)
- TF-IDF cosine similarity recall based on recent conversation context.
- `current_context` parameter for `format_memory_for_injection`.
- Weighted ranking (`similarity` + `confidence`).
- Runtime extraction/injection flow for context-aware fact selection.
## Why This Sync Was Needed
Earlier docs described TF-IDF behavior as already implemented, which did not match code in `main`.
This mismatch is tracked in issue `#1059`.
## Current API Shape
```python
def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2000) -> str:
```
No `current_context` argument is currently available in `main`.
## Verification Pointers
- Implementation: `packages/harness/deerflow/agents/memory/prompt.py`
- Prompt assembly: `packages/harness/deerflow/agents/lead_agent/prompt.py`
- Regression tests: `backend/tests/test_memory_prompt_injection.py`
# Memory Settings Review
Use this when reviewing the Memory Settings add/edit flow locally with the fewest possible manual steps.
## Quick Review
1. Start DeerFlow locally using any working development setup you already use.
Examples:
```bash
make dev
```
or
```bash
make docker-start
```
If you already have DeerFlow running locally, you can reuse that existing setup.
2. Load the sample memory fixture.
```bash
python scripts/load_memory_sample.py
```
3. Open `Settings > Memory`.
Default local URLs:
- App: `http://localhost:2026`
- Local frontend-only fallback: `http://localhost:3000`
## Minimal Manual Test
1. Click `Add fact`.
2. Create a new fact with:
- Content: `Reviewer-added memory fact`
- Category: `testing`
- Confidence: `0.88`
3. Confirm the new fact appears immediately and shows `Manual` as the source.
4. Edit the sample fact `This sample fact is intended for edit testing.` and change it to:
- Content: `This sample fact was edited during manual review.`
- Category: `testing`
- Confidence: `0.91`
5. Confirm the edited fact updates immediately.
6. Refresh the page and confirm both the newly added fact and the edited fact still persist.
## Optional Sanity Checks
- Search `Reviewer-added` and confirm the new fact is matched.
- Search `workflow` and confirm category text is searchable.
- Switch between `All`, `Facts`, and `Summaries`.
- Delete the disposable sample fact `Delete fact testing can target this disposable sample entry.` and confirm the list updates immediately.
- Clear all memory and confirm the page enters the empty state.
## Fixture Files
- Sample fixture: `backend/docs/memory-settings-sample.json`
- Default local runtime target: `backend/.deer-flow/memory.json`
The loader script creates a timestamped backup automatically before overwriting an existing runtime memory file.
# File Path Usage Examples
## Three Path Types
DeerFlow's file upload system returns three different paths, each for a different scenario:
### 1. Actual Filesystem Path (path)
```
.deer-flow/threads/{thread_id}/user-data/uploads/document.pdf
```
**Use for:**
- The file's actual location on the server filesystem
- Relative to the `backend/` directory
- Direct filesystem access, backups, debugging, etc.
**Example:**
```python
# Direct access from Python code
from pathlib import Path
file_path = Path("backend/.deer-flow/threads/abc123/user-data/uploads/document.pdf")
content = file_path.read_bytes()
```
### 2. Virtual Path (virtual_path)
```
/mnt/user-data/uploads/document.pdf
```
**Use for:**
- The path the agent uses inside the sandbox environment
- The sandbox system maps it to the actual path automatically
- All of the agent's file tools use this path
**Example:**
The agent uses it in conversation:
```python
# Agent using the read_file tool
read_file(path="/mnt/user-data/uploads/document.pdf")
# Agent using the bash tool
bash(command="cat /mnt/user-data/uploads/document.pdf")
```
### 3. HTTP Access URL (artifact_url)
```
/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/document.pdf
```
**Use for:**
- Frontend access to files over HTTP
- Downloading and previewing files
- Can be opened directly in a browser
**Example:**
```typescript
// Frontend TypeScript/JavaScript code
const threadId = 'abc123';
const filename = 'document.pdf';
// Download the file
const downloadUrl = `/api/threads/${threadId}/artifacts/mnt/user-data/uploads/${filename}?download=true`;
window.open(downloadUrl);
// Preview in a new window
const viewUrl = `/api/threads/${threadId}/artifacts/mnt/user-data/uploads/${filename}`;
window.open(viewUrl, '_blank');
// Fetch with the fetch API
const response = await fetch(viewUrl);
const blob = await response.blob();
```
## Complete Flow Example
### Scenario: Frontend uploads a file and has the agent process it
```typescript
// 1. Frontend uploads the file
async function uploadAndProcess(threadId: string, file: File) {
  // Upload the file
  const formData = new FormData();
  formData.append('files', file);
  const uploadResponse = await fetch(
    `/api/threads/${threadId}/uploads`,
    {
      method: 'POST',
      body: formData
    }
  );
  const uploadData = await uploadResponse.json();
  const fileInfo = uploadData.files[0];
  console.log('File info:', fileInfo);
  // {
  //   filename: "report.pdf",
  //   path: ".deer-flow/threads/abc123/user-data/uploads/report.pdf",
  //   virtual_path: "/mnt/user-data/uploads/report.pdf",
  //   artifact_url: "/api/threads/abc123/artifacts/mnt/user-data/uploads/report.pdf",
  //   markdown_file: "report.md",
  //   markdown_path: ".deer-flow/threads/abc123/user-data/uploads/report.md",
  //   markdown_virtual_path: "/mnt/user-data/uploads/report.md",
  //   markdown_artifact_url: "/api/threads/abc123/artifacts/mnt/user-data/uploads/report.md"
  // }

  // 2. Send a message to the agent
  await sendMessage(threadId, "Please analyze the PDF I just uploaded");
  // The agent automatically sees the file list, including:
  // - report.pdf (virtual path: /mnt/user-data/uploads/report.pdf)
  // - report.md (virtual path: /mnt/user-data/uploads/report.md)

  // 3. The frontend can access the converted Markdown directly
  const mdResponse = await fetch(fileInfo.markdown_artifact_url);
  const markdownContent = await mdResponse.text();
  console.log('Markdown content:', markdownContent);

  // 4. Or download the original PDF
  const downloadLink = document.createElement('a');
  downloadLink.href = fileInfo.artifact_url + '?download=true';
  downloadLink.download = fileInfo.filename;
  downloadLink.click();
}
```
## Path Conversion Table
| Scenario | Path type | Example |
|------|---------------|------|
| Server-side backend code, direct access | `path` | `.deer-flow/threads/abc123/user-data/uploads/file.pdf` |
| Agent tool calls | `virtual_path` | `/mnt/user-data/uploads/file.pdf` |
| Frontend download/preview | `artifact_url` | `/api/threads/abc123/artifacts/mnt/user-data/uploads/file.pdf` |
| Backup scripts | `path` | `.deer-flow/threads/abc123/user-data/uploads/file.pdf` |
| Logging | `path` | `.deer-flow/threads/abc123/user-data/uploads/file.pdf` |
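All three forms are derivable from `thread_id` and `filename` alone. A minimal sketch (the upload router remains the source of truth; the function name here is hypothetical):

```python
def upload_paths(thread_id: str, filename: str) -> dict[str, str]:
    """Derive the three path forms for an uploaded file (sketch)."""
    rel = f"user-data/uploads/{filename}"
    return {
        # Server filesystem, relative to backend/
        "path": f".deer-flow/threads/{thread_id}/{rel}",
        # What the agent sees inside the sandbox
        "virtual_path": f"/mnt/{rel}",
        # What the frontend fetches over HTTP
        "artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/{rel}",
    }
```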
## Code Example Collection
### Python - Backend Processing
```python
from pathlib import Path
from deerflow.agents.middlewares.thread_data_middleware import THREAD_DATA_BASE_DIR

def process_uploaded_file(thread_id: str, filename: str):
    # Use the actual filesystem path
    base_dir = Path.cwd() / THREAD_DATA_BASE_DIR / thread_id / "user-data" / "uploads"
    file_path = base_dir / filename
    # Read it directly
    with open(file_path, 'rb') as f:
        content = f.read()
    return content
```
### JavaScript - Frontend Access
```javascript
// List uploaded files
async function listUploadedFiles(threadId) {
  const response = await fetch(`/api/threads/${threadId}/uploads/list`);
  const data = await response.json();
  // Create download links for each file
  data.files.forEach(file => {
    console.log(`File: ${file.filename}`);
    console.log(`Download: ${file.artifact_url}?download=true`);
    console.log(`Preview: ${file.artifact_url}`);
    // Documents also get a Markdown version
    if (file.markdown_artifact_url) {
      console.log(`Markdown: ${file.markdown_artifact_url}`);
    }
  });
  return data.files;
}
// Delete a file
async function deleteFile(threadId, filename) {
  const response = await fetch(
    `/api/threads/${threadId}/uploads/${filename}`,
    { method: 'DELETE' }
  );
  return response.json();
}
```
### React Component Example
```tsx
import React, { useState, useEffect } from 'react';

interface UploadedFile {
  filename: string;
  size: number;
  path: string;
  virtual_path: string;
  artifact_url: string;
  extension: string;
  modified: number;
  markdown_artifact_url?: string;
}

function FileUploadList({ threadId }: { threadId: string }) {
  const [files, setFiles] = useState<UploadedFile[]>([]);

  useEffect(() => {
    fetchFiles();
  }, [threadId]);

  async function fetchFiles() {
    const response = await fetch(`/api/threads/${threadId}/uploads/list`);
    const data = await response.json();
    setFiles(data.files);
  }

  async function handleUpload(event: React.ChangeEvent<HTMLInputElement>) {
    const fileList = event.target.files;
    if (!fileList) return;
    const formData = new FormData();
    Array.from(fileList).forEach(file => {
      formData.append('files', file);
    });
    await fetch(`/api/threads/${threadId}/uploads`, {
      method: 'POST',
      body: formData
    });
    fetchFiles(); // refresh the list
  }

  async function handleDelete(filename: string) {
    await fetch(`/api/threads/${threadId}/uploads/${filename}`, {
      method: 'DELETE'
    });
    fetchFiles(); // refresh the list
  }

  return (
    <div>
      <input type="file" multiple onChange={handleUpload} />
      <ul>
        {files.map(file => (
          <li key={file.filename}>
            <span>{file.filename}</span>
            <a href={file.artifact_url} target="_blank">Preview</a>
            <a href={`${file.artifact_url}?download=true`}>Download</a>
            {file.markdown_artifact_url && (
              <a href={file.markdown_artifact_url} target="_blank">Markdown</a>
            )}
            <button onClick={() => handleDelete(file.filename)}>Delete</button>
          </li>
        ))}
      </ul>
    </div>
  );
}
```
## Notes
1. **Path safety**
   - The actual path (`path`) includes the thread ID, guaranteeing isolation
   - The API validates paths to prevent directory-traversal attacks
   - The frontend should use `artifact_url`, never `path` directly
2. **Agent usage**
   - The agent only sees and uses `virtual_path`
   - The sandbox system maps it to the actual path automatically
   - The agent never needs to know the real filesystem layout
3. **Frontend integration**
   - Always access files via `artifact_url`
   - Do not try to access filesystem paths directly
   - Use the `?download=true` query parameter to force a download
4. **Markdown conversion**
   - On successful conversion, extra `markdown_*` fields are returned
   - Prefer the Markdown version (easier to process)
   - The original file is always kept
# Documentation
This directory contains detailed documentation for the DeerFlow backend.
## Quick Links
| Document | Description |
|----------|-------------|
| [ARCHITECTURE.md](ARCHITECTURE.md) | System architecture overview |
| [API.md](API.md) | Complete API reference |
| [CONFIGURATION.md](CONFIGURATION.md) | Configuration options |
| [SETUP.md](SETUP.md) | Quick setup guide |
## Feature Documentation
| Document | Description |
|----------|-------------|
| [STREAMING.md](STREAMING.md) | Token-level streaming design: Gateway vs DeerFlowClient paths, `stream_mode` semantics, per-id dedup |
| [FILE_UPLOAD.md](FILE_UPLOAD.md) | File upload functionality |
| [PATH_EXAMPLES.md](PATH_EXAMPLES.md) | Path types and usage examples |
| [summarization.md](summarization.md) | Context summarization feature |
| [plan_mode_usage.md](plan_mode_usage.md) | Plan mode with TodoList |
| [AUTO_TITLE_GENERATION.md](AUTO_TITLE_GENERATION.md) | Automatic title generation |
## Development
| Document | Description |
|----------|-------------|
| [TODO.md](TODO.md) | Planned features and known issues |
## Getting Started
1. **New to DeerFlow?** Start with [SETUP.md](SETUP.md) for quick installation
2. **Configuring the system?** See [CONFIGURATION.md](CONFIGURATION.md)
3. **Understanding the architecture?** Read [ARCHITECTURE.md](ARCHITECTURE.md)
4. **Building integrations?** Check [API.md](API.md) for API reference
## Document Organization
```
docs/
├── README.md # This file
├── ARCHITECTURE.md # System architecture
├── API.md # API reference
├── CONFIGURATION.md # Configuration guide
├── SETUP.md # Setup instructions
├── FILE_UPLOAD.md # File upload feature
├── PATH_EXAMPLES.md # Path usage examples
├── summarization.md # Summarization feature
├── plan_mode_usage.md # Plan mode feature
├── STREAMING.md # Token-level streaming design
├── AUTO_TITLE_GENERATION.md # Title generation
├── TITLE_GENERATION_IMPLEMENTATION.md # Title implementation details
└── TODO.md # Roadmap and issues
```

View File

@@ -0,0 +1,92 @@
# Setup Guide
Quick setup instructions for DeerFlow.
## Configuration Setup
DeerFlow uses a YAML configuration file that should be placed in the **project root directory**.
### Steps
1. **Navigate to project root**:
```bash
cd /path/to/deer-flow
```
2. **Copy example configuration**:
```bash
cp config.example.yaml config.yaml
```
3. **Edit configuration**:
```bash
# Option A: Set environment variables (recommended)
export OPENAI_API_KEY="your-key-here"
# Option B: Edit config.yaml directly
vim config.yaml # or your preferred editor
```
4. **Verify configuration**:
```bash
cd backend
python -c "from deerflow.config import get_app_config; print('✓ Config loaded:', get_app_config().models[0].name)"
```
## Important Notes
- **Location**: `config.yaml` should be in `deer-flow/` (project root), not `deer-flow/backend/`
- **Git**: `config.yaml` is automatically ignored by git (contains secrets)
- **Priority**: If both `backend/config.yaml` and `../config.yaml` exist, backend version takes precedence
## Configuration File Locations
The backend searches for `config.yaml` in this order:
1. `DEER_FLOW_CONFIG_PATH` environment variable (if set)
2. `backend/config.yaml` (current directory when running from backend/)
3. `deer-flow/config.yaml` (parent directory - **recommended location**)
**Recommended**: Place `config.yaml` in project root (`deer-flow/config.yaml`).
## Sandbox Setup (Optional but Recommended)
If you plan to use Docker/Container-based sandbox (configured in `config.yaml` under `sandbox.use: deerflow.community.aio_sandbox:AioSandboxProvider`), it's highly recommended to pre-pull the container image:
```bash
# From project root
make setup-sandbox
```
**Why pre-pull?**
- The sandbox image (~500MB+) is pulled on first use, causing a long wait
- Pre-pulling provides clear progress indication
- Avoids confusion when first using the agent
If you skip this step, the image will be automatically pulled on first agent execution, which may take several minutes depending on your network speed.
## Troubleshooting
### Config file not found
```bash
# Check where the backend is looking
cd deer-flow/backend
python -c "from deerflow.config.app_config import AppConfig; print(AppConfig.resolve_config_path())"
```
If it can't find the config:
1. Ensure you've copied `config.example.yaml` to `config.yaml`
2. Verify you're in the correct directory
3. Check the file exists: `ls -la ../config.yaml`
### Permission denied
```bash
chmod 600 ../config.yaml # Protect sensitive configuration
```
## See Also
- [Configuration Guide](CONFIGURATION.md) - Detailed configuration options
- [Architecture Overview](../CLAUDE.md) - System architecture

View File

@@ -0,0 +1,351 @@
# DeerFlow Streaming Output Design
This document explains how DeerFlow delivers the LangGraph agent's event stream end to end to two kinds of consumers (HTTP clients and embedded Python callers): why the two paths **must** coexist, what each path's contract is, and the non-obvious invariants baked into the design.
---
## TL;DR
- DeerFlow has **two parallel** streaming paths: the **Gateway path** (async / HTTP SSE / JSON serialization) serves the browser and IM channels; the **DeerFlowClient path** (sync / in-process / native LangChain objects) serves Jupyter, scripts, and tests. They **cannot be merged** because their consumer models differ.
- Both paths start from the `create_agent()` factory, and both subscribe to LangGraph's `stream_mode=["values", "messages", "custom"]`: `values` is a node-level state snapshot, `messages` is the LLM token-level delta, and `custom` carries explicit `StreamWriter` events. **These three modes are not a verbosity gradient; they are three independent event sources** — if you want a token stream, you must subscribe to `messages` explicitly.
- The embedded client maintains three `set[str]` instances per `stream()` call: `seen_ids` / `streamed_ids` / `counted_usage_ids`. They look similar but guard **three independent invariants** and cannot be merged.
---
## Why There Are Two Streaming Paths
The two paths serve fundamentally different consumer models:
| Dimension | Gateway path | DeerFlowClient path |
|---|---|---|
| Entry point | FastAPI `/runs/stream` endpoint | `DeerFlowClient.stream(message)` |
| Driver | `runtime/runs/worker.py::run_agent` | `packages/harness/deerflow/client.py::DeerFlowClient.stream` |
| Execution model | `async def` + `agent.astream()` | sync generator + `agent.stream()` |
| Event transport | `StreamBridge` (asyncio Queue) + `sse_consumer` | direct `yield` |
| Serialization | `serialize(chunk)` → plain JSON dicts matching the LangGraph Platform wire format | `StreamEvent.data` carrying native LangChain objects |
| Consumers | frontend `useStream` React hook, Feishu/Slack/Telegram channels, LangGraph SDK clients | Jupyter notebooks, integration tests, internal Python scripts |
| Lifecycle management | `RunManager` (run_id tracking, disconnect semantics, multitask policy, heartbeats) | none; the call ends when the function returns |
| Disconnect recovery | `Last-Event-ID` SSE reconnection | not needed |
**Keeping both paths is a deliberate trade-off against DRY**: the Gateway's entire infrastructure (async + Queue + JSON + RunManager) exists **to push events across a network boundary to HTTP consumers**. When the producer (the agent) and the consumer (the Python call stack) live in the same process, all of that is pure overhead.
### Why DeerFlowClient Cannot Reuse the Gateway
Three reuse options were considered and rejected:
1. **Turn `client.stream()` into `async def client.astream()`**
   A breaking change. Users would have to cram `async for` / `asyncio.run()` into Jupyter notebooks and synchronous scripts for no benefit. A major selling point of DeerFlowClient ("call the agent like an ordinary function") disappears.
2. **Spin up a dedicated event-loop thread inside `client.stream()` and bridge sync/async with `StreamBridge`**
   This introduces a thread pool, queues, and semaphores. To "remove duplication" it trades lines of code for **complexity** — a textbook "wrong abstraction" whose overhead exceeds the reuse benefit.
3. **Make `run_agent` itself support a sync mode**
   This adds a dead branch the Gateway never exercises and dilutes the focus of worker.py.
So the event-handling logic of the two paths is **similar but deliberately not shared**. This is a design decision, not an oversight.
---
## The Three-Layer Semantics of LangGraph `stream_mode`
LangGraph's `agent.stream(stream_mode=[...])` is a **multiplexing** interface: one call subscribes to several modes at once, and each mode is an independent event source. The three core modes:
```mermaid
flowchart LR
    classDef values fill:#B8C5D1,stroke:#5A6B7A,color:#2C3E50
    classDef messages fill:#C9B8A8,stroke:#7A6B5A,color:#2C3E50
    classDef custom fill:#B5C4B1,stroke:#5A7A5A,color:#2C3E50
    subgraph LG["LangGraph agent graph"]
        direction TB
        Node1["node: LLM call"]
        Node2["node: tool call"]
        Node3["node: reducer"]
    end
    LG -->|"after each node completes"| V["values: full state snapshot"]
    Node1 -->|"each token the LLM yields"| M["messages: (AIMessageChunk, meta)"]
    Node1 -->|"StreamWriter.write()"| C["custom: arbitrary dict"]
    class V values
    class M messages
    class C custom
```
| Mode | Emitted when | Payload | Granularity |
|---|---|---|---|
| `values` | after each graph node completes | full state dict (title, messages, artifacts) | node-level |
| `messages` | each chunk the LLM yields (and tool-node completion) | `(AIMessageChunk \| ToolMessage, metadata_dict)` | token-level |
| `custom` | explicit `StreamWriter.write()` calls in user code | arbitrary dict | application-defined |
### Where the Two Names Come From
The same concept carries three names across **three protocol layers**:
```
Application              HTTP / SSE                LangGraph Graph
┌──────────────┐         ┌──────────────┐          ┌──────────────┐
│ frontend     │         │ LangGraph    │          │ agent.astream│
│ useStream    │──"messages-│ Platform SDK │──"messages"──│ graph.astream│
│ Feishu IM    │   tuple"───│ HTTP wire   │          │              │
└──────────────┘         └──────────────┘          └──────────────┘
```
- **Graph layer** (`agent.stream` / `agent.astream`, LangGraph's direct Python API): the mode is named **`"messages"`**.
- **Platform SDK layer** (the `langgraph-sdk` HTTP client, the cross-process HTTP contract): the mode is named **`"messages-tuple"`**.
- The **Gateway worker** translates explicitly: `if m == "messages-tuple": lg_modes.append("messages")` (`runtime/runs/worker.py:117-121`).
**Consequence**: `DeerFlowClient.stream()` calls `agent.stream()` directly (Graph layer), so it must pass `"messages"`; `app/channels/manager.py` goes through the `langgraph-sdk` HTTP SDK, so it passes `"messages-tuple"`. **The two strings are not interchangeable**, nor should they be hoisted into "one shared constant" — they are type aliases at different protocol layers, and sharing them only forces one layer to speak a language that is not its own.
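The worker's mode translation described above can be sketched as a tiny function (a condensed sketch following the snippet at `runtime/runs/worker.py:117-121`; variable names are assumptions):

```python
# Translate Platform-SDK wire names into graph-level mode names before
# calling agent.astream(). Only "messages-tuple" needs renaming.
def to_graph_modes(sdk_modes: list[str]) -> list[str]:
    lg_modes: list[str] = []
    for m in sdk_modes:
        if m == "messages-tuple":
            lg_modes.append("messages")   # Platform wire name -> graph-level name
        else:
            lg_modes.append(m)
    return lg_modes

graph_modes = to_graph_modes(["values", "messages-tuple", "custom"])
print(graph_modes)
```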
---
## The Gateway Path (async + HTTP SSE)
```mermaid
sequenceDiagram
    participant Client as HTTP Client
    participant API as FastAPI<br/>thread_runs.py
    participant Svc as services.py<br/>start_run
    participant Worker as worker.py<br/>run_agent (async)
    participant Bridge as StreamBridge<br/>(asyncio.Queue)
    participant Agent as LangGraph<br/>agent.astream
    participant SSE as sse_consumer
    Client->>API: POST /runs/stream
    API->>Svc: start_run(body)
    Svc->>Bridge: create bridge
    Svc->>Worker: asyncio.create_task(run_agent(...))
    Svc-->>API: StreamingResponse(sse_consumer)
    API-->>Client: event-stream opens
    par worker (producer)
        Worker->>Agent: astream(stream_mode=lg_modes)
        loop each chunk
            Agent-->>Worker: (mode, chunk)
            Worker->>Bridge: publish(run_id, event, serialize(chunk))
        end
        Worker->>Bridge: publish_end(run_id)
    and sse_consumer (consumer)
        SSE->>Bridge: subscribe(run_id)
        loop each event
            Bridge-->>SSE: StreamEvent
            SSE-->>Client: "event: <name>\ndata: <json>\n\n"
        end
    end
```
Key components:
- `runtime/runs/worker.py::run_agent` — runs `agent.astream()` inside an `asyncio.Task`, converts each chunk to JSON via `serialize(chunk, mode=mode)`, and calls `bridge.publish()`.
- `runtime/stream_bridge` — an abstract queue. `publish/subscribe` decouples producer and consumer and supports `Last-Event-ID` reconnection, heartbeats, and multi-subscriber fan-out.
- `app/gateway/services.py::sse_consumer` — subscribes to the bridge and formats events as SSE wire frames.
- `runtime/serialization.py::serialize` — mode-aware serialization; in `messages` mode, `serialize_messages_tuple` turns `(chunk, metadata)` into `[chunk.model_dump(), metadata]`.
**Why `StreamBridge` exists**: when the producer (the `run_agent` task) and the consumer (the HTTP connection) run in different asyncio tasks, something must carry events across the task boundary. The queue also doubles as the reconnection buffer and the fan-out point for multiple subscribers.
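A stripped-down sketch of that publish/subscribe pattern is below. It is illustrative only: the real `StreamBridge` additionally handles `Last-Event-ID` replay, heartbeats, and multi-subscriber fan-out, and its API surely differs.

```python
import asyncio

# Minimal bridge: one asyncio.Queue carrying events from a producer task
# to a consumer task, with None as the end-of-stream sentinel.
class MiniBridge:
    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def publish(self, event: str, data: dict) -> None:
        await self._queue.put((event, data))

    async def publish_end(self) -> None:
        await self._queue.put(None)  # sentinel: stream finished

    async def subscribe(self):
        while (item := await self._queue.get()) is not None:
            yield item

async def demo() -> list:
    bridge = MiniBridge()

    async def producer() -> None:
        await bridge.publish("messages", {"content": "hi"})
        await bridge.publish_end()

    asyncio.create_task(producer())          # producer and consumer are separate tasks
    return [e async for e in bridge.subscribe()]

events = asyncio.run(demo())
print(events)
```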
---
## The DeerFlowClient Path (sync + in-process)
```mermaid
sequenceDiagram
    participant User as Python caller
    participant Client as DeerFlowClient.stream
    participant Agent as LangGraph<br/>agent.stream (sync)
    User->>Client: for event in client.stream("hi"):
    Client->>Agent: stream(stream_mode=["values","messages","custom"])
    loop each chunk
        Agent-->>Client: (mode, chunk)
        Client->>Client: dispatch on mode<br/>build StreamEvent
        Client-->>User: yield StreamEvent
    end
    Client-->>User: yield StreamEvent(type="end")
```
By comparison, every step of the sync path has dramatically fewer moving parts:
- No `RunManager` — one `stream()` call maps to one lifecycle; no run_id needed.
- No `StreamBridge` — events are `yield`ed directly; producer and consumer share one Python call stack, so no cross-task intermediary is needed.
- No JSON serialization — `StreamEvent.data` carries native LangChain objects (`AIMessage.content`, `usage_metadata` as a `UsageMetadata` TypedDict). Jupyter users get real types, not anonymous dicts.
- No asyncio — callers write a plain `for event in ...`, never `async for`.
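The consumer-side shape of that loop looks like the sketch below. A fake client stands in for the real `DeerFlowClient` (which lives in `packages/harness/deerflow/client.py`) so the loop is runnable without a model; the event shapes follow the `messages-tuple` payloads described in this document.

```python
from dataclasses import dataclass, field

# Stand-in for the real StreamEvent: a type tag plus a data dict.
@dataclass
class StreamEvent:
    type: str
    data: dict = field(default_factory=dict)

# Fake client emitting the same event sequence a real stream() call would.
class FakeClient:
    def stream(self, message: str):
        yield StreamEvent("values", {"messages": ["..."]})
        yield StreamEvent("messages-tuple", {"type": "ai", "content": "Hel", "id": "ai-1"})
        yield StreamEvent("messages-tuple", {"type": "ai", "content": "lo", "id": "ai-1"})
        yield StreamEvent("end", {"usage": {}})

parts: list[str] = []
for event in FakeClient().stream("hi"):      # a plain for-loop; no asyncio required
    if event.type == "messages-tuple" and event.data.get("type") == "ai":
        parts.append(event.data["content"])  # accumulate token deltas
print("".join(parts))
```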
---
## Consumption Semantics: delta vs cumulative
LangGraph's `messages` mode emits **deltas**: each `AIMessageChunk.content` contains only the tokens yielded this time, **not** the cumulative text from the start.
This matches the semantics of LangChain's `fs2 Stream` style: **upstream sends increments, downstream accumulates**. On the Gateway path, the frontend `useStream` React hook keeps its own accumulator; on the DeerFlowClient path, the `chat()` method accumulates on the caller's behalf.
### The O(n) Accumulator in `DeerFlowClient.chat()`
```python
chunks: dict[str, list[str]] = {}
last_id: str = ""
for event in self.stream(message, thread_id=thread_id, **kwargs):
    if event.type == "messages-tuple" and event.data.get("type") == "ai":
        msg_id = event.data.get("id") or ""
        delta = event.data.get("content", "")
        if delta:
            chunks.setdefault(msg_id, []).append(delta)
            last_id = msg_id
return "".join(chunks.get(last_id, ()))
```
**Why not `buffers[id] = buffers.get(id, "") + delta`?** CPython's in-place string-concat optimization only applies when the refcount is 1 and the LHS is a local name; here the string lives in a dict and gets reassigned, so the optimization never kicks in, every append is an O(n) copy, and the total cost is O(n²). Measured on a 50 KB / 5000-chunk reply, that is 100-300 ms of pure copying. `list` + `"".join()` is O(n).
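The two accumulation strategies can be compared side by side. Both produce the same string; the list-based one is the O(n) approach `chat()` uses, while the dict-string one degrades to O(n²) on long replies:

```python
# Token deltas keyed by message id, as a messages-mode stream would emit them.
deltas = [("ai-1", d) for d in ("one", "two", "three")]

# O(n^2) worst case: the dict-held string is re-copied on every reassignment.
buffers: dict[str, str] = {}
for msg_id, delta in deltas:
    buffers[msg_id] = buffers.get(msg_id, "") + delta

# O(n): append chunks cheaply, join exactly once at the end.
chunks: dict[str, list[str]] = {}
for msg_id, delta in deltas:
    chunks.setdefault(msg_id, []).append(delta)
joined = "".join(chunks["ai-1"])

print(buffers["ai-1"] == joined == "onetwothree")
```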
---
## Why the Three id Sets Cannot Be Merged
`DeerFlowClient.stream()` maintains three `set[str]` instances over the lifetime of one call:
```python
seen_ids: set[str] = set()            # dedup within the values path
streamed_ids: set[str] = set()        # messages → values cross-mode dedup
counted_usage_ids: set[str] = set()   # idempotent usage_metadata accounting
```
At first glance they look like "three copies of nearly the same thing", but each guards a **different invariant**.
| Set | Invariant it guards | Filled by | Queried by |
|---|---|---|---|
| `seen_ids` | the same message appearing in two consecutive `values` snapshots produces only one `messages-tuple` event | the values branch, after processing each message | the values branch, before processing the next message |
| `streamed_ids` | if a message already streamed token-by-token via `messages` mode, **do not** re-synthesize a full `messages-tuple` when the values snapshot arrives | the messages branch, on every AI/tool event it emits | the values branch, when it sees the message |
| `counted_usage_ids` | the same `usage_metadata` appears both on the final messages chunk and on the final AIMessage in the values snapshot; the **cumulative total must count it exactly once** | `_account_usage()`, whenever it accepts usage | `_account_usage()`, on every call |
### Why a Single Set Is Not Enough
The key observation: **the same message id joins the three sets at different moments**.
```mermaid
sequenceDiagram
    participant M as messages mode
    participant V as values mode
    participant SS as streamed_ids
    participant SU as counted_usage_ids
    participant SE as seen_ids
    Note over M: first AI text chunk arrives
    M->>SS: add(msg_id)
    Note over M: final chunk carries usage
    M->>SU: add(msg_id)
    Note over V: snapshot arrives with the same AI message
    V->>SE: add(msg_id)
    V->>SS: lookup → present, skip text synthesis
    V->>SU: lookup → present, do not double-count
```
- `seen_ids` is **always added when a values snapshot arrives**, so it marks "processed by values". A message that only ever appears in the messages stream (rare, but possible) never enters `seen_ids`.
- `streamed_ids` is added **on the first meaningful event in the messages stream**. A non-AI message that only arrives via a values snapshot (a HumanMessage, a truncated tool message) never enters `streamed_ids`.
- `counted_usage_ids` is added **only when non-empty `usage_metadata` is seen**. A message with no usage at all (a tool message, an error message) never enters it.
**Set containment**: `counted_usage_ids ⊆ (streamed_ids ∪ seen_ids)` roughly holds, but it is **not a strict subset you can rely on**: a message can finish streaming its text in messages mode yet be overtaken by a values snapshot **before** the final usage-bearing chunk arrives — at that point it is already in `streamed_ids` but not yet in `counted_usage_ids`. Merging them into one dict-of-flags would make this subtle timing dependency **disappear from the type system** and survive only as a comment. Three separate sets make the invariants explicit: each set's name answers a question you can ask out loud.
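The values-branch decision that `seen_ids` and `streamed_ids` drive can be condensed into a few lines. This is an illustrative sketch: the real logic lives inline in `DeerFlowClient.stream`, and the helper name is hypothetical.

```python
# Decide what the values branch does with one message from a snapshot,
# given the two dedup sets described above.
def values_branch_action(msg_id: str, seen_ids: set[str], streamed_ids: set[str]) -> str:
    if msg_id in seen_ids:
        return "skip: handled in an earlier snapshot"
    seen_ids.add(msg_id)                       # mark as processed by values
    if msg_id in streamed_ids:
        return "skip: already streamed token-by-token"
    return "synthesize full messages-tuple event"

seen: set[str] = set()
streamed: set[str] = {"ai-1"}                  # ai-1 already streamed via messages mode
first = values_branch_action("ai-1", seen, streamed)
second = values_branch_action("human-1", seen, streamed)
third = values_branch_action("ai-1", seen, streamed)   # next snapshot, same message
print(first, "|", second, "|", third)
```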
---
## End to End: the Event Timeline of One Real Conversation
Suppose we call `client.stream("Count from 1 to 15")` and the LLM answers "one\ntwo\n...\nfifteen" (88 characters), which the tokenizer splits into ~35 BPE chunks. A condensed version of the event arrival sequence:
```mermaid
sequenceDiagram
    participant U as User
    participant C as DeerFlowClient
    participant A as LangGraph<br/>agent.stream
    U->>C: stream("Count ... 15")
    C->>A: stream(mode=["values","messages","custom"])
    A-->>C: ("values", {messages: [HumanMessage]})
    C-->>U: StreamEvent(type="values", ...)
    Note over A,C: LLM starts yielding tokens
    loop 35 times, ~476 ms
        A-->>C: ("messages", (AIMessageChunk(content="ele"), meta))
        C->>C: streamed_ids.add(ai-1)
        C-->>U: StreamEvent(type="messages-tuple",<br/>data={type:ai, content:"ele", id:ai-1})
    end
    Note over A: LLM finish_reason=stop; final chunk carries usage
    A-->>C: ("messages", (AIMessageChunk(content="", usage_metadata={...}), meta))
    C->>C: counted_usage_ids.add(ai-1)<br/>(no text, nothing yielded)
    A-->>C: ("values", {messages: [..., AIMessage(complete)]})
    C->>C: ai-1 in streamed_ids → skip synthesis
    C->>C: capture usage (already in counted_usage_ids, no-op)
    C-->>U: StreamEvent(type="values", ...)
    C-->>U: StreamEvent(type="end", data={usage:{...}})
```
Key observations:
1. The user sees **35 messages-tuple events** spread over roughly 476 ms, each carrying one token delta and the same `id=ai-1`.
2. The `AIMessage` in the final `values` snapshot does **not** trigger another full `messages-tuple` event — `ai-1 in streamed_ids` skips the synthesis.
3. The `usage` in the `end` event equals exactly one copy of the cumulative usage, **not twice that** — `counted_usage_ids` already absorbed it on the final messages chunk, so the values branch's revisit is a no-op.
4. The `content` the consumer receives is an **increment**: "ele" is just 3 characters, not "one\ntwo\n...ele". To get the full text, accumulate by `id` — `chat()` already does this for you.
---
## Why This Design Is Bug-Prone, and the Testing Strategy
The direct trigger for this document is bytedance/deer-flow#1969: `DeerFlowClient.stream()` originally subscribed only to `["values", "custom"]` and **omitted `"messages"`**. As a result, `client.stream("hello")` behaved like a single all-at-once return, visually indistinguishable from `chat()`.
This class of bug has three structural causes:
1. **Multi-layer naming**: `messages` / `messages-tuple` / the HTTP SSE `messages` event are three names for one concept. Getting it wrong at one layer raises no error at the other two.
2. **Multiple consumer models**: the Gateway and DeerFlowClient are two independent implementations with **no single source of truth for "which modes to subscribe to"**. Subscribing correctly in one says nothing about the other.
3. **Mocked tests bypassed the real path**: the old tests used `agent.stream.return_value = iter([dict_chunk, ...])` to feed values-shaped dicts simulating state snapshots. Input constructed that way **never enters the `messages`-mode branch**, so even with an element missing from `stream_mode`, CI stayed green.
### Defenses
The real line of defense is to **assert explicitly that the "messages" mode is subscribed, and mock with real chunk shapes**:
```python
# tests/test_client.py::test_messages_mode_emits_token_deltas
agent.stream.return_value = iter([
    ("messages", (AIMessageChunk(content="Hel", id="ai-1"), {})),
    ("messages", (AIMessageChunk(content="lo ", id="ai-1"), {})),
    ("messages", (AIMessageChunk(content="world!", id="ai-1"), {})),
    ("values", {"messages": [HumanMessage(...), AIMessage(content="Hello world!", id="ai-1")]}),
])
# ...
assert [e.data["content"] for e in ai_text_events] == ["Hel", "lo ", "world!"]
assert len(ai_text_events) == 3  # values snapshot must NOT re-synthesize
assert "messages" in agent.stream.call_args.kwargs["stream_mode"]
```
**Why this beats "hoisting a shared constant"**: a shared constant only guarantees the right string for whoever uses it — someone adding a new consumer may never find the constant. Behavioral assertions force every change through the **actual execution path**: reverting to `["values", "custom"]` immediately fails `assert "messages" in ...`.
### A Liveness Signal: BPE Subword Boundaries
The final regression check is to have a real LLM count 1-15 and look for the tokenizer's subword splits in the output:
```
[5.460s] 'ele' / 'ven'          (eleven split into two tokens)
[5.508s] 'tw'  / 'elve'         (twelve split into two)
[5.568s] 'th'  / 'irteen'       (thirteen split into two)
[5.623s] 'four'/ 'teen'         (fourteen split into two)
[5.677s] 'f' / 'if' / 'teen'    (fifteen split into three)
```
Subword splits are an external fact of the tokenizer — they **cannot be faked**. Seeing them proves the data flowed **chunk by chunk** through the entire pipeline without any intermediate layer buffering it into one blob. In streaming systems, this kind of "liveness signal" is higher-confidence evidence than a unit test.
---
## Where to Look in the Source
| What you care about | Where to look |
|---|---|
| DeerFlowClient embedded stream | `packages/harness/deerflow/client.py::DeerFlowClient.stream` |
| `chat()`'s delta accumulator | `packages/harness/deerflow/client.py::DeerFlowClient.chat` |
| Gateway async stream | `packages/harness/deerflow/runtime/runs/worker.py::run_agent` |
| HTTP SSE frame output | `app/gateway/services.py::sse_consumer` / `format_sse` |
| Wire-format serialization | `packages/harness/deerflow/runtime/serialization.py` |
| LangGraph mode-name translation | `packages/harness/deerflow/runtime/runs/worker.py:117-121` |
| Feishu channel incremental card updates | `app/channels/manager.py::_handle_streaming_chat` |
| Channels' defensive delta/cumulative accumulation | `app/channels/manager.py::_merge_stream_text` |
| Modes supported by the frontend useStream | `frontend/src/core/api/stream-mode.ts` |
| Core regression test | `backend/tests/test_client.py::TestStream::test_messages_mode_emits_token_deltas` |

View File

@@ -0,0 +1,222 @@
# Automatic Title Generation: Implementation Summary
## ✅ Completed Work
### 1. Core Implementation Files
#### [`packages/harness/deerflow/agents/thread_state.py`](../packages/harness/deerflow/agents/thread_state.py)
- ✅ Added the `title: str | None = None` field to `ThreadState`
#### [`packages/harness/deerflow/config/title_config.py`](../packages/harness/deerflow/config/title_config.py) (new)
- ✅ Added the `TitleConfig` configuration class
- ✅ Supported options: enabled, max_words, max_chars, model_name, prompt_template
- ✅ Provided `get_title_config()` and `set_title_config()`
- ✅ Provided `load_title_config_from_dict()` to load from the config file
#### [`packages/harness/deerflow/agents/middlewares/title_middleware.py`](../packages/harness/deerflow/agents/middlewares/title_middleware.py) (new)
- ✅ Added `TitleMiddleware`
- ✅ Implemented `_should_generate_title()` to decide whether generation is needed
- ✅ Implemented `_generate_title()` to call the LLM
- ✅ Implemented the `after_agent()` hook to trigger automatically after the first exchange
- ✅ Included a fallback strategy (use the first few words of the user message when the LLM fails)
#### [`packages/harness/deerflow/config/app_config.py`](../packages/harness/deerflow/config/app_config.py)
- ✅ Imported `load_title_config_from_dict`
- ✅ Loaded the title config in `from_file()`
#### [`packages/harness/deerflow/agents/lead_agent/agent.py`](../packages/harness/deerflow/agents/lead_agent/agent.py)
- ✅ Imported `TitleMiddleware`
- ✅ Registered it in the `middleware` list: `[SandboxMiddleware(), TitleMiddleware()]`
### 2. Configuration
#### [`config.yaml`](../../config.example.yaml)
- ✅ Added the title config section:
```yaml
title:
  enabled: true
  max_words: 6
  max_chars: 60
  model_name: null
```
### 3. Documentation
#### [`docs/AUTO_TITLE_GENERATION.md`](../docs/AUTO_TITLE_GENERATION.md) (new)
- ✅ Full feature documentation
- ✅ Implementation approach and architecture
- ✅ Configuration reference
- ✅ Client usage example (TypeScript)
- ✅ Workflow diagram (Mermaid)
- ✅ Troubleshooting guide
- ✅ State vs Metadata comparison
#### [`TODO.md`](TODO.md)
- ✅ Recorded the completed feature
### 4. Tests
#### [`tests/test_title_generation.py`](../tests/test_title_generation.py) (new)
- ✅ Config class tests
- ✅ Middleware initialization tests
- ✅ TODO: integration tests (requires mocking the Runtime)
---
---
## 🎯 核心设计决策
### 为什么使用 State 而非 Metadata
| 方面 | State (✅ 采用) | Metadata (❌ 未采用) |
|------|----------------|---------------------|
| **持久化** | 自动(通过 checkpointer | 取决于实现,不可靠 |
| **版本控制** | 支持时间旅行 | 不支持 |
| **类型安全** | TypedDict 定义 | 任意字典 |
| **标准化** | LangGraph 核心机制 | 扩展功能 |
### 工作流程
```
用户发送首条消息
Agent 处理并返回回复
TitleMiddleware.after_agent() 触发
检查:是否首次对话?是否已有 title
调用 LLM 生成 title
返回 {"title": "..."} 更新 state
Checkpointer 自动持久化(如果配置了)
客户端从 state.values.title 读取
```
---
## 📋 Usage Guide
### Backend Configuration
1. **Enable/disable the feature**
```yaml
# config.yaml
title:
  enabled: true  # set to false to disable
```
2. **Customize the configuration**
```yaml
title:
  enabled: true
  max_words: 8      # at most 8 words in the title
  max_chars: 80     # at most 80 characters
  model_name: null  # use the default model
```
3. **Persistence (optional)**
To persist titles during local development:
```python
# checkpointer.py
from langgraph.checkpoint.sqlite import SqliteSaver
checkpointer = SqliteSaver.from_conn_string("checkpoints.db")
```
```json
// langgraph.json
{
  "graphs": {
    "lead_agent": "deerflow.agents:lead_agent"
  },
  "checkpointer": "checkpointer:checkpointer"
}
```
### Client Usage
```typescript
// Fetch the thread title
const state = await client.threads.getState(threadId);
const title = state.values.title || "New Conversation";
// Render it in the conversation list
<li>{title}</li>
```
**⚠️ Note**: The title lives in `state.values.title`, not `thread.metadata.title`.
---
## 🧪 Tests
```bash
# Run the feature tests
pytest tests/test_title_generation.py -v
# Run all tests
pytest
```
---
## 🔍 Troubleshooting
### No title generated?
1. Check the config: `title.enabled = true`
2. Check the logs: search for "Generated thread title"
3. Confirm it is the first exchange (1 user message + 1 assistant reply)
### Title generated but not visible?
1. Confirm where you read it: `state.values.title` (not `thread.metadata.title`)
2. Check whether the API response includes the title
3. Re-fetch the state
### Title lost after a restart?
1. Local development needs a configured checkpointer
2. LangGraph Platform persists automatically
3. Inspect the database to confirm the checkpointer is working
---
## 📊 Performance Impact
- **Added latency**: ~0.5-1 s (one LLM call)
- **Concurrency**: runs in `after_agent` and does not block the main flow
- **Resource cost**: generated once per thread
### Optimization Tips
1. Use a faster model (e.g. `gpt-3.5-turbo`)
2. Reduce `max_words` and `max_chars`
3. Make the prompt more concise
---
## 🚀 Next Steps
- [ ] Add integration tests (requires mocking the LangGraph Runtime)
- [ ] Support custom prompt templates
- [ ] Support multilingual title generation
- [ ] Add a "regenerate title" action
- [ ] Monitor title-generation success rate and latency
---
## 📚 Related Resources
- [Full documentation](../docs/AUTO_TITLE_GENERATION.md)
- [LangGraph Middleware](https://langchain-ai.github.io/langgraph/concepts/middleware/)
- [LangGraph State management](https://langchain-ai.github.io/langgraph/concepts/low_level/#state)
- [LangGraph Checkpointer](https://langchain-ai.github.io/langgraph/concepts/persistence/)
---
*Implementation completed: 2026-01-14*

View File

@@ -0,0 +1,34 @@
# TODO List
## Completed Features
- [x] Launch the sandbox only after the first file system or bash tool is called
- [x] Add Clarification Process for the whole process
- [x] Implement Context Summarization Mechanism to avoid context explosion
- [x] Integrate MCP (Model Context Protocol) for extensible tools
- [x] Add file upload support with automatic document conversion
- [x] Implement automatic thread title generation
- [x] Add Plan Mode with TodoList middleware
- [x] Add vision model support with ViewImageMiddleware
- [x] Skills system with SKILL.md format
## Planned Features
- [ ] Pooling the sandbox resources to reduce the number of sandbox containers
- [ ] Add authentication/authorization layer
- [ ] Implement rate limiting
- [ ] Add metrics and monitoring
- [ ] Support for more document formats in upload
- [ ] Skill marketplace / remote skill installation
- [ ] Optimize async concurrency in agent hot path (IM channels multi-task scenario)
- Replace `time.sleep(5)` with `asyncio.sleep()` in `packages/harness/deerflow/tools/builtins/task_tool.py` (subagent polling)
- Replace `subprocess.run()` with `asyncio.create_subprocess_shell()` in `packages/harness/deerflow/sandbox/local/local_sandbox.py`
- Replace sync `requests` with `httpx.AsyncClient` in community tools (tavily, jina_ai, firecrawl, infoquest, image_search)
- Replace sync `model.invoke()` with async `model.ainvoke()` in title_middleware and memory updater
- Consider `asyncio.to_thread()` wrapper for remaining blocking file I/O
- For production: use `langgraph up` (multi-worker) instead of `langgraph dev` (single-worker)
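One item above — replacing the blocking `time.sleep(5)` in the subagent polling loop — could look roughly like the sketch below. The function name and polling shape are assumptions for illustration; only the sleep-replacement idea comes from the TODO entry.

```python
import asyncio

# Hedged sketch of the proposed change: an awaitable asyncio.sleep lets
# other tasks (e.g. concurrent IM-channel runs) keep making progress,
# where time.sleep(5) would block the whole event loop.
async def poll_subagent(check_done, interval: float = 5.0, max_polls: int = 100) -> bool:
    for _ in range(max_polls):
        if check_done():
            return True
        await asyncio.sleep(interval)  # yields the event loop instead of blocking it
    return False

# Usage with a stub that reports done on the third poll:
calls = {"n": 0}
def check_done() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3

done = asyncio.run(poll_subagent(check_done, interval=0))
print(done)
```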
## Resolved Issues
- [x] Make sure that no duplicated files in `state.artifacts`
- [x] Long thinking but with empty content (answer inside thinking process)

View File

@@ -0,0 +1,114 @@
{
"version": "1.0",
"lastUpdated": "2026-03-28T10:30:00Z",
"user": {
"workContext": {
"summary": "Working on DeerFlow memory management UX, including local search, local filters, clear-all, and single-fact deletion in Settings > Memory.",
"updatedAt": "2026-03-28T10:30:00Z"
},
"personalContext": {
"summary": "Prefers Chinese during collaboration, but wants GitHub PR titles and bodies written in English with a Chinese translation provided alongside them.",
"updatedAt": "2026-03-28T10:28:00Z"
},
"topOfMind": {
"summary": "Wants reviewers to be able to reproduce the memory search and filter flow quickly with pre-populated sample data.",
"updatedAt": "2026-03-28T10:26:00Z"
}
},
"history": {
"recentMonths": {
"summary": "Recently contributed multiple DeerFlow pull requests covering memory, uploads, and compatibility fixes.",
"updatedAt": "2026-03-28T10:24:00Z"
},
"earlierContext": {
"summary": "Often prefers shipping smaller, reviewable changes with explicit validation notes.",
"updatedAt": "2026-03-28T10:22:00Z"
},
"longTermBackground": {
"summary": "Actively building open-source contribution experience and improving end-to-end delivery quality.",
"updatedAt": "2026-03-28T10:20:00Z"
}
},
"facts": [
{
"id": "fact_review_001",
"content": "User prefers Chinese for day-to-day collaboration.",
"category": "preference",
"confidence": 0.95,
"createdAt": "2026-03-28T09:50:00Z",
"source": "thread_pref_cn"
},
{
"id": "fact_review_002",
"content": "PR titles and bodies should be drafted in English and accompanied by a Chinese translation.",
"category": "workflow",
"confidence": 0.93,
"createdAt": "2026-03-28T09:52:00Z",
"source": "thread_pr_style"
},
{
"id": "fact_review_003",
"content": "User implemented memory search and filter improvements in the DeerFlow settings page.",
"category": "project",
"confidence": 0.91,
"createdAt": "2026-03-28T09:54:00Z",
"source": "thread_memory_filters"
},
{
"id": "fact_review_004",
"content": "User added clear-all memory support through the gateway memory API.",
"category": "project",
"confidence": 0.89,
"createdAt": "2026-03-28T09:56:00Z",
"source": "thread_memory_clear"
},
{
"id": "fact_review_005",
"content": "User added single-fact deletion support for persisted memory entries.",
"category": "project",
"confidence": 0.9,
"createdAt": "2026-03-28T09:58:00Z",
"source": "thread_memory_delete"
},
{
"id": "fact_review_006",
"content": "Reviewer can search for keyword memory to see multiple matching facts.",
"category": "testing",
"confidence": 0.84,
"createdAt": "2026-03-28T10:00:00Z",
"source": "thread_review_demo"
},
{
"id": "fact_review_007",
"content": "Reviewer can search for keyword Chinese to verify cross-category matching.",
"category": "testing",
"confidence": 0.82,
"createdAt": "2026-03-28T10:02:00Z",
"source": "thread_review_demo"
},
{
"id": "fact_review_008",
"content": "Reviewer can search for workflow to verify category text is included in local filtering.",
"category": "testing",
"confidence": 0.81,
"createdAt": "2026-03-28T10:04:00Z",
"source": "thread_review_demo"
},
{
"id": "fact_review_009",
"content": "Delete fact testing can target this disposable sample entry.",
"category": "testing",
"confidence": 0.78,
"createdAt": "2026-03-28T10:06:00Z",
"source": "thread_delete_demo"
},
{
"id": "fact_review_010",
"content": "This sample fact is intended for edit testing.",
"category": "testing",
"confidence": 0.8,
"createdAt": "2026-03-28T10:08:00Z",
"source": "manual"
}
]
}

View File

@@ -0,0 +1,291 @@
# Middleware Execution Flow
## Middleware List
The complete middleware chain that `create_deerflow_agent` assembles through `RuntimeFeatures` (with every feature enabled by default):
| # | Middleware | `before_agent` | `before_model` | `after_model` | `after_agent` | `wrap_tool_call` | Lead agent | Subagent | Source |
|---|-----------|:-:|:-:|:-:|:-:|:-:|:-:|:-:|------|
| 0 | ThreadDataMiddleware | ✓ | | | | | ✓ | ✓ | `sandbox` |
| 1 | UploadsMiddleware | ✓ | | | | | ✓ | ✗ | `sandbox` |
| 2 | SandboxMiddleware | ✓ | | | ✓ | | ✓ | ✓ | `sandbox` |
| 3 | DanglingToolCallMiddleware | | | ✓ | | | ✓ | ✗ | always on |
| 4 | GuardrailMiddleware | | | | | ✓ | ✓ | ✓ | *added in Phase 2* |
| 5 | ToolErrorHandlingMiddleware | | | | | ✓ | ✓ | ✓ | always on |
| 6 | SummarizationMiddleware | | | ✓ | | | ✓ | ✗ | `summarization` |
| 7 | TodoMiddleware | | | ✓ | | | ✓ | ✗ | `plan_mode` parameter |
| 8 | TitleMiddleware | | | ✓ | | | ✓ | ✗ | `auto_title` |
| 9 | MemoryMiddleware | | | | ✓ | | ✓ | ✗ | `memory` |
| 10 | ViewImageMiddleware | | ✓ | | | | ✓ | ✗ | `vision` |
| 11 | SubagentLimitMiddleware | | | ✓ | | | ✓ | ✗ | `subagent` |
| 12 | LoopDetectionMiddleware | | | ✓ | | | ✓ | ✗ | always on |
| 13 | ClarificationMiddleware | | | ✓ | | | ✓ | ✗ | always last |
The lead agent runs **14** middlewares (`make_lead_agent`); subagents run **4** (ThreadData, Sandbox, Guardrail, ToolErrorHandling); Phase 1 of `create_deerflow_agent` implements **13** (Guardrail only accepts custom instances, with no built-in default).
## Execution Order
LangChain `create_agent` rules:
- **`before_*` hooks run in list order** (position 0 → N)
- **`after_*` hooks run in reverse list order** (position N → 0)
```mermaid
graph TB
    START(["invoke"]) --> TD
    subgraph BA ["<b>before_agent</b> in order 0→N"]
        direction TB
        TD["[0] ThreadData<br/>create thread directory"] --> UL["[1] Uploads<br/>scan uploaded files"] --> SB["[2] Sandbox<br/>acquire sandbox"]
    end
    subgraph BM ["<b>before_model</b> in order 0→N"]
        direction TB
        VI["[10] ViewImage<br/>inject image base64"]
    end
    SB --> VI
    VI --> M["<b>MODEL</b>"]
    subgraph AM ["<b>after_model</b> reverse order N→0"]
        direction TB
        CL["[13] Clarification<br/>intercept ask_clarification"] --> LD["[12] LoopDetection<br/>detect loops"] --> SL["[11] SubagentLimit<br/>truncate extra tasks"] --> TI["[8] Title<br/>generate title"] --> SM["[6] Summarization<br/>compress context"] --> DTC["[3] DanglingToolCall<br/>patch missing ToolMessage"]
    end
    M --> CL
    subgraph AA ["<b>after_agent</b> reverse order N→0"]
        direction TB
        SBR["[2] Sandbox<br/>release sandbox"] --> MEM["[9] Memory<br/>enqueue memory"]
    end
    DTC --> SBR
    MEM --> END(["response"])
    classDef beforeNode fill:#a0a8b5,stroke:#636b7a,color:#2d3239
    classDef modelNode fill:#b5a8a0,stroke:#7a6b63,color:#2d3239
    classDef afterModelNode fill:#b5a0a8,stroke:#7a636b,color:#2d3239
    classDef afterAgentNode fill:#a0b5a8,stroke:#637a6b,color:#2d3239
    classDef terminalNode fill:#a8b5a0,stroke:#6b7a63,color:#2d3239
    class TD,UL,SB,VI beforeNode
    class M modelNode
    class CL,LD,SL,TI,SM,DTC afterModelNode
    class SBR,MEM afterAgentNode
    class START,END terminalNode
```
## Sequence Diagram
```mermaid
sequenceDiagram
    participant U as User
    participant TD as ThreadDataMiddleware
    participant UL as UploadsMiddleware
    participant SB as SandboxMiddleware
    participant VI as ViewImageMiddleware
    participant M as MODEL
    participant CL as ClarificationMiddleware
    participant SL as SubagentLimitMiddleware
    participant TI as TitleMiddleware
    participant SM as SummarizationMiddleware
    participant DTC as DanglingToolCallMiddleware
    participant MEM as MemoryMiddleware
    U ->> TD: invoke
    activate TD
    Note right of TD: before_agent: create thread directory
    TD ->> UL: before_agent
    activate UL
    Note right of UL: before_agent: scan uploaded files
    UL ->> SB: before_agent
    activate SB
    Note right of SB: before_agent: acquire sandbox
    SB ->> VI: before_model
    activate VI
    Note right of VI: before_model: inject image base64
    VI ->> M: messages + tools
    activate M
    M -->> CL: AI response
    deactivate M
    activate CL
    Note right of CL: after_model: intercept ask_clarification
    CL -->> SL: after_model
    deactivate CL
    activate SL
    Note right of SL: after_model: truncate extra tasks
    SL -->> TI: after_model
    deactivate SL
    activate TI
    Note right of TI: after_model: generate title
    TI -->> SM: after_model
    deactivate TI
    activate SM
    Note right of SM: after_model: compress context
    SM -->> DTC: after_model
    deactivate SM
    activate DTC
    Note right of DTC: after_model: patch missing ToolMessage
    DTC -->> VI: done
    deactivate DTC
    VI -->> SB: done
    deactivate VI
    Note right of SB: after_agent: release sandbox
    SB -->> UL: done
    deactivate SB
    UL -->> TD: done
    deactivate UL
    Note right of MEM: after_agent: enqueue memory
    TD -->> U: response
    deactivate TD
```
## The Onion Model
List position determines the layer in the onion — position 0 is outermost, position N is innermost:
```
Entering  before_*   [0] → [1] → [2] → ... → [10] → MODEL
Exiting   after_*    MODEL → [13] → [11] → ... → [6] → [3] → [2] → [0]
                              ↑ innermost runs first
```
> [!important] Core rule
> The middleware at the end of the list has its `after_model` run **first**.
> ClarificationMiddleware sits at the end of the list, so it is the first to intercept the model output.
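The ordering rule above can be demonstrated in a few lines (the middleware names are just labels for illustration):

```python
# before_* hooks run in list order (0 -> N); after_* hooks run in
# reverse list order (N -> 0), so the last-listed middleware's after_*
# fires first on the way out.
order: list[str] = []
middlewares = ["ThreadData", "Sandbox", "Clarification"]

for name in middlewares:                 # before_*: position 0 -> N
    order.append(f"before:{name}")
order.append("MODEL")
for name in reversed(middlewares):       # after_*: position N -> 0
    order.append(f"after:{name}")

print(order)
```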
## Comparison: a True Onion vs What DeerFlow Actually Does
### A True Onion (e.g. Koa/Express)
Every middleware handles both before and after, forming symmetric nesting:
```mermaid
sequenceDiagram
    participant U as User
    participant A as AuthMiddleware
    participant L as LogMiddleware
    participant R as RateLimitMiddleware
    participant H as Handler
    U ->> A: request
    activate A
    Note right of A: before: validate token
    A ->> L: next()
    activate L
    Note right of L: before: record request time
    L ->> R: next()
    activate R
    Note right of R: before: check rate limit
    R ->> H: next()
    activate H
    H -->> R: result
    deactivate H
    Note right of R: after: update counters
    R -->> L: result
    deactivate R
    Note right of L: after: record elapsed time
    L -->> A: result
    deactivate L
    Note right of A: after: clean up context
    A -->> U: response
    deactivate A
```
> [!tip] Onion hallmarks
> Every middleware does symmetric before/after work, and each `activate` bar spans the entire inner execution, forming perfect nesting.
### DeerFlow 的实际情况
不是洋葱,是管道。大部分 middleware 只用一个钩子,不存在对称嵌套。多轮对话时 before_model / after_model 循环执行:
```mermaid
sequenceDiagram
participant U as User
participant TD as ThreadData
participant UL as Uploads
participant SB as Sandbox
participant VI as ViewImage
participant M as MODEL
participant CL as Clarification
participant SL as SubagentLimit
participant TI as Title
participant SM as Summarization
participant MEM as Memory
U ->> TD: invoke
Note right of TD: before_agent: create directory
TD ->> UL: .
Note right of UL: before_agent: scan files
UL ->> SB: .
Note right of SB: before_agent: acquire sandbox
loop Each turn (tool call loop)
SB ->> VI: .
Note right of VI: before_model: inject images
VI ->> M: messages + tools
M -->> CL: AI response
Note right of CL: after_model: intercept ask_clarification
CL -->> SL: .
Note right of SL: after_model: truncate extra task calls
SL -->> TI: .
Note right of TI: after_model: generate title
TI -->> SM: .
Note right of SM: after_model: context compression
end
Note right of SB: after_agent: release sandbox
SB -->> MEM: .
Note right of MEM: after_agent: enqueue memory
MEM -->> U: response
```
> [!warning] Not an onion
> Of the 14 middlewares, only SandboxMiddleware has before/after symmetry (acquire/release). The rest are one-way: they act either only in `before_*` or only in `after_*`. `before_agent` / `after_agent` run once; `before_model` / `after_model` run on every loop iteration.
There are only 2 hard ordering dependencies:
1. **ThreadData before Sandbox**: the sandbox needs the thread directory
2. **Clarification last in the list**: its `after_model` runs first in the reversed pass, so it is the first to intercept `ask_clarification`
### Conclusion
| | True onion | DeerFlow in practice |
|---|---|---|
| Each middleware | symmetric before + after | mostly a single hook |
| Activation bars | nested (outer longer than inner) | not nested (serial) |
| Meaning of reverse order | pairs cleanup with initialization | only sets after_model execution priority |
| Typical example | Auth: validate token / clean up context | ThreadData: creates a directory, no cleanup |
## Key Design Points
### Why is ClarificationMiddleware last in the list?
Last position = its `after_model` runs first. It must be the **first** to see the model output and check for an `ask_clarification` tool call. If one is present, it interrupts immediately (`Command(goto=END)`), and the `after_model` hooks of the remaining middlewares never run.
### SandboxMiddleware's symmetry
`before_agent` (3rd in forward order) acquires the sandbox; `after_agent` (1st in reverse order) releases it. Outer layer in, outer layer out: natural onion symmetry.
### Most middlewares use only one hook
Of the 14 middlewares, only SandboxMiddleware uses both `before_agent` + `after_agent` (acquire/release). The rest run in a single phase. The onion model's reverse ordering mainly matters for execution order within the `after_model` phase.
@@ -0,0 +1,204 @@
# Plan Mode with TodoList Middleware
This document describes how to enable and use the Plan Mode feature with TodoList middleware in DeerFlow 2.0.
## Overview
Plan Mode adds a TodoList middleware to the agent, which provides a `write_todos` tool that helps the agent:
- Break down complex tasks into smaller, manageable steps
- Track progress as tasks are completed
- Provide visibility to users about what's being done
The TodoList middleware is built on LangChain's `TodoListMiddleware`.
## Configuration
### Enabling Plan Mode
Plan mode is controlled via **runtime configuration** through the `is_plan_mode` parameter in the `configurable` section of `RunnableConfig`. This allows you to dynamically enable or disable plan mode on a per-request basis.
```python
from langchain_core.runnables import RunnableConfig
from deerflow.agents.lead_agent.agent import make_lead_agent

# Enable plan mode via runtime configuration
config = RunnableConfig(
    configurable={
        "thread_id": "example-thread",
        "thinking_enabled": True,
        "is_plan_mode": True,  # Enable plan mode
    }
)

# Create agent with plan mode enabled
agent = make_lead_agent(config)
```
### Configuration Options
- **is_plan_mode** (bool): Whether to enable plan mode with TodoList middleware. Default: `False`
- Pass via `config.get("configurable", {}).get("is_plan_mode", False)`
- Can be set dynamically for each agent invocation
- No global configuration needed
## Default Behavior
When plan mode is enabled with default settings, the agent will have access to a `write_todos` tool with the following behavior:
### When to Use TodoList
The agent will use the todo list for:
1. Complex multi-step tasks (3+ distinct steps)
2. Non-trivial tasks requiring careful planning
3. When user explicitly requests a todo list
4. When user provides multiple tasks
### When NOT to Use TodoList
The agent will skip using the todo list for:
1. Single, straightforward tasks
2. Trivial tasks (< 3 steps)
3. Purely conversational or informational requests
### Task States
- **pending**: Task not yet started
- **in_progress**: Currently working on (can have multiple parallel tasks)
- **completed**: Task finished successfully
## Usage Examples
### Basic Usage
```python
from langchain_core.runnables import RunnableConfig
from deerflow.agents.lead_agent.agent import make_lead_agent

# Create agent with plan mode ENABLED
config_with_plan_mode = RunnableConfig(
    configurable={
        "thread_id": "example-thread",
        "thinking_enabled": True,
        "is_plan_mode": True,  # TodoList middleware will be added
    }
)
agent_with_todos = make_lead_agent(config_with_plan_mode)

# Create agent with plan mode DISABLED (default)
config_without_plan_mode = RunnableConfig(
    configurable={
        "thread_id": "another-thread",
        "thinking_enabled": True,
        "is_plan_mode": False,  # No TodoList middleware
    }
)
agent_without_todos = make_lead_agent(config_without_plan_mode)
```
### Dynamic Plan Mode per Request
You can enable/disable plan mode dynamically for different conversations or tasks:
```python
from langchain_core.runnables import RunnableConfig
from deerflow.agents.lead_agent.agent import make_lead_agent

def create_agent_for_task(task_complexity: str):
    """Create agent with plan mode based on task complexity."""
    is_complex = task_complexity in ["high", "very_high"]
    config = RunnableConfig(
        configurable={
            "thread_id": f"task-{task_complexity}",
            "thinking_enabled": True,
            "is_plan_mode": is_complex,  # Enable only for complex tasks
        }
    )
    return make_lead_agent(config)

# Simple task - no TodoList needed
simple_agent = create_agent_for_task("low")

# Complex task - TodoList enabled for better tracking
complex_agent = create_agent_for_task("high")
```
## How It Works
1. When `make_lead_agent(config)` is called, it extracts `is_plan_mode` from `config.configurable`
2. The config is passed to `_build_middlewares(config)`
3. `_build_middlewares()` reads `is_plan_mode` and calls `_create_todo_list_middleware(is_plan_mode)`
4. If `is_plan_mode=True`, a `TodoListMiddleware` instance is created and added to the middleware chain
5. The middleware automatically adds a `write_todos` tool to the agent's toolset
6. The agent can use this tool to manage tasks during execution
7. The middleware handles the todo list state and provides it to the agent
## Architecture
```
make_lead_agent(config)
├─> Extracts: is_plan_mode = config.configurable.get("is_plan_mode", False)
└─> _build_middlewares(config)
    ├─> ThreadDataMiddleware
    ├─> SandboxMiddleware
    ├─> SummarizationMiddleware (if enabled via global config)
    ├─> TodoListMiddleware (if is_plan_mode=True) ← NEW
    ├─> TitleMiddleware
    └─> ClarificationMiddleware
```
## Implementation Details
### Agent Module
- **Location**: `packages/harness/deerflow/agents/lead_agent/agent.py`
- **Function**: `_create_todo_list_middleware(is_plan_mode: bool)` - Creates TodoListMiddleware if plan mode is enabled
- **Function**: `_build_middlewares(config: RunnableConfig)` - Builds middleware chain based on runtime config
- **Function**: `make_lead_agent(config: RunnableConfig)` - Creates agent with appropriate middlewares
### Runtime Configuration
Plan mode is controlled via the `is_plan_mode` parameter in `RunnableConfig.configurable`:
```python
config = RunnableConfig(
    configurable={
        "is_plan_mode": True,  # Enable plan mode
        # ... other configurable options
    }
)
```
## Key Benefits
1. **Dynamic Control**: Enable/disable plan mode per request without global state
2. **Flexibility**: Different conversations can have different plan mode settings
3. **Simplicity**: No need for global configuration management
4. **Context-Aware**: Plan mode decision can be based on task complexity, user preferences, etc.
## Custom Prompts
DeerFlow uses custom `system_prompt` and `tool_description` for the TodoListMiddleware that match the overall DeerFlow prompt style:
### System Prompt Features
- Uses XML tags (`<todo_list_system>`) for structure consistency with DeerFlow's main prompt
- Emphasizes CRITICAL rules and best practices
- Clear "When to Use" vs "When NOT to Use" guidelines
- Focuses on real-time updates and immediate task completion
### Tool Description Features
- Detailed usage scenarios with examples
- Strong emphasis on NOT using for simple tasks
- Clear task state definitions (pending, in_progress, completed)
- Comprehensive best practices section
- Task completion requirements to prevent premature marking
The custom prompts are defined in `_create_todo_list_middleware()` in `backend/packages/harness/deerflow/agents/lead_agent/agent.py`.
## Notes
- TodoList middleware uses LangChain's built-in `TodoListMiddleware` with **custom DeerFlow-style prompts**
- Plan mode is **disabled by default** (`is_plan_mode=False`) to maintain backward compatibility
- The middleware is positioned before `ClarificationMiddleware` to allow todo management during clarification flows
- Custom prompts emphasize the same principles as DeerFlow's main system prompt (clarity, action-oriented, critical rules)
@@ -0,0 +1,503 @@
# RFC: `create_deerflow_agent`, a Pure-Parameter SDK Factory API
## 1. Problem
The harness's only public entry point today is `make_lead_agent(config: RunnableConfig)`. Internally it does:
```
make_lead_agent
├─ get_app_config()        ← reads config.yaml
├─ _resolve_model_name()   ← reads config.yaml
├─ load_agent_config()     ← reads agents/{name}/config.yaml
├─ create_chat_model(name) ← reads config.yaml, loads the model class via reflection
├─ get_available_tools()   ← reads config.yaml + extensions_config.json
├─ apply_prompt_template() ← reads the skills directory + memory.json
└─ _build_middlewares()    ← reads config.yaml (summarization, model vision)
```
**Six sites of implicit I/O**, all file-system dependent. If you want to embed `deerflow-harness` as a Python library in your own application, you must provide `config.yaml` + `extensions_config.json` + a skills directory. That is unacceptable for SDK users.
### Comparison
| | `langchain.create_agent` | `make_lead_agent` | `DeerFlowClient` (enhanced) |
|---|---|---|---|
| Positioning | low-level primitive | internal factory | **sole public API** |
| Config source | pure parameters | YAML files | **parameters first, config fallback** |
| Built-in capabilities | none | Sandbox/Memory/Skills/Subagent/... | **composed on demand + management APIs** |
| User interface | `graph.invoke(state)` | internal use | **`client.chat("hello")`** |
| Audience | people writing LangChain | internal use | **all DeerFlow users** |
## 2. Design Principles
### DI best practice in Python
1. **Function parameters are the injection**: no global state reads; all dependencies arrive as parameters
2. **Protocols define contracts**: depend on behavioral interfaces, not concrete classes
3. **Sensible defaults**: `sandbox=True` is equivalent to `sandbox=LocalSandboxProvider()`
4. **Layered API**: simple usage fits on one line; complex usage has escape hatches
### Layered architecture
```
┌──────────────────────┐
│ DeerFlowClient │ ← sole public API (chat/stream + management)
└──────────┬───────────┘
┌──────────▼───────────┐
│ make_lead_agent │ ← internal: config-driven factory
└──────────┬───────────┘
┌──────────▼───────────┐
│ create_deerflow_agent │ ← internal: pure-parameter factory
└──────────┬───────────┘
┌──────────▼───────────┐
│ langchain.create_agent│ ← low-level primitive
└──────────────────────┘
```
`DeerFlowClient` is the sole public API. `create_deerflow_agent` and `make_lead_agent` are both internal implementations.
Users control behavior through three `DeerFlowClient` parameters:
| Parameter | Type | Responsibility |
|------|------|------|
| `config` | `dict` | override any config.yaml setting |
| `features` | `RuntimeFeatures` | replace built-in middleware implementations |
| `extra_middleware` | `list[AgentMiddleware]` | add user middleware |
No parameters → read config.yaml (fully compatible with existing behavior).
### Core constraints
- **Config override**: `config` dict > config.yaml > defaults
- **Three non-overlapping layers**: config carries parameters, features carries instances, extra_middleware carries additions
- **Backward compatible**: the existing no-argument `DeerFlowClient()` constructor behaves unchanged
- **Harness boundary compliance**: never imports `app.*` (enforced by `test_harness_boundary.py`)
## 3. API Design
### 3.1 `DeerFlowClient`: the sole public API
Three optional parameters are added to the existing constructor:
```python
from deerflow.client import DeerFlowClient
from deerflow.agents.features import RuntimeFeatures

client = DeerFlowClient(
    # 1. config — override any config.yaml key (same structure as the yaml)
    config={
        "models": [{"name": "gpt-4o", "use": "langchain_openai:ChatOpenAI", "model": "gpt-4o", "api_key": "sk-..."}],
        "memory": {"max_facts": 50, "enabled": True},
        "title": {"enabled": False},
        "summarization": {"enabled": True, "trigger": [{"type": "tokens", "value": 10000}]},
        "sandbox": {"use": "deerflow.sandbox.local:LocalSandboxProvider"},
    },
    # 2. features — replace built-in middleware implementations
    features=RuntimeFeatures(
        memory=MyMemoryMiddleware(),
        auto_title=MyTitleMiddleware(),
    ),
    # 3. extra_middleware — add user middleware
    extra_middleware=[
        MyAuditMiddleware(),   # @Next(SandboxMiddleware)
        MyFilterMiddleware(),  # @Prev(ClarificationMiddleware)
    ],
)
```
Typical usage patterns:
```python
# Usage 1: read everything from config.yaml (existing behavior unchanged)
client = DeerFlowClient()

# Usage 2: change parameters only, keep the built-in implementations
client = DeerFlowClient(config={"memory": {"max_facts": 50}})

# Usage 3: replace a middleware implementation
client = DeerFlowClient(features=RuntimeFeatures(auto_title=MyTitleMiddleware()))

# Usage 4: add custom middleware
client = DeerFlowClient(extra_middleware=[MyAuditMiddleware()])

# Usage 5: pure SDK, no config.yaml
client = DeerFlowClient(config={
    "models": [{"name": "gpt-4o", "use": "langchain_openai:ChatOpenAI", ...}],
    "tools": [{"name": "bash", "use": "deerflow.sandbox.tools:bash_tool", "group": "bash"}],
    "memory": {"enabled": True},
})
```
Internally: `final_config = deep_merge(file_config, code_config)`
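A plausible `deep_merge` (the name comes from the line above; this exact implementation is an assumption):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; `override` wins on conflicts.

    Nested dicts are merged key by key; lists and scalars are replaced
    wholesale, so a `models` list supplied in code fully replaces the yaml one.
    The inputs are not mutated.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


file_config = {"memory": {"enabled": True, "max_facts": 20}, "title": {"enabled": True}}
code_config = {"memory": {"max_facts": 50}}
final_config = deep_merge(file_config, code_config)
# memory.enabled survives from the file; max_facts is overridden to 50
```

Replacing (rather than merging) lists matches the intent that a `models` list given in code fully replaces the yaml one; whether the real implementation treats lists differently is not specified here.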
### 3.2 `create_deerflow_agent`: internal factory (not public)
```python
def create_deerflow_agent(
    model: BaseChatModel,
    tools: list[BaseTool] | None = None,
    *,
    system_prompt: str | None = None,
    middleware: list[AgentMiddleware] | None = None,
    features: RuntimeFeatures | None = None,
    state_schema: type | None = None,
    checkpointer: BaseCheckpointSaver | None = None,
    name: str = "default",
) -> CompiledStateGraph:
    ...
```
`DeerFlowClient` calls this function internally.
### 3.3 `RuntimeFeatures`: built-in middleware replacement
It does one thing: replace a built-in middleware with a custom instance. It does not carry configuration parameters; those go through the `config` dict:
```python
@dataclass
class RuntimeFeatures:
    sandbox: bool | AgentMiddleware = True
    memory: bool | AgentMiddleware = False
    summarization: bool | AgentMiddleware = False
    subagent: bool | AgentMiddleware = False
    vision: bool | AgentMiddleware = False
    auto_title: bool | AgentMiddleware = False
    loop_detection: bool | AgentMiddleware = False  # referenced by the assembly code below
```
| Value | Meaning |
|---|---|
| `True` | use the default middleware (parameters read from config) |
| `False` | disable the feature |
| `AgentMiddleware` instance | replace the whole implementation |
There is no `MemoryOptions`, `TitleOptions`, etc. Parameter tweaks go through the `config` dict:
```python
# Change memory parameters → config
client = DeerFlowClient(config={"memory": {"max_facts": 50}})

# Swap the memory implementation → features
client = DeerFlowClient(features=RuntimeFeatures(memory=MyMemoryMiddleware()))

# Combine both: config parameters feed the default middlewares, but title swaps its implementation
client = DeerFlowClient(
    config={"memory": {"max_facts": 50}},
    features=RuntimeFeatures(auto_title=MyTitleMiddleware()),
)
```
### 3.4 Middleware chain assembly
No numeric priority sorting; the list is built by appending in a fixed order:
```python
def _resolve(spec, default_cls):
    """bool → default implementation; AgentMiddleware → replacement"""
    if isinstance(spec, AgentMiddleware):
        return spec
    return default_cls()

def _assemble_from_features(
    feat: RuntimeFeatures,
    config: AppConfig,
    extra_middleware: list[AgentMiddleware],
) -> tuple[list, list]:
    chain = []
    extra_tools = []
    if feat.sandbox:
        chain.append(ThreadDataMiddleware())  # hard dependency: before Sandbox
        chain.append(UploadsMiddleware())
        chain.append(_resolve(feat.sandbox, SandboxMiddleware))
    chain.append(DanglingToolCallMiddleware())
    chain.append(ToolErrorHandlingMiddleware())
    if feat.summarization:
        chain.append(_resolve(feat.summarization, SummarizationMiddleware))
    if config.title.enabled and feat.auto_title is not False:
        chain.append(_resolve(feat.auto_title, TitleMiddleware))
    if feat.memory:
        chain.append(_resolve(feat.memory, MemoryMiddleware))
    if feat.vision:
        chain.append(ViewImageMiddleware())
        extra_tools.append(view_image_tool)
    if feat.subagent:
        chain.append(_resolve(feat.subagent, SubagentLimitMiddleware))
        extra_tools.append(task_tool)
    if feat.loop_detection:
        chain.append(_resolve(feat.loop_detection, LoopDetectionMiddleware))
    # Insert extra_middleware (positioned by @Next/@Prev declarations)
    _insert_extra(chain, extra_middleware)
    # Clarification always last
    chain.append(ClarificationMiddleware())
    extra_tools.append(ask_clarification_tool)
    return chain, extra_tools
```
### 3.6 Middleware ordering strategy
**Two-phase ordering: built-ins fixed, externals inserted**
1. **Built-in chain order is fixed**: determined by the append order in code; built-ins do not participate in @Next/@Prev
2. **External middleware insertion**: middlewares in `extra_middleware` declare anchors via @Next/@Prev and may anchor to any middleware (built-in or other external)
3. **Conflict detection**: if two external middlewares @Next/@Prev the same target, raise `ValueError`
**This is not a total ordering.** The built-in chain's order is already fixed in code; external middlewares only perform insertions. This avoids built-ins and externals competing for the same position.
### 3.7 The `@Next` / `@Prev` decorators
User middlewares declare their position in the chain via decorators, type-safely:
```python
from deerflow.agents import Next, Prev

@Next(SandboxMiddleware)
class MyAuditMiddleware(AgentMiddleware):
    """Placed right after SandboxMiddleware."""
    def before_agent(self, state, runtime):
        ...

@Prev(ClarificationMiddleware)
class MyFilterMiddleware(AgentMiddleware):
    """Placed right before ClarificationMiddleware."""
    def after_model(self, state, runtime):
        ...
```
Implementation:
```python
def Next(anchor: type[AgentMiddleware]):
    """Decorator: place this middleware immediately after `anchor`."""
    def decorator(cls: type[AgentMiddleware]) -> type[AgentMiddleware]:
        cls._next_anchor = anchor
        return cls
    return decorator

def Prev(anchor: type[AgentMiddleware]):
    """Decorator: place this middleware immediately before `anchor`."""
    def decorator(cls: type[AgentMiddleware]) -> type[AgentMiddleware]:
        cls._prev_anchor = anchor
        return cls
    return decorator
```
The `_insert_extra` algorithm:
1. Iterate over `extra_middleware`, reading each middleware's `_next_anchor` / `_prev_anchor`
2. **Conflict detection**: if two external middlewares share the same anchor (same direction, same target), raise `ValueError`
3. Anchored middlewares are inserted at the target position (@Next goes after the target, @Prev before it)
4. Middlewares without a declaration are appended just before Clarification
## 4. Middleware Execution Model
### LangChain's execution rules
```
before_agent   forward → [0] → [1] → ... → [N]
before_model   forward → [0] → [1] → ... → [N]     ← every loop iteration
MODEL
after_model    reverse ← [N] → [N-1] → ... → [0]   ← every loop iteration
after_agent    reverse ← [N] → [N-1] → ... → [0]
```
`before_agent` / `after_agent` run only once. `before_model` / `after_model` run on every tool-call loop iteration.
### DeerFlow in practice
**Not an onion, but a pipeline.** Of the 11 middlewares, only SandboxMiddleware has before/after symmetry (acquire/release); the rest use a single hook.
There are only 2 hard ordering dependencies:
1. **ThreadData before Sandbox**: the sandbox needs the thread directory
2. **Clarification last in the list**: its after_model runs first in the reversed pass, so it is the first to intercept `ask_clarification`
See [middleware-execution-flow.md](middleware-execution-flow.md) for details.
## 5. Usage Examples
### 5.1 Read everything from config.yaml (existing behavior unchanged)
```python
from deerflow.client import DeerFlowClient
client = DeerFlowClient()
response = client.chat("Hello")
```
### 5.2 Override configuration parameters
```python
client = DeerFlowClient(config={
    "memory": {"max_facts": 50},
    "title": {"enabled": False},
    "summarization": {"trigger": [{"type": "tokens", "value": 10000}]},
})
```
### 5.3 Pure SDK, no config.yaml
```python
client = DeerFlowClient(config={
    "models": [{"name": "gpt-4o", "use": "langchain_openai:ChatOpenAI", "model": "gpt-4o", "api_key": "sk-..."}],
    "tools": [
        {"name": "bash", "group": "bash", "use": "deerflow.sandbox.tools:bash_tool"},
        {"name": "web_search", "group": "web", "use": "deerflow.community.tavily.tools:web_search_tool"},
    ],
    "memory": {"enabled": True, "max_facts": 50},
    "sandbox": {"use": "deerflow.sandbox.local:LocalSandboxProvider"},
})
```
### 5.4 Replace a built-in middleware
```python
from deerflow.agents.features import RuntimeFeatures

client = DeerFlowClient(
    features=RuntimeFeatures(
        memory=MyMemoryMiddleware(),     # replace
        auto_title=MyTitleMiddleware(),  # replace
        vision=False,                    # disable
    ),
)
```
### 5.5 Insert custom middleware
```python
from deerflow.agents import Next, Prev
from deerflow.sandbox.middleware import SandboxMiddleware
from deerflow.agents.middlewares.clarification_middleware import ClarificationMiddleware

@Next(SandboxMiddleware)
class MyAuditMiddleware(AgentMiddleware):
    def before_agent(self, state, runtime):
        log_sandbox_acquired(state)

@Prev(ClarificationMiddleware)
class MyFilterMiddleware(AgentMiddleware):
    def after_model(self, state, runtime):
        filter_sensitive_output(state)

client = DeerFlowClient(
    extra_middleware=[MyAuditMiddleware(), MyFilterMiddleware()],
)
```
## 6. Phase 1 Limitations
In the current implementation the following middlewares still read `config.yaml` internally, which SDK users should be aware of:
| Middleware | Reads | Phase 2 solution |
|------------|---------|-----------------|
| TitleMiddleware | `get_title_config()` + `create_chat_model()` | `TitleOptions(model=...)` parameter override |
| MemoryMiddleware | `get_memory_config()` | `MemoryOptions(...)` parameter override |
| SandboxMiddleware | `get_sandbox_provider()` | direct `SandboxProvider` instance |
In Phase 1, `auto_title` defaults to `False` to avoid crashing when no config is present. Other config-dependent features also default to `False`.
## 7. Migration Path
```
Phase 1 (current PR #1203):
  ✓ Add create_deerflow_agent + RuntimeFeatures (internal API)
  ✓ No changes to DeerFlowClient or make_lead_agent
  ✗ Middlewares still read config internally (known limitation)
Phase 2 (#1380):
  - Add optional DeerFlowClient constructor parameters (model, tools, features, system_prompt)
  - Options parameters override config (MemoryOptions, TitleOptions, etc.)
  - @Next/@Prev decorators
  - Add missing middlewares (Guardrail, TokenUsage, DeferredToolFilter)
  - Turn make_lead_agent into a thin shell over create_deerflow_agent
Phase 3:
  - SDK docs and examples
  - Stabilize the deerflow.client API
```
## 8. Design Decisions
| Question | Decision | Rationale |
|------|------|------|
| Public API | `DeerFlowClient` as sole entry point | top-down: evolve the existing API first, extract lower layers later |
| create_deerflow_agent | internal, not public | users never need to touch CompiledStateGraph |
| Config override | `config` dict (same structure as config.yaml) | no new concepts; deep-merge override |
| Middleware replacement | `features=RuntimeFeatures(memory=MyMW())` | bool switches plus instance replacement |
| Middleware extension | separate `extra_middleware` parameter | kept apart from the built-in features |
| Middleware placement | `@Next/@Prev` decorators | type-safe; hides ordering details |
| Ordering mechanism | sequential append + @Next/@Prev | numeric priorities carry no functional meaning |
| Runtime switches | keep `RunnableConfig` | plan_mode, thread_id, etc. toggle per request |
## 9. Appendix: The Middleware Chain
```mermaid
graph TB
subgraph BA ["before_agent forward"]
direction TB
TD["ThreadData<br/>create directory"] --> UL["Uploads<br/>scan files"] --> SB["Sandbox<br/>acquire sandbox"]
end
subgraph BM ["before_model forward, each turn"]
direction TB
VI["ViewImage<br/>inject images"]
end
SB --> VI
VI --> M["MODEL"]
subgraph AM ["after_model reverse, each turn"]
direction TB
CL["Clarification<br/>intercept + interrupt"] --> LD["LoopDetection<br/>detect loops"] --> SL["SubagentLimit<br/>truncate task calls"] --> TI["Title<br/>generate title"] --> DTC["DanglingToolCall<br/>backfill missing messages"]
end
M --> CL
subgraph AA ["after_agent reverse"]
direction TB
SBR["Sandbox<br/>release sandbox"] --> MEM["Memory<br/>enqueue memory"]
end
DTC --> SBR
classDef beforeNode fill:#a0a8b5,stroke:#636b7a,color:#2d3239
classDef modelNode fill:#b5a8a0,stroke:#7a6b63,color:#2d3239
classDef afterModelNode fill:#b5a0a8,stroke:#7a636b,color:#2d3239
classDef afterAgentNode fill:#a0b5a8,stroke:#637a6b,color:#2d3239
class TD,UL,SB,VI beforeNode
class M modelNode
class CL,LD,SL,TI,DTC afterModelNode
class SBR,MEM afterAgentNode
```
Hard dependencies:
- ThreadData → Uploads → Sandbox (before_agent phase)
- Clarification must be last in the list (its after_model runs first in the reversed pass)
## 10. Middleware Differences: Lead Agent vs. Subagent
The lead agent and subagents share a base middleware chain (`_build_runtime_middlewares`); subagents trim it down:
| Middleware | Lead Agent | Subagent | Notes |
|------------|:-------:|:--------:|------|
| ThreadDataMiddleware | ✓ | ✓ | shared: creates the thread directory |
| UploadsMiddleware | ✓ | ✗ | lead agent only: scans uploaded files |
| SandboxMiddleware | ✓ | ✓ | shared: acquires/releases the sandbox |
| DanglingToolCallMiddleware | ✓ | ✗ | lead agent only: backfills missing ToolMessage |
| GuardrailMiddleware | ✓ | ✓ | shared: tool-call authorization (optional) |
| ToolErrorHandlingMiddleware | ✓ | ✓ | shared: tool exception handling |
| SummarizationMiddleware | ✓ | ✗ | |
| TodoMiddleware | ✓ | ✗ | |
| TitleMiddleware | ✓ | ✗ | |
| MemoryMiddleware | ✓ | ✗ | |
| ViewImageMiddleware | ✓ | ✗ | |
| SubagentLimitMiddleware | ✓ | ✗ | |
| LoopDetectionMiddleware | ✓ | ✗ | |
| ClarificationMiddleware | ✓ | ✗ | |
**Design principles**:
- `RuntimeFeatures`, `@Next/@Prev`, and the ordering mechanism apply only to the **lead agent**
- The subagent chain is short and fixed (4 middlewares); no dynamic assembly needed
- `extra_middleware` currently affects only the lead agent and is not passed down to subagents
@@ -0,0 +1,190 @@
# RFC: Extract Shared Skill Installer and Upload Manager into Harness
## 1. Problem
Gateway (`app/gateway/routers/skills.py`, `uploads.py`) and Client (`deerflow/client.py`) each independently implement the same business logic:
### Skill Installation
| Logic | Gateway (`skills.py`) | Client (`client.py`) |
|-------|----------------------|---------------------|
| Zip safety check | `_is_unsafe_zip_member()` | Inline `Path(info.filename).is_absolute()` |
| Symlink filtering | `_is_symlink_member()` | `p.is_symlink()` post-extraction delete |
| Zip bomb defence | `total_size += info.file_size` (declared) | `total_size > 100MB` (declared) |
| macOS metadata filter | `_should_ignore_archive_entry()` | None |
| Frontmatter validation | `_validate_skill_frontmatter()` | `_validate_skill_frontmatter()` |
| Duplicate detection | `HTTPException(409)` | `ValueError` |
**Two implementations, inconsistent behaviour**: Gateway streams writes and tracks real decompressed size; Client sums declared `file_size`. Gateway skips symlinks during extraction; Client extracts everything then walks and deletes symlinks.
### Upload Management
| Logic | Gateway (`uploads.py`) | Client (`client.py`) |
|-------|----------------------|---------------------|
| Directory access | `get_uploads_dir()` + `mkdir` | `_get_uploads_dir()` + `mkdir` |
| Filename safety | Inline `Path(f).name` + manual checks | No checks, uses `src_path.name` directly |
| Duplicate handling | None (overwrites) | None (overwrites) |
| Listing | Inline `iterdir()` | Inline `os.scandir()` |
| Deletion | Inline `unlink()` + traversal check | Inline `unlink()` + traversal check |
| Path traversal | `resolve().relative_to()` | `resolve().relative_to()` |
**The same traversal check is written twice** — any security fix must be applied to both locations.
## 2. Design Principles
### Dependency Direction
```
app.gateway.routers.skills ──┐
app.gateway.routers.uploads ──┤── calls ──→ deerflow.skills.installer
deerflow.client ──┘ deerflow.uploads.manager
```
- Shared modules live in the harness layer (`deerflow.*`), pure business logic, no FastAPI dependency
- Gateway handles HTTP adaptation (`UploadFile` → bytes, exceptions → `HTTPException`)
- Client handles local adaptation (`Path` → copy, exceptions → Python exceptions)
- Satisfies `test_harness_boundary.py` constraint: harness never imports app
### Exception Strategy
| Shared Layer Exception | Gateway Maps To | Client |
|----------------------|-----------------|--------|
| `FileNotFoundError` | `HTTPException(404)` | Propagates |
| `ValueError` | `HTTPException(400)` | Propagates |
| `SkillAlreadyExistsError` | `HTTPException(409)` | Propagates |
| `PermissionError` | `HTTPException(403)` | Propagates |
Replaces stringly-typed routing (`"already exists" in str(e)`) with typed exception matching (`SkillAlreadyExistsError`).
## 3. New Modules
### 3.1 `deerflow.skills.installer`
```python
# Safety checks
is_unsafe_zip_member(info: ZipInfo) -> bool # Absolute path / .. traversal
is_symlink_member(info: ZipInfo) -> bool # Unix symlink detection
should_ignore_archive_entry(path: Path) -> bool # __MACOSX / dotfiles
# Extraction
safe_extract_skill_archive(zip_ref, dest_path, max_total_size=512MB)
# Streaming write, accumulates real bytes (vs declared file_size)
# Dual traversal check: member-level + resolve-level
# Directory resolution
resolve_skill_dir_from_archive(temp_path: Path) -> Path
# Auto-enters single directory, filters macOS metadata
# Install entry point
install_skill_from_archive(zip_path, *, skills_root=None) -> dict
# is_file() pre-check before extension validation
# SkillAlreadyExistsError replaces ValueError
# Exception
class SkillAlreadyExistsError(ValueError)
```
### 3.2 `deerflow.uploads.manager`
```python
# Directory management
get_uploads_dir(thread_id: str) -> Path # Pure path, no side effects
ensure_uploads_dir(thread_id: str) -> Path # Creates directory (for write paths)
# Filename safety
normalize_filename(filename: str) -> str
# Path.name extraction + rejects ".." / "." / backslash / >255 bytes
deduplicate_filename(name: str, seen: set) -> str
# _N suffix increment for dedup, mutates seen in place
# Path safety
validate_path_traversal(path: Path, base: Path) -> None
# resolve().relative_to(), raises PermissionError on failure
# File operations
list_files_in_dir(directory: Path) -> dict
# scandir with stat inside context (no re-stat)
# follow_symlinks=False to prevent metadata leakage
# Non-existent directory returns empty list
delete_file_safe(base_dir: Path, filename: str) -> dict
# Validates traversal first, then unlinks
# URL helpers
upload_artifact_url(thread_id, filename) -> str # Percent-encoded for HTTP safety
upload_virtual_path(filename) -> str # Sandbox-internal path
enrich_file_listing(result, thread_id) -> dict # Adds URLs, stringifies sizes
```
## 4. Changes
### 4.1 Gateway Slimming
**`app/gateway/routers/skills.py`**:
- Remove `_is_unsafe_zip_member`, `_is_symlink_member`, `_safe_extract_skill_archive`, `_should_ignore_archive_entry`, `_resolve_skill_dir_from_archive_root` (~80 lines)
- `install_skill` route becomes a single call to `install_skill_from_archive(path)`
- Exception mapping: `SkillAlreadyExistsError → 409`, `ValueError → 400`, `FileNotFoundError → 404`
**`app/gateway/routers/uploads.py`**:
- Remove inline `get_uploads_dir` (replaced by `ensure_uploads_dir`/`get_uploads_dir`)
- `upload_files` uses `normalize_filename()` instead of inline safety checks
- `list_uploaded_files` uses `list_files_in_dir()` + enrichment
- `delete_uploaded_file` uses `delete_file_safe()` + companion markdown cleanup
### 4.2 Client Slimming
**`deerflow/client.py`**:
- Remove `_get_uploads_dir` static method
- Remove ~50 lines of inline zip handling in `install_skill`
- `install_skill` delegates to `install_skill_from_archive()`
- `upload_files` uses `deduplicate_filename()` + `ensure_uploads_dir()`
- `list_uploads` uses `get_uploads_dir()` + `list_files_in_dir()`
- `delete_upload` uses `get_uploads_dir()` + `delete_file_safe()`
- `update_mcp_config` / `update_skill` now reset `_agent_config_key = None`
### 4.3 Read/Write Path Separation
| Operation | Function | Creates dir? |
|-----------|----------|:------------:|
| upload (write) | `ensure_uploads_dir()` | Yes |
| list (read) | `get_uploads_dir()` | No |
| delete (read) | `get_uploads_dir()` | No |
Read paths no longer have `mkdir` side effects — non-existent directories return empty lists.
## 5. Security Improvements
| Improvement | Before | After |
|-------------|--------|-------|
| Zip bomb detection | Sum of declared `file_size` | Streaming write, accumulates real bytes |
| Symlink handling | Gateway skips / Client deletes post-extract | Unified skip + log |
| Traversal check | Member-level only | Member-level + `resolve().is_relative_to()` |
| Filename backslash | Gateway checks / Client doesn't | Unified rejection |
| Filename length | No check | Reject > 255 bytes (OS limit) |
| thread_id validation | None | Reject unsafe filesystem characters |
| Listing symlink leak | `follow_symlinks=True` (default) | `follow_symlinks=False` |
| 409 status routing | `"already exists" in str(e)` | `SkillAlreadyExistsError` type match |
| Artifact URL encoding | Raw filename in URL | `urllib.parse.quote()` |
## 6. Alternatives Considered
| Alternative | Why Not |
|-------------|---------|
| Keep logic in Gateway, Client calls Gateway via HTTP | Adds network dependency to embedded Client; defeats the purpose of `DeerFlowClient` as an in-process API |
| Abstract base class with Gateway/Client subclasses | Over-engineered for what are pure functions; no polymorphism needed |
| Move everything into `client.py` and have Gateway import it | Violates harness/app boundary — Client is in harness, but Gateway-specific models (Pydantic response types) should stay in app layer |
| Merge Gateway and Client into one module | They serve different consumers (HTTP vs in-process) with different adaptation needs |
## 7. Breaking Changes
**None.** All public APIs (Gateway HTTP endpoints, `DeerFlowClient` methods) retain their existing signatures and return formats. The `SkillAlreadyExistsError` is a subclass of `ValueError`, so existing `except ValueError` handlers still catch it.
## 8. Tests
| Module | Test File | Count |
|--------|-----------|:-----:|
| `skills.installer` | `tests/test_skills_installer.py` | 22 |
| `uploads.manager` | `tests/test_uploads_manager.py` | 20 |
| `client` hardening | `tests/test_client.py` (new cases) | ~40 |
| `client` e2e | `tests/test_client_e2e.py` (new file) | ~20 |
Coverage: unsafe zip / symlink / zip bomb / frontmatter / duplicate / extension / macOS filter / normalize / deduplicate / traversal / list / delete / agent invalidation / upload lifecycle / thread isolation / URL encoding / config pollution.
@@ -0,0 +1,446 @@
# [RFC] Add `grep` and `glob` File Search Tools to DeerFlow
## Summary
I believe this direction is right, and worth doing.
If DeerFlow wants to get closer to the real workflow of coding agents like Claude Code, `ls` / `read_file` / `write_file` / `str_replace` alone are not enough. Before making edits, a model usually needs two more capabilities:
- `glob`: quickly find files by path pattern
- `grep`: quickly find candidate locations by content pattern
The value of these tools is not that "bash could do it anyway"; it is that they replace the model's habit of reaching for `bash find` / `bash grep` / `rg` with lower token cost, stronger constraints, and a more stable output format.
But only if the implementation is right: **they should be read-only, structured, constrained, auditable native tools, not thin wrappers around shell commands.**
## Problem
DeerFlow's file tool layer currently covers:
- `ls`: browse directory structure
- `read_file`: read file contents
- `write_file`: write files
- `str_replace`: perform local string replacements
- `bash`: fallback command execution
This set can get tasks done, but it is inefficient during codebase exploration.
Typical problems:
1. To find "all `*.tsx` page files", the model can only `ls` through nested directories repeatedly, or fall back to `bash find`
2. To find "where a symbol / copy string / config key appears", it can only `read_file` file by file, or fall back to `bash grep` / `rg`
3. Once it falls back to `bash`, the tool call loses structured output, and results become harder to trim, paginate, audit, and keep consistent across sandboxes
4. In local modes without host bash enabled, `bash` may not even be available, leaving no sufficiently strong read-only search capability
Conclusion: what DeerFlow lacks is not "one more shell command" but a **filesystem search layer**.
## Goals
- Give the agent stable path-search and content-search capabilities
- Reduce dependence on `bash`, especially during repository exploration
- Stay consistent with the existing sandbox security model
- Produce structured output, so the model can chain into `read_file` / `str_replace`
- Let the local sandbox, container sandboxes, and future MCP filesystem tools obey the same semantics
## Non-Goals
- 不做通用 shell 兼容层
- 不暴露完整 grep/find/rg CLI 语法
- 不在第一版支持二进制检索、复杂 PCRE 特性、上下文窗口高亮渲染等重功能
- 不把它做成“任意磁盘搜索”,仍然只允许在 DeerFlow 已授权的路径内执行
## Why This Is Worth Doing
Following the design approach of agents like Claude Code, the core value of `glob` and `grep` is not new capability per se, but moving the common "explore the codebase" actions from an open-ended shell down into a controlled tool layer.
This yields several direct benefits:
1. **Lower burden on the model**
   The model no longer has to assemble `find`, `grep`, `rg`, `xargs`, quoting, and other command details itself.
2. **More stable cross-environment behavior**
   Local, Docker, and AIO sandboxes no longer depend on whether `rg` is installed in the container, and behavior cannot drift due to shell differences.
3. **Stronger security and auditing**
   The call parameters are simply "what to search, where to search, how many results at most", which is inherently easier to audit and rate-limit than arbitrary commands.
4. **Better token efficiency**
   `grep` returns match summaries rather than whole file sections; the model then calls `read_file` only on a few candidate paths.
5. **Friendly to `tool_search`**
   As DeerFlow's tool set keeps growing, `grep` / `glob` will become very high-frequency foundational tools, worth keeping as built-ins rather than letting the model always fall back to generic bash.
## Proposal
Add two built-in sandbox tools:
- `glob`
- `grep`
Recommended location, continuing the current layout:
- `backend/packages/harness/deerflow/sandbox/tools.py`
Also add them to the `file:read` group by default in `config.example.yaml`.
### 1. The `glob` Tool
Purpose: find files or directories by path pattern.
Suggested schema:
```python
@tool("glob", parse_docstring=True)
def glob_tool(
    runtime: ToolRuntime[ContextT, ThreadState],
    description: str,
    pattern: str,
    path: str,
    include_dirs: bool = False,
    max_results: int = 200,
) -> str:
    ...
```
Parameter semantics:
- `description`: consistent with existing tools
- `pattern`: a glob pattern, e.g. `**/*.py` or `src/**/test_*.ts`
- `path`: the search root, which must be an absolute path
- `include_dirs`: whether to return directories
- `max_results`: maximum number of entries returned, to avoid blowing up the context in one call
Suggested return format:
```text
Found 3 paths under /mnt/user-data/workspace
1. /mnt/user-data/workspace/backend/app.py
2. /mnt/user-data/workspace/backend/tests/test_app.py
3. /mnt/user-data/workspace/scripts/build.py
```
If we later want output better suited to frontend consumption, this could become a JSON string; for the first version, readable text keeps it consistent with the existing tool style.
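As a rough sketch of the intended local-mode traversal (the function name and the ignore set here are assumptions for illustration, not DeerFlow's actual implementation; path-permission validation would run before this in the real tool):

```python
from pathlib import Path

# Assumed shared ignore set; the RFC proposes aligning this with list_dir
DEFAULT_IGNORES = {".git", "node_modules", "__pycache__", ".venv"}


def glob_paths(root: str, pattern: str, include_dirs: bool = False,
               max_results: int = 200) -> list[str]:
    """Read-only glob over an already-validated root directory."""
    base = Path(root)
    if not base.is_dir():
        raise ValueError(f"Not a directory: {root}")
    results: list[str] = []
    for p in sorted(base.glob(pattern)):
        # Skip anything inside a default-ignored directory
        if DEFAULT_IGNORES.intersection(p.relative_to(base).parts):
            continue
        if p.is_dir() and not include_dirs:
            continue
        results.append(str(p))
        if len(results) >= max_results:
            break  # hard cap to protect the context window
    return results
```

Sorting before truncation keeps output stable across calls, which matters when the model compares successive results.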
### 2. The `grep` Tool
Purpose: search file contents by pattern and return match-location summaries.
Suggested schema:
```python
@tool("grep", parse_docstring=True)
def grep_tool(
    runtime: ToolRuntime[ContextT, ThreadState],
    description: str,
    pattern: str,
    path: str,
    glob: str | None = None,
    literal: bool = False,
    case_sensitive: bool = False,
    max_results: int = 100,
) -> str:
    ...
```
Parameter semantics:
- `pattern`: a search term or regular expression
- `path`: the search root, which must be an absolute path
- `glob`: optional path filter, e.g. `**/*.py`
- `literal`: when `True`, match as a plain string rather than interpreting the pattern as a regex
- `case_sensitive`: whether matching is case-sensitive
- `max_results`: maximum number of matches (not files) returned
Suggested return format:
```text
Found 4 matches under /mnt/user-data/workspace
/mnt/user-data/workspace/backend/config.py:12: TOOL_GROUPS = [...]
/mnt/user-data/workspace/backend/config.py:48: def load_tool_config(...):
/mnt/user-data/workspace/backend/tools.py:91: "tool_groups"
/mnt/user-data/workspace/backend/tests/test_config.py:22: assert "tool_groups" in data
```
The first version should return only:
- file path
- line number
- a summary of the matching line
No context blocks, to keep results small. If the model needs context, it can call `read_file(path, start_line, end_line)`.
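The scanning loop could be sketched like this (a minimal illustration under assumed names; the real tool would additionally apply the `glob` filter and size caps discussed below):

```python
import re
from pathlib import Path

MAX_LINE_LEN = 200  # truncate long matching lines


def grep_paths(root: str, pattern: str, *, literal: bool = False,
               case_sensitive: bool = False, max_results: int = 100) -> list[str]:
    """Scan text files under root, returning 'path:line: summary' entries."""
    flags = 0 if case_sensitive else re.IGNORECASE
    rx = re.compile(re.escape(pattern) if literal else pattern, flags)
    hits: list[str] = []
    # Sorted traversal keeps output stable across runs
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append(f"{path}:{lineno}: {line.strip()[:MAX_LINE_LEN]}")
                if len(hits) >= max_results:
                    return hits  # hard cap on matches, not files
    return hits
```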
## Design Principles
### A. No shell wrappers
`grep` should not be implemented as:
```python
subprocess.run("grep ...")
```
Nor should we assemble `find` / `rg` commands inside the container.
Reasons:
- It introduces shell quoting issues and an injection surface
- It depends on whether each sandbox image ships the same set of commands
- Behavior differs across Windows / macOS / Linux
- It is hard to reliably control result counts and output format
The right direction:
- `glob` uses Python standard-library path traversal
- `grep` scans files one by one in Python
- Output is formatted by DeerFlow itself
If we later want to prefer `rg` for performance, that should be encapsulated inside the provider with unchanged external semantics, rather than exposing the CLI to the model.
### B. Reuse DeerFlow's path-permission model
Both tools must reuse the path-validation logic of the current `ls` / `read_file`:
- Local mode goes through `validate_local_tool_path(..., read_only=True)`
- Support `/mnt/skills/...`
- Support `/mnt/acp-workspace/...`
- Support virtual-path resolution for thread workspace / uploads / outputs
- Explicitly reject out-of-scope paths and path traversal
In other words, they belong to **file:read**; they are not a privilege-escalation side door around `bash`.
### C. Results must be hard-limited
`glob` / `grep` without hard limits can easily blow up the context.
The first version should limit at least:
- `glob.max_results`: default 200, maximum 1000
- `grep.max_results`: default 100, maximum 500
- Maximum length of a single match-line summary, e.g. 200 characters
- Binary files skipped
- Oversized files skipped, e.g. single files above 1 MB, or controlled by configuration
Additionally, when the number of matches exceeds the threshold, the response should state:
- how many entries are shown
- the fact that results were truncated
- a suggestion to narrow the search scope
For example:
```text
Found more than 100 matches, showing first 100. Narrow the path or add a glob filter.
```
### D. Tool semantics should complement each other
The recommended model workflow is:
1. `glob` to find candidate files
2. `grep` to find candidate locations
3. `read_file` to read local context
4. `str_replace` / `write_file` to make the change
This keeps tool boundaries clean and makes it easier to teach the model a stable habit in the prompt.
## Implementation Approach
### Option A: Implement the first version directly in `sandbox/tools.py`
This is the starting approach I recommend.
Approach:
- Add `glob_tool` and `grep_tool` to `sandbox/tools.py`
- In the local sandbox scenario, use Python filesystem APIs directly
- In non-local sandbox scenarios, prefer implementing through DeerFlow's own path-access layer as well
Pros:
- Small change surface
- Lets us validate the agent-facing value quickly
- No need to change the `Sandbox` abstraction first
Cons:
- `tools.py` keeps getting fatter
- A later provider-side performance optimization would require another round of abstraction
### Option B: Extend the `Sandbox` abstraction first
For example, add:
```python
class Sandbox(ABC):
    def glob(self, path: str, pattern: str, include_dirs: bool = False, max_results: int = 200) -> list[str]:
        ...

    def grep(
        self,
        path: str,
        pattern: str,
        *,
        glob: str | None = None,
        literal: bool = False,
        case_sensitive: bool = False,
        max_results: int = 100,
    ) -> list[GrepMatch]:
        ...
```
Pros:
- Cleaner abstraction
- Container / remote sandboxes can each optimize independently
Cons:
- Higher up-front cost
- Requires updating all sandbox providers in lockstep
Conclusion:
**For the first version, go with Option A; once the tools' value is proven, push them down into the `Sandbox` abstraction layer.**
## Detailed Behavior
### `glob` Behavior
- Root directory does not exist: return a clear error
- Root path is not a directory: return a clear error
- Invalid pattern: return a clear error
- Empty result: return `No files matched`
- Default ignores should align with the current `list_dir` as much as possible, e.g.:
  - `.git`
  - `node_modules`
  - `__pycache__`
  - `.venv`
  - build-output directories
A shared ignore set should be factored out here, so `ls` and `glob` results do not diverge in style.
### `grep` Behavior
- Scan only text files by default
- Skip files detected as binary
- Skip oversized files outright, or scan only the first N KB
- Return a parameter error when regex compilation fails
- Continue to use virtual paths in output rather than exposing real host paths
- Sort results by file path then line number by default, to keep output stable
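The binary-skip check can be a simple heuristic; this sketch (an assumption, mirroring the common "NUL byte in the first block" convention used by grep-like tools) probes only the start of the file:

```python
def is_text_file(path: str, probe_bytes: int = 8192) -> bool:
    """Heuristic text check: a NUL byte in the first block means binary."""
    with open(path, "rb") as f:
        return b"\x00" not in f.read(probe_bytes)
```

Reading only the first few kilobytes keeps the check cheap even for the oversized files that would be skipped anyway.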
## Prompting Guidance
If these two tools are introduced, the file-operation guidance in the system prompt should be updated alongside them:
- Prefer `glob` when searching for filename patterns
- Prefer `grep` when searching for code symbols, config keys, or copy
- Fall back to `bash` only when these tools cannot accomplish the goal
Otherwise the model will keep habitually reaching for `bash` first.
## Risks
### 1. Overlap with `bash`
This is true, but not a problem.
`ls` and `read_file` can also be replaced by `bash`, yet we keep them, because structured tools suit agents better.
### 2. Performance
On large repositories, a pure-Python `grep` can be slower than `rg`.
Mitigations:
- Start with result caps and file-size caps in the first version
- Require a root path on every call
- Provide `glob` filtering to narrow the scan scope
- If necessary later, optimize with `rg` inside the provider while keeping the same schema
### 3. Inconsistent ignore rules
If `ls` can see a path that `glob` cannot, the model will be confused.
Mitigations:
- Unify the ignore rules
- Document explicitly that common dependency and build directories are skipped by default
### 4. Overly complex regex search
Supporting many grep dialects in the first version would blur the boundaries badly.
Mitigations:
- Support only Python `re` in the first version
- Provide a simple `literal=True` mode
## Alternatives Considered
### A. Add nothing and rely entirely on `bash`
Not recommended.
This would leave DeerFlow persistently behind on codebase-exploration experience, and weaken it in bash-less or bash-restricted environments.
### B. Add only `glob`, not `grep`
Not recommended.
It solves "find the file" but not "find the location"; the model would still fall back to `bash grep`.
### C. Add only `grep`, not `glob`
Also not recommended.
Without path-pattern filtering, `grep`'s scan scope is often too large; `glob` is its natural companion.
### D. Use an MCP filesystem server's search capability directly
Not recommended as the primary path in the short term.
MCP can be a supplement, but as foundational DeerFlow coding tools, `glob` / `grep` are best kept built-in so they are reliably available in a default installation.
## Acceptance Criteria
- `glob` and `grep` can be enabled by default in `config.example.yaml`
- Both tools belong to the `file:read` group
- Existing path permissions are strictly enforced in the local sandbox
- Output does not leak real host paths
- Large result sets are truncated with a clear notice
- The model can complete a typical code-change flow via `glob -> grep -> read_file -> str_replace`
- Repository-exploration capability improves noticeably in local mode with host bash disabled
## Rollout Plan
1.`sandbox/tools.py` 中实现 `glob_tool``grep_tool`
2. 抽取与 `list_dir` 一致的 ignore 规则,避免行为漂移
3.`config.example.yaml` 默认加入工具配置
4. 为本地路径校验、虚拟路径映射、结果截断、二进制跳过补测试
5. 更新 README / backend docs / prompt guidance
6. 收集实际 agent 调用数据,再决定是否下沉到 `Sandbox` 抽象
## Suggested Config
```yaml
tools:
  - name: glob
    group: file:read
    use: deerflow.sandbox.tools:glob_tool
  - name: grep
    group: file:read
    use: deerflow.sandbox.tools:grep_tool
```
## Final Recommendation
The conclusion: **yes, we can add them, and we should.**
But I would draw three hard boundaries:
1. `grep` / `glob` must be built-in, read-only, structured tools
2. The first version must not be a shell wrapper; do not expose CLI dialects to the model
3. Prove the value in `sandbox/tools.py` first, then consider pushing down into the `Sandbox` provider abstraction
Done this way, it will meaningfully improve DeerFlow's usability in coding / repo-exploration scenarios, with controllable risk.

# Conversation Summarization
DeerFlow includes automatic conversation summarization to handle long conversations that approach model token limits. When enabled, the system automatically condenses older messages while preserving recent context.
## Overview
The summarization feature uses LangChain's `SummarizationMiddleware` to monitor conversation history and trigger summarization based on configurable thresholds. When activated, it:
1. Monitors message token counts in real-time
2. Triggers summarization when thresholds are met
3. Keeps recent messages intact while summarizing older exchanges
4. Maintains AI/Tool message pairs together for context continuity
5. Injects the summary back into the conversation
## Configuration
Summarization is configured in `config.yaml` under the `summarization` key:
```yaml
summarization:
  enabled: true
  model_name: null  # Use default model or specify a lightweight model

  # Trigger conditions (OR logic - any condition triggers summarization)
  trigger:
    - type: tokens
      value: 4000
    # Additional triggers (optional)
    # - type: messages
    #   value: 50
    # - type: fraction
    #   value: 0.8  # 80% of model's max input tokens

  # Context retention policy
  keep:
    type: messages
    value: 20

  # Token trimming for summarization call
  trim_tokens_to_summarize: 4000

  # Custom summary prompt (optional)
  summary_prompt: null
```
### Configuration Options
#### `enabled`
- **Type**: Boolean
- **Default**: `false`
- **Description**: Enable or disable automatic summarization
#### `model_name`
- **Type**: String or null
- **Default**: `null` (uses default model)
- **Description**: Model to use for generating summaries. Recommended to use a lightweight, cost-effective model like `gpt-4o-mini` or equivalent.
#### `trigger`
- **Type**: Single `ContextSize` or list of `ContextSize` objects
- **Required**: At least one trigger must be specified when enabled
- **Description**: Thresholds that trigger summarization. Uses OR logic - summarization runs when ANY threshold is met.
**ContextSize Types:**
1. **Token-based trigger**: Activates when token count reaches the specified value
```yaml
trigger:
  type: tokens
  value: 4000
```
2. **Message-based trigger**: Activates when message count reaches the specified value
```yaml
trigger:
  type: messages
  value: 50
```
3. **Fraction-based trigger**: Activates when token usage reaches a percentage of the model's maximum input tokens
```yaml
trigger:
  type: fraction
  value: 0.8  # 80% of max input tokens
```
**Multiple Triggers:**
```yaml
trigger:
  - type: tokens
    value: 4000
  - type: messages
    value: 50
```
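The OR semantics can be sketched as follows (the function name and dict-based trigger representation are illustrative, not the middleware's actual API):

```python
def should_summarize(token_count: int, message_count: int,
                     max_input_tokens: int, triggers: list[dict]) -> bool:
    """Return True when ANY configured trigger threshold is met (OR logic)."""
    for t in triggers:
        if t["type"] == "tokens" and token_count >= t["value"]:
            return True
        if t["type"] == "messages" and message_count >= t["value"]:
            return True
        if t["type"] == "fraction" and token_count >= t["value"] * max_input_tokens:
            return True
    return False
```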
#### `keep`
- **Type**: `ContextSize` object
- **Default**: `{type: messages, value: 20}`
- **Description**: Specifies how much recent conversation history to preserve after summarization.
**Examples:**
```yaml
# Keep most recent 20 messages
keep:
  type: messages
  value: 20

# Keep most recent 3000 tokens
keep:
  type: tokens
  value: 3000

# Keep most recent 30% of model's max input tokens
keep:
  type: fraction
  value: 0.3
```
#### `trim_tokens_to_summarize`
- **Type**: Integer or null
- **Default**: `4000`
- **Description**: Maximum tokens to include when preparing messages for the summarization call itself. Set to `null` to skip trimming (not recommended for very long conversations).
#### `summary_prompt`
- **Type**: String or null
- **Default**: `null` (uses LangChain's default prompt)
- **Description**: Custom prompt template for generating summaries. The prompt should guide the model to extract the most important context.
**Default Prompt Behavior:**
The default LangChain prompt instructs the model to:
- Extract highest quality/most relevant context
- Focus on information critical to the overall goal
- Avoid repeating completed actions
- Return only the extracted context
## How It Works
### Summarization Flow
1. **Monitoring**: Before each model call, the middleware counts tokens in the message history
2. **Trigger Check**: If any configured threshold is met, summarization is triggered
3. **Message Partitioning**: Messages are split into:
- Messages to summarize (older messages beyond the `keep` threshold)
- Messages to preserve (recent messages within the `keep` threshold)
4. **Summary Generation**: The model generates a concise summary of the older messages
5. **Context Replacement**: The message history is updated:
- All old messages are removed
- A single summary message is added
- Recent messages are preserved
6. **AI/Tool Pair Protection**: The system ensures AI messages and their corresponding tool messages stay together
### Token Counting
- Uses approximate token counting based on character count
- For Anthropic models: ~3.3 characters per token
- For other models: Uses LangChain's default estimation
- Can be customized with a custom `token_counter` function
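A character-based estimate in the spirit of the above (the exact divisor is model-dependent; 3.3 is the Anthropic figure cited here, and this helper is an illustration rather than the middleware's internal counter):

```python
def approx_token_count(text: str, chars_per_token: float = 3.3) -> int:
    # Rough character-count heuristic; production code may use real tokenizers
    return max(1, round(len(text) / chars_per_token))
```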
### Message Preservation
The middleware intelligently preserves message context:
- **Recent Messages**: Always kept intact based on `keep` configuration
- **AI/Tool Pairs**: Never split - if a cutoff point falls within tool messages, the system adjusts to keep the entire AI + Tool message sequence together
- **Summary Format**: Summary is injected as a HumanMessage with the format:
```
Here is a summary of the conversation to date:
[Generated summary text]
```
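The pair-protection rule can be sketched like this (the message shape and helper name are assumptions; the real middleware operates on LangChain message objects):

```python
def safe_cutoff(messages: list[dict], keep_last: int) -> int:
    """Index separating messages to summarize from messages to keep.

    If the naive cutoff would strand ToolMessages from the AIMessage
    that issued their tool calls, move it earlier so the pair stays intact.
    """
    cut = max(0, len(messages) - keep_last)
    while cut > 0 and messages[cut]["role"] == "tool":
        cut -= 1  # back up to include the preceding AI message
    return cut
```

Everything before the returned index is summarized; everything at or after it is preserved verbatim.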
## Best Practices
### Choosing Trigger Thresholds
1. **Token-based triggers**: Recommended for most use cases
- Set to 60-80% of your model's context window
- Example: For 8K context, use 4000-6000 tokens
2. **Message-based triggers**: Useful for controlling conversation length
- Good for applications with many short messages
- Example: 50-100 messages depending on average message length
3. **Fraction-based triggers**: Ideal when using multiple models
- Automatically adapts to each model's capacity
- Example: 0.8 (80% of model's max input tokens)
### Choosing Retention Policy (`keep`)
1. **Message-based retention**: Best for most scenarios
- Preserves natural conversation flow
- Recommended: 15-25 messages
2. **Token-based retention**: Use when precise control is needed
- Good for managing exact token budgets
- Recommended: 2000-4000 tokens
3. **Fraction-based retention**: For multi-model setups
- Automatically scales with model capacity
- Recommended: 0.2-0.4 (20-40% of max input)
### Model Selection
- **Recommended**: Use a lightweight, cost-effective model for summaries
- Examples: `gpt-4o-mini`, `claude-haiku`, or equivalent
- Summaries don't require the most powerful models
- Significant cost savings on high-volume applications
- **Default**: If `model_name` is `null`, uses the default model
- May be more expensive but ensures consistency
- Good for simple setups
### Optimization Tips
1. **Balance triggers**: Combine token and message triggers for robust handling
```yaml
trigger:
  - type: tokens
    value: 4000
  - type: messages
    value: 50
```
2. **Conservative retention**: Keep more messages initially, adjust based on performance
```yaml
keep:
  type: messages
  value: 25  # Start higher, reduce if needed
```
3. **Trim strategically**: Limit tokens sent to summarization model
```yaml
trim_tokens_to_summarize: 4000 # Prevents expensive summarization calls
```
4. **Monitor and iterate**: Track summary quality and adjust configuration
## Troubleshooting
### Summary Quality Issues
**Problem**: Summaries losing important context
**Solutions**:
1. Increase `keep` value to preserve more messages
2. Decrease trigger thresholds to summarize earlier
3. Customize `summary_prompt` to emphasize key information
4. Use a more capable model for summarization
### Performance Issues
**Problem**: Summarization calls taking too long
**Solutions**:
1. Use a faster model for summaries (e.g., `gpt-4o-mini`)
2. Reduce `trim_tokens_to_summarize` to send less context
3. Increase trigger thresholds to summarize less frequently
### Token Limit Errors
**Problem**: Still hitting token limits despite summarization
**Solutions**:
1. Lower trigger thresholds to summarize earlier
2. Reduce `keep` value to preserve fewer messages
3. Check if individual messages are very large
4. Consider using fraction-based triggers
## Implementation Details
### Code Structure
- **Configuration**: `packages/harness/deerflow/config/summarization_config.py`
- **Integration**: `packages/harness/deerflow/agents/lead_agent/agent.py`
- **Middleware**: Uses `langchain.agents.middleware.SummarizationMiddleware`
### Middleware Order
Summarization runs after ThreadData and Sandbox initialization but before Title and Clarification:
1. ThreadDataMiddleware
2. SandboxMiddleware
3. **SummarizationMiddleware** ← Runs here
4. TitleMiddleware
5. ClarificationMiddleware
### State Management
- Summarization is stateless - configuration is loaded once at startup
- Summaries are added as regular messages in the conversation history
- The checkpointer persists the summarized history automatically
## Example Configurations
### Minimal Configuration
```yaml
summarization:
  enabled: true
  trigger:
    type: tokens
    value: 4000
  keep:
    type: messages
    value: 20
```
### Production Configuration
```yaml
summarization:
  enabled: true
  model_name: gpt-4o-mini  # Lightweight model for cost efficiency
  trigger:
    - type: tokens
      value: 6000
    - type: messages
      value: 75
  keep:
    type: messages
    value: 25
  trim_tokens_to_summarize: 5000
```
### Multi-Model Configuration
```yaml
summarization:
  enabled: true
  model_name: gpt-4o-mini
  trigger:
    type: fraction
    value: 0.7  # 70% of model's max input
  keep:
    type: fraction
    value: 0.3  # Keep 30% of max input
  trim_tokens_to_summarize: 4000
```
### Conservative Configuration (High Quality)
```yaml
summarization:
enabled: true
model_name: gpt-4 # Use full model for high-quality summaries
trigger:
type: tokens
value: 8000
keep:
type: messages
value: 40 # Keep more context
trim_tokens_to_summarize: null # No trimming
```
## References
- [LangChain Summarization Middleware Documentation](https://docs.langchain.com/oss/python/langchain/middleware/built-in#summarization)
- [LangChain Source Code](https://github.com/langchain-ai/langchain)

# Task Tool Improvements
## Overview
The task tool has been improved to eliminate wasteful LLM polling. Previously, when using background tasks, the LLM had to repeatedly call `task_status` to poll for completion, causing unnecessary API requests.
## Changes Made
### 1. Removed `run_in_background` Parameter
The `run_in_background` parameter has been removed from the `task` tool. All subagent tasks now run asynchronously by default, but the tool handles completion automatically.
**Before:**
```python
# LLM had to manage polling
task_id = task(
    subagent_type="bash",
    prompt="Run tests",
    description="Run tests",
    run_in_background=True,
)
# Then LLM had to poll repeatedly:
while True:
    status = task_status(task_id)
    if completed:
        break
```
**After:**
```python
# Tool blocks until complete, polling happens in backend
result = task(
    subagent_type="bash",
    prompt="Run tests",
    description="Run tests",
)
# Result is available immediately after the call returns
```
### 2. Backend Polling
The `task_tool` now:
- Starts the subagent task asynchronously
- Polls for completion in the backend (every 2 seconds)
- Blocks the tool call until completion
- Returns the final result directly
This means:
- ✅ LLM makes only ONE tool call
- ✅ No wasteful LLM polling requests
- ✅ Backend handles all status checking
- ✅ Timeout protection (5 minutes max)
### 3. Removed `task_status` from LLM Tools
The `task_status_tool` is no longer exposed to the LLM. It's kept in the codebase for potential internal/debugging use, but the LLM cannot call it.
### 4. Updated Documentation
- Updated `SUBAGENT_SECTION` in `prompt.py` to remove all references to background tasks and polling
- Simplified usage examples
- Made it clear that the tool automatically waits for completion
## Implementation Details
### Polling Logic
Located in `packages/harness/deerflow/tools/builtins/task_tool.py`:
```python
# Start background execution
task_id = executor.execute_async(prompt)

# Poll for task completion in the backend
poll_count = 0
while True:
    result = get_background_task_result(task_id)

    # Check whether the task completed or failed
    if result.status == SubagentStatus.COMPLETED:
        return f"[Subagent: {subagent_type}]\n\n{result.result}"
    elif result.status == SubagentStatus.FAILED:
        return f"[Subagent: {subagent_type}] Task failed: {result.error}"

    # Timeout protection (150 polls x 2 seconds = 5 minutes)
    poll_count += 1
    if poll_count > 150:
        return "Task timed out after 5 minutes"

    # Wait before next poll
    time.sleep(2)
```
### Execution Timeout
In addition to polling timeout, subagent execution now has a built-in timeout mechanism:
**Configuration** (`packages/harness/deerflow/subagents/config.py`):
```python
@dataclass
class SubagentConfig:
    # ...
    timeout_seconds: int = 300  # 5 minutes default
```
**Thread Pool Architecture**:
To avoid nested thread pools and resource waste, we use two dedicated thread pools:
1. **Scheduler Pool** (`_scheduler_pool`):
- Max workers: 4
- Purpose: Orchestrates background task execution
- Runs `run_task()` function that manages task lifecycle
2. **Execution Pool** (`_execution_pool`):
- Max workers: 8 (larger to avoid blocking)
- Purpose: Actual subagent execution with timeout support
- Runs `execute()` method that invokes the agent
**How it works**:
```python
# In execute_async():
_scheduler_pool.submit(run_task) # Submit orchestration task
# In run_task():
future = _execution_pool.submit(self.execute, task) # Submit execution
exec_result = future.result(timeout=timeout_seconds) # Wait with timeout
```
**Benefits**:
- ✅ Clean separation of concerns (scheduling vs execution)
- ✅ No nested thread pools
- ✅ Timeout enforcement at the right level
- ✅ Better resource utilization
**Two-Level Timeout Protection**:
1. **Execution Timeout**: Subagent execution itself has a 5-minute timeout (configurable in SubagentConfig)
2. **Polling Timeout**: Tool polling has a 5-minute timeout (150 polls × 2 seconds)
This ensures that even if subagent execution hangs, the system won't wait indefinitely.
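A self-contained sketch of the two-pool pattern (pool sizes match the description above; the task body and names here are stand-ins, not the actual DeerFlow code):

```python
import concurrent.futures as cf
import time

scheduler_pool = cf.ThreadPoolExecutor(max_workers=4)  # orchestrates task lifecycle
execution_pool = cf.ThreadPoolExecutor(max_workers=8)  # runs the actual subagent work


def execute(task: str) -> str:
    time.sleep(0.1)  # stand-in for real subagent execution
    return f"done: {task}"


def run_task(task: str, timeout_seconds: float = 300.0) -> str:
    # Execution runs in a separate pool so the timeout wraps it cleanly
    future = execution_pool.submit(execute, task)
    try:
        return future.result(timeout=timeout_seconds)
    except cf.TimeoutError:
        return "Task timed out"


# The scheduler pool runs run_task; the caller just waits on its future
result = scheduler_pool.submit(run_task, "run tests").result()
```

Because the two pools are separate, a hung `execute` call occupies an execution worker but never blocks the scheduler from timing out and returning.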
### Benefits
1. **Reduced API Costs**: No more repeated LLM requests for polling
2. **Simpler UX**: LLM doesn't need to manage polling logic
3. **Better Reliability**: Backend handles all status checking consistently
4. **Timeout Protection**: Two-level timeout prevents infinite waiting (execution + polling)
## Testing
To verify the changes work correctly:
1. Start a subagent task that takes a few seconds
2. Verify the tool call blocks until completion
3. Verify the result is returned directly
4. Verify no `task_status` calls are made
Example test scenario:
```python
# This should block for ~10 seconds then return result
result = task(
    subagent_type="bash",
    prompt="sleep 10 && echo 'Done'",
    description="Test task",
)
# result should contain "Done"
```
## Migration Notes
For users/code that previously used `run_in_background=True`:
- Simply remove the parameter
- Remove any polling logic
- The tool will automatically wait for completion
No other changes needed - the API is backward compatible (minus the removed parameter).