Initial commit: hardened DeerFlow factory

Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection hardening: - New deerflow.security package: content_delimiter, html_cleaner, sanitizer (8 layers — invisible chars, control chars, symbols, NFC, PUA, tag chars, horizontal whitespace collapse with newline/tab preservation, length cap) - New deerflow.community.searx package: web_search, web_fetch, image_search backed by a private SearX instance, every external string sanitized and wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>> delimiters - All native community web providers (ddg_search, tavily, exa, firecrawl, jina_ai, infoquest, image_search) replaced with hard-fail stubs that raise NativeWebToolDisabledError at import time, so a misconfigured tool.use path fails loud rather than silently falling back to unsanitized output - Native client back-doors (jina_client.py, infoquest_client.py) stubbed too - Native-tool tests quarantined under tests/_disabled_native/ (collect_ignore_glob via local conftest.py) - Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserve newlines and tabs so list/table structure survives - Hardened runtime config.yaml references only the searx-backed tools - Factory overlay (backend/) kept in sync with deer-flow tree as a reference / source See HARDENING.md for the full audit trail and verification steps.
2026-04-12 14:23:57 +02:00
commit 6de0bf9f5b
889 changed files with 173052 additions and 0 deletions
--- a/deer-flow/CONTRIBUTING.md
+++ b/deer-flow/CONTRIBUTING.md
@@ -0,0 +1,335 @@
+# Contributing to DeerFlow
+
+Thank you for your interest in contributing to DeerFlow! This guide will help you set up your development environment and understand our development workflow.
+
+## Development Environment Setup
+
+We offer two development environments. **Docker is recommended** for the most consistent and hassle-free experience.
+
+### Option 1: Docker Development (Recommended)
+
+Docker provides a consistent, isolated environment with all dependencies pre-configured. No need to install Node.js, Python, or nginx on your local machine.
+
+#### Prerequisites
+
+- Docker Desktop or Docker Engine
+- pnpm (for caching optimization)
+
+#### Setup Steps
+
+1. **Configure the application**:
+   ```bash
+   # Copy example configuration
+   cp config.example.yaml config.yaml
+
+   # Set your API keys
+   export OPENAI_API_KEY="your-key-here"
+   # or edit config.yaml directly
+   ```
+
+2. **Initialize Docker environment** (first time only):
+   ```bash
+   make docker-init
+   ```
+   This will:
+   - Build Docker images
+   - Install frontend dependencies (pnpm)
+   - Install backend dependencies (uv)
+   - Share pnpm cache with host for faster builds
+
+3. **Start development services**:
+   ```bash
+   make docker-start
+   ```
+   `make docker-start` reads `config.yaml` and starts `provisioner` only for provisioner/Kubernetes sandbox mode.
+
+   All services will start with hot-reload enabled:
+   - Frontend changes are automatically reloaded
+   - Backend changes trigger automatic restart
+   - LangGraph server supports hot-reload
+
+4. **Access the application**:
+   - Web Interface: http://localhost:2026
+   - API Gateway: http://localhost:2026/api/*
+   - LangGraph: http://localhost:2026/api/langgraph/*
+
+#### Docker Commands
+
+```bash
+# Build the custom k3s image (with pre-cached sandbox image)
+make docker-init
+# Start Docker services (mode-aware, localhost:2026)
+make docker-start
+# Stop Docker development services
+make docker-stop
+# View Docker development logs
+make docker-logs
+# View Docker frontend logs
+make docker-logs-frontend
+# View Docker gateway logs
+make docker-logs-gateway
+```
+
+If Docker builds are slow in your network, you can override the default package registries before running `make docker-init` or `make docker-start`:
+
+```bash
+export UV_INDEX_URL=https://pypi.org/simple
+export NPM_REGISTRY=https://registry.npmjs.org
+```
+
+#### Recommended host resources
+
+Use these as practical starting points for development and review environments:
+
+| Scenario | Starting point | Recommended | Notes |
+|---------|-----------|------------|-------|
+| `make dev` on one machine | 4 vCPU, 8 GB RAM | 8 vCPU, 16 GB RAM | Best when DeerFlow uses hosted model APIs. |
+| `make docker-start` review environment | 4 vCPU, 8 GB RAM | 8 vCPU, 16 GB RAM | Docker image builds and sandbox containers need extra headroom. |
+| Shared Linux test server | 8 vCPU, 16 GB RAM | 16 vCPU, 32 GB RAM | Prefer this for heavier multi-agent runs or multiple reviewers. |
+
+`2 vCPU / 4 GB` environments often fail to start reliably or become unresponsive under normal DeerFlow workloads.
+
+#### Linux: Docker daemon permission denied
+
+If `make docker-init`, `make docker-start`, or `make docker-stop` fails on Linux with an error like below, your current user likely does not have permission to access the Docker daemon socket:
+
+```text
+unable to get image 'deer-flow-dev-langgraph': permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock
+```
+
+Recommended fix: add your current user to the `docker` group so Docker commands work without `sudo`.
+
+1. Confirm the `docker` group exists:
+   ```bash
+   getent group docker
+   ```
+2. Add your current user to the `docker` group:
+   ```bash
+   sudo usermod -aG docker $USER
+   ```
+3. Apply the new group membership. The most reliable option is to log out completely and then log back in. If you want to refresh the current shell session instead, run:
+   ```bash
+   newgrp docker
+   ```
+4. Verify Docker access:
+   ```bash
+   docker ps
+   ```
+5. Retry the DeerFlow command:
+   ```bash
+   make docker-stop
+   make docker-start
+   ```
+
+If `docker ps` still reports a permission error after `usermod`, fully log out and log back in before retrying.
+
+#### Docker Architecture
+
+```
+Host Machine
+  ↓
+Docker Compose (deer-flow-dev)
+  ├→ nginx (port 2026) ← Reverse proxy
+  ├→ web (port 3000) ← Frontend with hot-reload
+  ├→ api (port 8001) ← Gateway API with hot-reload
+   ├→ langgraph (port 2024) ← LangGraph server with hot-reload
+   └→ provisioner (optional, port 8002) ← Started only in provisioner/K8s sandbox mode
+```
+
+**Benefits of Docker Development**:
+- ✅ Consistent environment across different machines
+- ✅ No need to install Node.js, Python, or nginx locally
+- ✅ Isolated dependencies and services
+- ✅ Easy cleanup and reset
+- ✅ Hot-reload for all services
+- ✅ Production-like environment
+
+### Option 2: Local Development
+
+If you prefer to run services directly on your machine:
+
+#### Prerequisites
+
+Check that you have all required tools installed:
+
+```bash
+make check
+```
+
+Required tools:
+- Node.js 22+
+- pnpm
+- uv (Python package manager)
+- nginx
+
+#### Setup Steps
+
+1. **Configure the application** (same as Docker setup above)
+
+2. **Install dependencies**:
+   ```bash
+   make install
+   ```
+
+3. **Run development server** (starts all services with nginx):
+   ```bash
+   make dev
+   ```
+
+4. **Access the application**:
+   - Web Interface: http://localhost:2026
+   - All API requests are automatically proxied through nginx
+
+#### Manual Service Control
+
+If you need to start services individually:
+
+1. **Start backend services**:
+   ```bash
+   # Terminal 1: Start LangGraph Server (port 2024)
+   cd backend
+   make dev
+
+   # Terminal 2: Start Gateway API (port 8001)
+   cd backend
+   make gateway
+
+   # Terminal 3: Start Frontend (port 3000)
+   cd frontend
+   pnpm dev
+   ```
+
+2. **Start nginx**:
+   ```bash
+   make nginx
+   # or directly: nginx -c $(pwd)/docker/nginx/nginx.local.conf -g 'daemon off;'
+   ```
+
+3. **Access the application**:
+   - Web Interface: http://localhost:2026
+
+#### Nginx Configuration
+
+The nginx configuration provides:
+- Unified entry point on port 2026
+- Routes `/api/langgraph/*` to LangGraph Server (2024)
+- Routes other `/api/*` endpoints to Gateway API (8001)
+- Routes non-API requests to Frontend (3000)
+- Centralized CORS handling
+- SSE/streaming support for real-time agent responses
+- Optimized timeouts for long-running operations
+
+## Project Structure
+
+```
+deer-flow/
+├── config.example.yaml      # Configuration template
+├── extensions_config.example.json  # MCP and Skills configuration template
+├── Makefile                 # Build and development commands
+├── scripts/
+│   └── docker.sh           # Docker management script
+├── docker/
+│   ├── docker-compose-dev.yaml  # Docker Compose configuration
+│   └── nginx/
+│       ├── nginx.conf      # Nginx config for Docker
+│       └── nginx.local.conf # Nginx config for local dev
+├── backend/                 # Backend application
+│   ├── src/
+│   │   ├── gateway/        # Gateway API (port 8001)
+│   │   ├── agents/         # LangGraph agents (port 2024)
+│   │   ├── mcp/            # Model Context Protocol integration
+│   │   ├── skills/         # Skills system
+│   │   └── sandbox/        # Sandbox execution
+│   ├── docs/               # Backend documentation
+│   └── Makefile            # Backend commands
+├── frontend/               # Frontend application
+│   └── Makefile            # Frontend commands
+└── skills/                 # Agent skills
+    ├── public/             # Public skills
+    └── custom/             # Custom skills
+```
+
+## Architecture
+
+```
+Browser
+  ↓
+Nginx (port 2026) ← Unified entry point
+  ├→ Frontend (port 3000) ← / (non-API requests)
+  ├→ Gateway API (port 8001) ← /api/models, /api/mcp, /api/skills, /api/threads/*/artifacts
+  └→ LangGraph Server (port 2024) ← /api/langgraph/* (agent interactions)
+```
+
+## Development Workflow
+
+1. **Create a feature branch**:
+   ```bash
+   git checkout -b feature/your-feature-name
+   ```
+
+2. **Make your changes** with hot-reload enabled
+
+3. **Format and lint your code** (CI will reject unformatted code):
+   ```bash
+   # Backend
+   cd backend
+   make format   # ruff check --fix + ruff format
+
+   # Frontend
+   cd frontend
+   pnpm format:write   # Prettier
+   ```
+
+4. **Test your changes** thoroughly
+
+5. **Commit your changes**:
+   ```bash
+   git add .
+   git commit -m "feat: description of your changes"
+   ```
+
+6. **Push and create a Pull Request**:
+   ```bash
+   git push origin feature/your-feature-name
+   ```
+
+## Testing
+
+```bash
+# Backend tests
+cd backend
+uv run pytest
+
+# Frontend checks
+cd frontend
+pnpm check
+```
+
+### PR Regression Checks
+
+Every pull request runs the backend regression workflow at [.github/workflows/backend-unit-tests.yml](.github/workflows/backend-unit-tests.yml), including:
+
+- `tests/test_provisioner_kubeconfig.py`
+- `tests/test_docker_sandbox_mode_detection.py`
+
+## Code Style
+
+- **Backend (Python)**: We use `ruff` for linting and formatting. Run `make format` before committing.
+- **Frontend (TypeScript)**: We use ESLint and Prettier. Run `pnpm format:write` before committing.
+- CI enforces formatting — PRs with unformatted code will fail the lint check.
+
+## Documentation
+
+- [Configuration Guide](backend/docs/CONFIGURATION.md) - Setup and configuration
+- [Architecture Overview](backend/CLAUDE.md) - Technical architecture
+- [MCP Setup Guide](backend/docs/MCP_SERVER.md) - Model Context Protocol configuration
+
+## Need Help?
+
+- Check existing [Issues](https://github.com/bytedance/deer-flow/issues)
+- Read the [Documentation](backend/docs/)
+- Ask questions in [Discussions](https://github.com/bytedance/deer-flow/discussions)
+
+## License
+
+By contributing to DeerFlow, you agree that your contributions will be licensed under the [MIT License](./LICENSE).