Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection hardening:

- New `deerflow.security` package: `content_delimiter`, `html_cleaner`, `sanitizer` (8 layers: invisible chars, control chars, symbols, NFC, PUA, tag chars, horizontal whitespace collapse with newline/tab preservation, length cap)
- New `deerflow.community.searx` package: `web_search`, `web_fetch`, `image_search` backed by a private SearX instance; every external string is sanitized and wrapped in `<<<EXTERNAL_UNTRUSTED_CONTENT>>>` delimiters
- All native community web providers (`ddg_search`, `tavily`, `exa`, `firecrawl`, `jina_ai`, `infoquest`, `image_search`) replaced with hard-fail stubs that raise `NativeWebToolDisabledError` at import time, so a misconfigured `tool.use` path fails loud rather than silently falling back to unsanitized output
- Native client back-doors (`jina_client.py`, `infoquest_client.py`) stubbed too
- Native-tool tests quarantined under `tests/_disabled_native/` (`collect_ignore_glob` via local `conftest.py`)
- Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserving newlines and tabs so list/table structure survives
- Hardened runtime `config.yaml` references only the searx-backed tools
- Factory overlay (`backend/`) kept in sync with the deer-flow tree as a reference/source

See HARDENING.md for the full audit trail and verification steps.
# Memory System Improvements
This document tracks memory injection behavior and roadmap status.
## Status (As Of 2026-03-10)
Implemented in `main`:

- Accurate token counting via `tiktoken` in `format_memory_for_injection`.
- Facts are injected into the prompt memory context.
- Facts are ranked by confidence (descending).
- Injection respects the `max_injection_tokens` budget.
Planned / not yet merged:

- TF-IDF similarity-based fact retrieval.
- `current_context` input for context-aware scoring.
- Configurable similarity/confidence weights (`similarity_weight`, `confidence_weight`).
- Middleware/runtime wiring for context-aware retrieval before each model call.
## Current Behavior
Function today:

```python
def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2000) -> str:
```
Current injection format:

- `User Context` section from `user.*.summary`
- `History` section from `history.*.summary`
- `Facts` section from `facts[]`, sorted by confidence, appended until the token budget is reached
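A minimal sketch of this behavior follows. The `memory_data` field names and the section markup are assumptions inferred from the list above, and a rough 4-chars-per-token heuristic stands in for the real token counter; the actual implementation lives in the backend.

```python
from typing import Any


def estimate_tokens(text: str) -> int:
    # Stand-in for the tiktoken-based counter: roughly 4 characters per token.
    return len(text) // 4


def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2000) -> str:
    sections: list[str] = []

    user_summary = memory_data.get("user", {}).get("summary")
    if user_summary:
        sections.append(f"## User Context\n{user_summary}")

    history_summary = memory_data.get("history", {}).get("summary")
    if history_summary:
        sections.append(f"## History\n{history_summary}")

    # Facts: highest confidence first, appended until the token budget is spent.
    facts = sorted(memory_data.get("facts", []),
                   key=lambda f: f.get("confidence", 0.0), reverse=True)
    used = sum(estimate_tokens(s) for s in sections)
    fact_lines: list[str] = []
    for fact in facts:
        line = f"- {fact.get('content', '')}"
        cost = estimate_tokens(line)
        if used + cost > max_tokens:
            break  # budget exhausted; lower-confidence facts are dropped
        fact_lines.append(line)
        used += cost
    if fact_lines:
        sections.append("## Facts\n" + "\n".join(fact_lines))

    return "\n\n".join(sections)
```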
Token counting:

- Uses `tiktoken` (`cl100k_base`) when available
- Falls back to `len(text) // 4` if the tokenizer import fails
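The counting logic can be sketched as a small helper (a simplified version; the real code may cache the encoder):

```python
def count_tokens(text: str) -> int:
    """Token count via tiktoken's cl100k_base, with a heuristic fallback."""
    try:
        import tiktoken  # optional dependency
        encoding = tiktoken.get_encoding("cl100k_base")
        return len(encoding.encode(text))
    except Exception:
        # Fallback: roughly 4 characters per token for typical English text.
        return len(text) // 4
```

The broad `except` is deliberate: both the import and the first-use encoding fetch can fail, and either case should degrade to the heuristic rather than break injection.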
## Known Gap
Previous versions of this document described TF-IDF/context-aware retrieval as if it had already shipped. That was not accurate for `main` and caused confusion.
Issue reference: #1059
## Roadmap (Planned)
Planned scoring strategy:

```
final_score = (similarity * 0.6) + (confidence * 0.4)
```
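Assuming the weights above become the defaults for `similarity_weight` and `confidence_weight` (an assumption, since the configuration is not merged yet), the blend reduces to:

```python
def final_score(similarity: float, confidence: float,
                similarity_weight: float = 0.6,
                confidence_weight: float = 0.4) -> float:
    # Planned weighted blend of context similarity and stored confidence.
    return similarity * similarity_weight + confidence * confidence_weight

# A highly relevant but uncertain fact can outrank a confident, irrelevant one:
# final_score(0.9, 0.3) = 0.66 vs final_score(0.0, 1.0) = 0.40
```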
Planned integration shape:
- Extract recent conversational context from filtered user/final-assistant turns.
- Compute TF-IDF cosine similarity between each fact and current context.
- Rank by weighted score and inject under token budget.
- Fall back to confidence-only ranking if context is unavailable.
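The steps above can be sketched with a self-contained TF-IDF ranker. Function names, the whitespace tokenizer, and the smoothed-IDF formula are illustrative choices, not the planned implementation (which might use scikit-learn instead):

```python
import math
from collections import Counter
from typing import Any


def _tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    # Whitespace-tokenized TF-IDF with smoothed IDF so shared terms keep weight.
    token_docs = [doc.lower().split() for doc in docs]
    df = Counter(t for tokens in token_docs for t in set(tokens))
    n = len(docs)
    return [
        {t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
         for t, c in Counter(tokens).items()}
        for tokens in token_docs
    ]


def _cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_facts(facts: list[dict[str, Any]], current_context: str,
               similarity_weight: float = 0.6,
               confidence_weight: float = 0.4) -> list[dict[str, Any]]:
    if not current_context.strip():
        # Fallback: confidence-only ranking when no context is available.
        return sorted(facts, key=lambda f: f.get("confidence", 0.0), reverse=True)
    vectors = _tfidf_vectors([f["content"] for f in facts] + [current_context])
    context_vec = vectors[-1]

    def score(pair):
        vec, fact = pair
        return (similarity_weight * _cosine(vec, context_vec)
                + confidence_weight * fact.get("confidence", 0.0))

    return [f for _, f in sorted(zip(vectors, facts), key=score, reverse=True)]
```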
## Validation
Current regression coverage includes:
- facts inclusion in memory injection output
- confidence ordering
- token-budget-limited fact inclusion
Tests:

- `backend/tests/test_memory_prompt_injection.py`