Initial commit: hardened DeerFlow factory

Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection hardening:

- New deerflow.security package: content_delimiter, html_cleaner, sanitizer
  (8 layers: invisible chars, control chars, symbols, NFC, PUA, tag chars,
  horizontal whitespace collapse with newline/tab preservation, length cap)
- New deerflow.community.searx package: web_search, web_fetch, image_search
  backed by a private SearX instance, every external string sanitized and
  wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>> delimiters
- All native community web providers (ddg_search, tavily, exa, firecrawl,
  jina_ai, infoquest, image_search) replaced with hard-fail stubs that raise
  NativeWebToolDisabledError at import time, so a misconfigured tool.use path
  fails loud rather than silently falling back to unsanitized output
- Native client back-doors (jina_client.py, infoquest_client.py) stubbed too
- Native-tool tests quarantined under tests/_disabled_native/
  (collect_ignore_glob via local conftest.py)
- Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserve
  newlines and tabs so list/table structure survives
- Hardened runtime config.yaml references only the searx-backed tools
- Factory overlay (backend/) kept in sync with deer-flow tree as a
  reference / source

See HARDENING.md for the full audit trail and verification steps.
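
The Layer 7 behavior described above (collapse horizontal whitespace, keep newlines and tabs) can be illustrated with a minimal sketch. This is not the code shipped in deerflow.security: the function name `collapse_horizontal_whitespace` and the exact set of characters treated as horizontal whitespace are assumptions made for illustration only.

```python
import re

# Assumption: "horizontal whitespace" means the ASCII space plus the Unicode
# space separators listed here. Tabs and newlines are deliberately left out
# of the character class so they pass through unchanged.
_HORIZONTAL_WS = re.compile(r"[ \u00a0\u1680\u2000-\u200a\u202f\u205f\u3000]+")


def collapse_horizontal_whitespace(text: str) -> str:
    """Collapse each run of horizontal whitespace into a single space.

    Hypothetical helper for illustration; not the deerflow.security API.
    """
    return _HORIZONTAL_WS.sub(" ", text)


# Fetched content with padded spaces, non-breaking spaces, and a tab-separated row.
sample = "- item   one\n- item\u00a0\u00a0two\n\tcol  a\tcol  b"
print(collapse_horizontal_whitespace(sample))
```

Because line breaks and tabs are never matched, bullet lists and tab-separated rows keep their shape while padding inside a line is reduced to single spaces.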
2026-04-12 14:23:57 +02:00
commit 6de0bf9f5b
889 changed files with 173052 additions and 0 deletions
--- a/deer-flow/skills/public/bootstrap/references/conversation-guide.md
+++ b/deer-flow/skills/public/bootstrap/references/conversation-guide.md
@@ -0,0 +1,82 @@
+# Conversation Guide
+
+Detailed strategies for each onboarding phase. Read this before your first response.
+
+## Phase 1 — Hello
+
+**Goal:** Establish preferred language. That's it. Keep it light.
+
+Open with a brief multilingual greeting (3–5 languages), then ask one question: what language should we use? Don't add anything else — let the user settle in.
+
+Once they choose, switch immediately and seamlessly. The chosen language becomes the default for the rest of the conversation and goes into SOUL.md.
+
+**Extraction:** Preferred language.
+
+## Phase 2 — You
+
+**Goal:** Learn who the user is, what they need, and what to call the AI.
+
+This phase typically takes 2 rounds:
+
+**Round A — Identity & Pain.** Ask who they are and what drains them. Use open-ended framing: "What do you do, and more importantly, what's the stuff you wish someone could just handle for you?" The pain points reveal what the AI should *do*. Their word choices reveal who they *are*.
+
+**Round B — Name & Relationship.** Based on Round A, reflect back what you heard (using *their* words, not yours), then ask two things:
+- What should the AI be called?
+- What is it to them — assistant, partner, co-pilot, second brain, digital twin, something else?
+
+The relationship framing is critical. "Assistant" and "partner" produce very different SOUL.md files. Pay attention to the emotional undertone.
+
+**Merge opportunity:** If the user volunteers their role, pain points, and a name all at once, skip Round B and move to Phase 3.
+
+**Extraction:** User's name, role, pain points, AI name, relationship framing.
+
+## Phase 3 — Personality
+
+**Goal:** Define how the AI behaves and communicates.
+
+This is the meatiest phase. Typically 2 rounds:
+
+**Round A — Traits & Pushback.** By now you've observed the user's own style. Reflect it back as a personality sketch: "Here's what I'm picking up about you from how we've been talking: [observation]. Am I off?" Then ask the big question: should the AI ever disagree with them?
+
+This is where you get:
+- Core personality traits (as behavioral rules)
+- Honesty / pushback preferences
+- Any "never do X" boundaries
+
+**Round B — Voice & Language.** Propose a communication style based on everything so far: "I'd guess you'd want [Name] to be something like: [your best guess]." Let them correct. Also ask about language-switching rules — e.g., technical docs in English, casual chat in another language.
+
+**Merge opportunity:** Direct users often answer both in one shot. If they do, move on.
+
+**Extraction:** Core traits, communication style, pushback preference, language rules, autonomy level.
+
+## Phase 4 — Depth
+
+**Goal:** Capture aspirations, failure philosophy, and anything else worth knowing.
+
+This phase is adaptive. Pick 1–2 questions from:
+
+- **Autonomy & risk:** How much freedom should the AI have? Play safe or go big?
+- **Failure philosophy:** When it makes a mistake — fix quietly, explain what happened, or never repeat it?
+- **Big picture:** What are they building toward? Where does all this lead?
+- **Blind spots:** Any weakness they'd want the AI to quietly compensate for?
+- **Dealbreakers:** Any "if [Name] ever does this, we're done" moments?
+- **Personal layer:** Anything beyond work that the AI should know?
+
+Don't ask all of these. Pick based on what's still missing from the extraction tracker and what feels natural in the flow.
+
+**Extraction:** Failure philosophy, long-term vision, blind spots, boundaries.
+
+## Conversation Techniques
+
+**Mirroring.** Use the user's own words when reflecting back. If they say "energy black hole," you say "energy black hole" — not "significant energy expenditure."
+
+**Genuine reactions.** Don't just extract data. React: "That's interesting because..." / "I didn't expect that" / "So basically you want [Name] to be the person who..."
+
+**Observation-based proposals.** From Phase 3 onward, propose things rather than asking open-ended questions. "Based on how we've been talking, I'd say..." is more effective than "What personality do you want?"
+
+**Pacing signals.** Watch for:
+- Short answers → they want to move faster. Probe once, then advance.
+- Long, detailed answers → they're invested. Acknowledge the richness, distill the key points.
+- "I don't know" → offer 2–3 concrete options to choose from.
+
+**Graceful skipping.** If the user says "I don't care about that" or gives a minimal answer to a non-required field, move on without pressure.