Initial commit: hardened DeerFlow factory

Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection
hardening:

- New deerflow.security package: content_delimiter, html_cleaner,
  sanitizer (8 layers — invisible chars, control chars, symbols, NFC,
  PUA, tag chars, horizontal whitespace collapse with newline/tab
  preservation, length cap)
- New deerflow.community.searx package: web_search, web_fetch,
  image_search backed by a private SearX instance, every external
  string sanitized and wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>>
  delimiters
- All native community web providers (ddg_search, tavily, exa,
  firecrawl, jina_ai, infoquest, image_search) replaced with hard-fail
  stubs that raise NativeWebToolDisabledError at import time, so a
  misconfigured tool.use path fails loud rather than silently falling
  back to unsanitized output
- Native client back-doors (jina_client.py, infoquest_client.py)
  stubbed too
- Native-tool tests quarantined under tests/_disabled_native/
  (collect_ignore_glob via local conftest.py)
- Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserve
  newlines and tabs so list/table structure survives
- Hardened runtime config.yaml references only the searx-backed tools
- Factory overlay (backend/) kept in sync with deer-flow tree as a
  reference / source
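
A minimal sketch of the Layer 7 behavior and the delimiter wrapping described above, not the actual deerflow.security code. The function names are hypothetical. It assumes that "horizontal whitespace" means the ASCII space plus the Unicode space separators, and that the same marker string opens and closes the wrapped block; the commit message specifies neither.

```python
import re

# Assumption: "horizontal whitespace" is the ASCII space plus the Unicode
# space separators (category Zs). Tabs and newlines are deliberately excluded
# so that list and table layout survives.
_HORIZONTAL_WS = re.compile(r"[ \u00a0\u1680\u2000-\u200a\u202f\u205f\u3000]+")

# Marker named in the commit message; using it on both sides is an assumption.
MARKER = "<<<EXTERNAL_UNTRUSTED_CONTENT>>>"


def collapse_horizontal_whitespace(text: str) -> str:
    """Replace each run of space-like characters with a single ASCII space."""
    return _HORIZONTAL_WS.sub(" ", text)


def wrap_untrusted(text: str) -> str:
    """Collapse whitespace, then place the result between two marker lines."""
    return f"{MARKER}\n{collapse_horizontal_whitespace(text)}\n{MARKER}"


if __name__ == "__main__":
    sample = "| col  a |   col b |\n-\titem   one\u00a0\u00a0two"
    print(wrap_untrusted(sample))
```

Because only space-like runs are collapsed, line breaks and tab indentation in fetched content pass through unchanged.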

See HARDENING.md for the full audit trail and verification steps.
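
The hard-fail stub pattern can be illustrated roughly as follows. This is a sketch under stated assumptions: the base class of NativeWebToolDisabledError, the error message, and the helper names are hypothetical, and the import is simulated by executing a one-line module body so that the example runs on its own.

```python
import types


class NativeWebToolDisabledError(RuntimeError):
    """Stand-in for the error named in the commit; the base class is assumed."""


# Assumed stub body: a single module-level raise, so nothing can be imported
# from the stubbed module. The message text is illustrative only.
STUB_SOURCE = (
    "raise NativeWebToolDisabledError("
    "f'{__name__} is disabled; use the searx-backed tools')\n"
)


def load_stub(module_name: str) -> types.ModuleType:
    """Execute the stub body the way an import would run a module."""
    module = types.ModuleType(module_name)
    module.__dict__["NativeWebToolDisabledError"] = NativeWebToolDisabledError
    exec(compile(STUB_SOURCE, f"<{module_name}>", "exec"), module.__dict__)
    return module


def describe_import(module_name: str) -> str:
    """Report whether loading the stub was blocked."""
    try:
        load_stub(module_name)
    except NativeWebToolDisabledError as exc:
        return f"blocked: {exc}"
    return "loaded"


if __name__ == "__main__":
    print(describe_import("ddg_search"))
```

Raising in the module body rather than inside a function means a stale reference to a native provider fails as soon as the module is loaded, before any tool call can return unsanitized text.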
2026-04-12 14:23:57 +02:00
commit 6de0bf9f5b
889 changed files with 173052 additions and 0 deletions

@@ -0,0 +1,192 @@
[!NOTE] Generate this report in the user's own language.
# {TITLE}
- **Research Date:** {DATE}
- **Timestamp:** {TIMESTAMP}
- **Confidence Level:** {CONFIDENCE_LEVEL}
- **Subject:** {SUBJECT_DESCRIPTION}
---
## Repository Information
- **Name:** {REPOSITORY_NAME}
- **Description:** {REPOSITORY_DESCRIPTION}
- **URL:** {REPOSITORY_URL}
- **Stars:** {REPOSITORY_STARS}
- **Forks:** {REPOSITORY_FORKS}
- **Open Issues:** {REPOSITORY_OPEN_ISSUES}
- **Language(s):** {REPOSITORY_LANGUAGES}
- **License:** {REPOSITORY_LICENSE}
- **Created At:** {REPOSITORY_CREATED_AT}
- **Updated At:** {REPOSITORY_UPDATED_AT}
- **Pushed At:** {REPOSITORY_PUSHED_AT}
- **Topics:** {REPOSITORY_TOPICS}
---
## Executive Summary
{EXECUTIVE_SUMMARY}
**IMPORTANT**: Include inline citations using `[citation:Title](URL)` format after each claim. Example:
"The project gained 10k stars in 3 months [citation:GitHub Stats](https://github.com/owner/repo)."
---
## Complete Chronological Timeline
### PHASE 1: {PHASE_1_NAME}
#### {PHASE_1_PERIOD}
{PHASE_1_CONTENT}
### PHASE 2: {PHASE_2_NAME}
#### {PHASE_2_PERIOD}
{PHASE_2_CONTENT}
### PHASE 3: {PHASE_3_NAME}
#### {PHASE_3_PERIOD}
{PHASE_3_CONTENT}
---
## Key Analysis
**IMPORTANT**: Support each analysis point with inline citations `[citation:Title](URL)`.
### {ANALYSIS_SECTION_1_TITLE}
{ANALYSIS_SECTION_1_CONTENT}
### {ANALYSIS_SECTION_2_TITLE}
{ANALYSIS_SECTION_2_CONTENT}
---
## Architecture / System Overview
```mermaid
flowchart TD
A[Component A] --> B[Component B]
B --> C[Component C]
C --> D[Component D]
```
{ARCHITECTURE_DESCRIPTION}
---
## Metrics & Impact Analysis
### Growth Trajectory
```
{METRICS_TIMELINE}
```
### Key Metrics
| Metric | Value | Assessment |
|--------|-------|------------|
| {METRIC_1} | {VALUE_1} | {ASSESSMENT_1} |
| {METRIC_2} | {VALUE_2} | {ASSESSMENT_2} |
| {METRIC_3} | {VALUE_3} | {ASSESSMENT_3} |
---
## Comparative Analysis
### Feature Comparison
| Feature | {SUBJECT} | {COMPETITOR_1} | {COMPETITOR_2} |
|---------|-----------|----------------|----------------|
| {FEATURE_1} | {SUBJ_F1} | {COMP1_F1} | {COMP2_F1} |
| {FEATURE_2} | {SUBJ_F2} | {COMP1_F2} | {COMP2_F2} |
| {FEATURE_3} | {SUBJ_F3} | {COMP1_F3} | {COMP2_F3} |
### Market Positioning
{MARKET_POSITIONING}
---
## Strengths & Weaknesses
### Strengths
{STRENGTHS}
### Areas for Improvement
{WEAKNESSES}
---
## Key Success Factors
{SUCCESS_FACTORS}
---
## Sources
### Primary Sources
{PRIMARY_SOURCES}
### Media Coverage
{MEDIA_SOURCES}
### Academic / Technical Sources
{ACADEMIC_SOURCES}
### Community Sources
{COMMUNITY_SOURCES}
---
## Confidence Assessment
**High Confidence (90%+) Claims:**
{HIGH_CONFIDENCE_CLAIMS}
**Medium Confidence (70-89%) Claims:**
{MEDIUM_CONFIDENCE_CLAIMS}
**Lower Confidence (50-69%) Claims:**
{LOW_CONFIDENCE_CLAIMS}
---
## Research Methodology
This report was compiled using:
1. **Multi-source web search** - Broad discovery and targeted queries
2. **GitHub repository analysis** - Commits, issues, PRs, activity metrics
3. **Content extraction** - Official docs, technical articles, media coverage
4. **Cross-referencing** - Verification across independent sources
5. **Chronological reconstruction** - Timeline from timestamped data
6. **Confidence scoring** - Claims weighted by source reliability
**Research Depth:** {RESEARCH_DEPTH}
**Time Scope:** {TIME_SCOPE}
**Geographic Scope:** {GEOGRAPHIC_SCOPE}
---
**Report Prepared By:** GitHub Deep Research by DeerFlow
**Date:** {REPORT_DATE}
**Report Version:** 1.0
**Status:** Complete