Initial commit: hardened DeerFlow factory
Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection hardening:

- New deerflow.security package: content_delimiter, html_cleaner, sanitizer (8 layers: invisible chars, control chars, symbols, NFC, PUA, tag chars, horizontal whitespace collapse with newline/tab preservation, length cap)
- New deerflow.community.searx package: web_search, web_fetch, image_search backed by a private SearX instance, every external string sanitized and wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>> delimiters
- All native community web providers (ddg_search, tavily, exa, firecrawl, jina_ai, infoquest, image_search) replaced with hard-fail stubs that raise NativeWebToolDisabledError at import time, so a misconfigured tool.use path fails loud rather than silently falling back to unsanitized output
- Native client back-doors (jina_client.py, infoquest_client.py) stubbed too
- Native-tool tests quarantined under tests/_disabled_native/ (collect_ignore_glob via local conftest.py)
- Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserve newlines and tabs so list/table structure survives
- Hardened runtime config.yaml references only the searx-backed tools
- Factory overlay (backend/) kept in sync with deer-flow tree as a reference / source

See HARDENING.md for the full audit trail and verification steps.
166
deer-flow/skills/public/github-deep-research/SKILL.md
Normal file
@@ -0,0 +1,166 @@
---
name: github-deep-research
description: Conduct multi-round deep research on any GitHub repository. Use when users request comprehensive analysis, timeline reconstruction, competitive analysis, or in-depth investigation of a GitHub project. Produces structured markdown reports with executive summaries, chronological timelines, metrics analysis, and Mermaid diagrams. Triggers on GitHub repository URLs or open-source projects.
---

# GitHub Deep Research Skill

Multi-round research combining the GitHub API, `web_search`, and `web_fetch` to produce comprehensive markdown reports.

## Research Workflow

- Round 1: GitHub API
- Round 2: Discovery
- Round 3: Deep Investigation
- Round 4: Deep Dive

## Core Methodology

### Query Strategy

**Broad to Narrow**: Start with the GitHub API, move on to general queries, then refine based on findings.

```
Round 1: GitHub API
Round 2: "{topic} overview"
Round 3: "{topic} architecture", "{topic} vs alternatives"
Round 4: "{topic} issues", "{topic} roadmap", "site:github.com {topic}"
```
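
The query templates above can be expanded mechanically for a given topic. The following is a minimal sketch, not part of the skill itself; `ROUND_TEMPLATES` and `build_queries` are hypothetical names introduced here for illustration.

```python
# Hypothetical helper: expands the per-round query templates for one topic.
# Round 1 is omitted because it uses the GitHub API rather than web_search.
ROUND_TEMPLATES = {
    2: ["{topic} overview"],
    3: ["{topic} architecture", "{topic} vs alternatives"],
    4: ["{topic} issues", "{topic} roadmap", "site:github.com {topic}"],
}


def build_queries(topic: str) -> dict[int, list[str]]:
    """Return the web_search queries for rounds 2-4, keyed by round number."""
    topic = topic.strip()
    if not topic:
        raise ValueError("topic must not be empty")
    return {
        round_no: [template.format(topic=topic) for template in templates]
        for round_no, templates in ROUND_TEMPLATES.items()
    }


if __name__ == "__main__":
    for round_no, queries in build_queries("deer-flow").items():
        print(f"Round {round_no}: {queries}")
```

Keeping the templates in a single table makes it easy to refine later rounds based on what earlier rounds turn up.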

**Source Prioritization**:
1. Official docs/repos (highest weight)
2. Technical blogs (Medium, Dev.to)
3. News articles (verified outlets)
4. Community discussions (Reddit, HN)
5. Social media (lowest weight, for sentiment)
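
One way to apply this ordering is to tag each collected source with a category and sort by priority. This is only a sketch under assumptions: the category keys, the `Source` type, and `rank_sources` are hypothetical and do not come from the skill.

```python
from dataclasses import dataclass

# Hypothetical category keys mapped to the priority order listed above
# (1 = highest weight, 5 = lowest weight).
SOURCE_PRIORITY = {
    "official": 1,
    "technical_blog": 2,
    "news": 3,
    "community": 4,
    "social": 5,
}


@dataclass(frozen=True)
class Source:
    url: str
    category: str


def rank_sources(sources: list[Source]) -> list[Source]:
    """Sort sources from highest to lowest weight.

    Unknown categories sort last; ties keep their original order
    because sorted() is stable.
    """
    fallback = len(SOURCE_PRIORITY) + 1
    return sorted(sources, key=lambda s: SOURCE_PRIORITY.get(s.category, fallback))
```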

### Research Rounds

**Round 1 - GitHub API**
Execute `scripts/github_api.py` directly; there is no need to `read_file()` it first:
```bash
python /path/to/skill/scripts/github_api.py <owner> <repo> summary
python /path/to/skill/scripts/github_api.py <owner> <repo> readme
python /path/to/skill/scripts/github_api.py <owner> <repo> tree
```

**Available commands (the last argument of `github_api.py`):**
- summary
- info
- readme
- tree
- languages
- contributors
- commits
- issues
- prs
- releases
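
If the script is driven from Python rather than a shell, the argument list can be validated against the commands listed above before anything is executed. This is a minimal sketch; `build_command` is a hypothetical helper, and the script location is passed in by the caller because the real path is not specified here.

```python
import sys
from pathlib import Path

# Commands accepted as the last argument of github_api.py, per the list above.
COMMANDS = (
    "summary", "info", "readme", "tree", "languages",
    "contributors", "commits", "issues", "prs", "releases",
)


def build_command(script: Path, owner: str, repo: str, command: str) -> list[str]:
    """Build the argv list for one github_api.py invocation."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command {command!r}; expected one of {COMMANDS}")
    if not owner or not repo:
        raise ValueError("owner and repo must not be empty")
    # sys.executable runs the script with the current interpreter.
    return [sys.executable, str(script), owner, repo, command]
```

The resulting list can be handed to `subprocess.run(argv, capture_output=True, text=True, check=True)`; passing a list instead of a shell string avoids quoting problems with unusual owner or repository names.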

**Round 2 - Discovery (3-5 web_search)**
- Get overview and identify key terms
- Find official website/repo
- Identify main players/competitors

**Round 3 - Deep Investigation (5-10 web_search + web_fetch)**
- Technical architecture details
- Timeline of key events
- Community sentiment
- Use web_fetch on valuable URLs for full content

**Round 4 - Deep Dive**
- Analyze commit history for timeline
- Review issues/PRs for feature evolution
- Check contributor activity

## Report Structure

Follow template in `assets/report_template.md`:

1. **Metadata Block** - Date, confidence level, subject
2. **Executive Summary** - 2-3 sentence overview with key metrics
3. **Chronological Timeline** - Phased breakdown with dates
4. **Key Analysis Sections** - Topic-specific deep dives
5. **Metrics & Comparisons** - Tables, growth charts
6. **Strengths & Weaknesses** - Balanced assessment
7. **Sources** - Categorized references
8. **Confidence Assessment** - Claims by confidence level
9. **Methodology** - Research approach used

### Mermaid Diagrams

Include diagrams where helpful:

**Timeline (Gantt)**:
```mermaid
gantt
    title Project Timeline
    dateFormat YYYY-MM-DD
    section Phase 1
    Development :2025-01-01, 2025-03-01
    section Phase 2
    Launch :2025-03-01, 2025-04-01
```

**Architecture (Flowchart)**:
```mermaid
flowchart TD
    A[User] --> B[Coordinator]
    B --> C[Planner]
    C --> D[Research Team]
    D --> E[Reporter]
```

**Comparison (Pie/Bar)**:
```mermaid
pie title Market Share
    "Project A" : 45
    "Project B" : 30
    "Others" : 25
```
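
When timeline phases have already been collected as data, the Gantt source can be generated rather than typed by hand. The sketch below assumes each phase is a `(section, task, start, end)` tuple; `render_gantt` is a hypothetical helper, and its output mirrors the Gantt example above.

```python
from datetime import date


def render_gantt(title: str, phases: list[tuple[str, str, date, date]]) -> str:
    """Render Mermaid gantt source from (section, task, start, end) tuples."""
    lines = ["gantt", f"    title {title}", "    dateFormat YYYY-MM-DD"]
    for section, task, start, end in phases:
        if end < start:
            raise ValueError(f"phase {task!r} ends before it starts")
        lines.append(f"    section {section}")
        # isoformat() yields YYYY-MM-DD, matching the dateFormat line.
        lines.append(f"    {task} :{start.isoformat()}, {end.isoformat()}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_gantt("Project Timeline", [
        ("Phase 1", "Development", date(2025, 1, 1), date(2025, 3, 1)),
        ("Phase 2", "Launch", date(2025, 3, 1), date(2025, 4, 1)),
    ]))
```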

## Confidence Scoring

Assign confidence based on source quality:

| Confidence | Criteria |
|------------|----------|
| High (90%+) | Official docs, GitHub data, multiple corroborating sources |
| Medium (70-89%) | Single reliable source, recent articles |
| Low (50-69%) | Social media, unverified claims, outdated info |
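
The bands in the table translate directly into a threshold lookup. This is a minimal sketch; `confidence_label` is a hypothetical helper, and because the table does not define a band below 50%, the `"Unrated"` label for that range is an assumption made here.

```python
def confidence_label(score: float) -> str:
    """Map a confidence percentage (0-100) to the band named in the table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "High"
    if score >= 70:
        return "Medium"
    if score >= 50:
        return "Low"
    # Assumption: the table has no band below 50%, so flag such claims separately.
    return "Unrated"
```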

## Output

Save report as: `research_{topic}_{YYYYMMDD}.md`
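
A file name in this pattern can be derived from the topic and the report date. The sketch below is illustrative only: `report_filename` is a hypothetical helper, and normalizing the topic to lowercase with underscores is an assumption, since the pattern does not say how spaces or slashes in `{topic}` should be handled.

```python
import re
from datetime import date


def report_filename(topic: str, on: date) -> str:
    """Build research_{topic}_{YYYYMMDD}.md for the given topic and date."""
    # Assumption: collapse anything that is not a letter or digit into "_".
    slug = re.sub(r"[^A-Za-z0-9]+", "_", topic).strip("_").lower()
    if not slug:
        raise ValueError("topic must contain at least one letter or digit")
    return f"research_{slug}_{on:%Y%m%d}.md"
```

For example, `report_filename("bytedance/deer-flow", date.today())` keeps path separators out of the file name.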

### Formatting Rules

- Chinese content: Use full-width punctuation (,。:;!?)
- Technical terms: Provide Wiki/doc URL on first mention
- Tables: Use for metrics, comparisons
- Code blocks: For technical examples
- Mermaid: For architecture, timelines, flows

## Best Practices

1. **Start with official sources** - Repo, docs, company blog
2. **Verify dates from commits/PRs** - More reliable than articles
3. **Triangulate claims** - 2+ independent sources
4. **Note conflicting info** - Don't hide contradictions
5. **Distinguish fact vs opinion** - Label speculation clearly
6. **CRITICAL: Always include inline citations** - Use `[citation:Title](URL)` format immediately after each claim from external sources
7. **Extract URLs from search results** - `web_search` returns `{title, url, snippet}`; always cite using the `url` field
8. **Update as you go** - Don't wait until the end to synthesize

### Citation Examples

**Good - With inline citations:**
```markdown
The project gained 10,000 stars within 3 months of launch [citation:GitHub Stats](https://github.com/owner/repo).
The architecture uses LangGraph for workflow orchestration [citation:LangGraph Docs](https://langchain.com/langgraph).
```

**Bad - Without citations:**
```markdown
The project gained 10,000 stars within 3 months of launch.
The architecture uses LangGraph for workflow orchestration.
```
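
The citation format can be produced directly from a search result. This is a minimal sketch, assuming each result is a dict with the `title`, `url`, and `snippet` keys mentioned in the best practices; `format_citation` and `cite` are hypothetical helpers, and the bracket replacement in titles is an assumption made here to keep the link syntax intact.

```python
def format_citation(result: dict) -> str:
    """Turn one search result into [citation:Title](URL)."""
    # Assumption: square brackets in a title would break the link syntax,
    # so they are swapped for parentheses.
    title = result["title"].replace("[", "(").replace("]", ")")
    return f"[citation:{title}]({result['url']})"


def cite(claim: str, result: dict) -> str:
    """Append the citation immediately after the claim, before the final period."""
    return f"{claim.rstrip('.')} {format_citation(result)}."


if __name__ == "__main__":
    result = {
        "title": "GitHub Stats",
        "url": "https://github.com/owner/repo",
        "snippet": "",
    }
    print(cite("The project gained 10,000 stars within 3 months of launch.", result))
```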