Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection hardening:

- New deerflow.security package: content_delimiter, html_cleaner, sanitizer (8 layers: invisible chars, control chars, symbols, NFC, PUA, tag chars, horizontal whitespace collapse with newline/tab preservation, length cap)
- New deerflow.community.searx package: web_search, web_fetch, image_search backed by a private SearX instance, every external string sanitized and wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>> delimiters
- All native community web providers (ddg_search, tavily, exa, firecrawl, jina_ai, infoquest, image_search) replaced with hard-fail stubs that raise NativeWebToolDisabledError at import time, so a misconfigured tool.use path fails loud rather than silently falling back to unsanitized output
- Native client back-doors (jina_client.py, infoquest_client.py) stubbed too
- Native-tool tests quarantined under tests/_disabled_native/ (collect_ignore_glob via local conftest.py)
- Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserve newlines and tabs so list/table structure survives
- Hardened runtime config.yaml references only the searx-backed tools
- Factory overlay (backend/) kept in sync with deer-flow tree as a reference / source

See HARDENING.md for the full audit trail and verification steps.
167 lines
4.9 KiB
Markdown
---
name: github-deep-research
description: Conduct multi-round deep research on any GitHub repository. Use when users request comprehensive analysis, timeline reconstruction, competitive analysis, or in-depth investigation of GitHub. Produces structured markdown reports with executive summaries, chronological timelines, metrics analysis, and Mermaid diagrams. Triggers on GitHub repository URLs or open source projects.
---

# GitHub Deep Research Skill

Multi-round research that combines the GitHub API, web_search, and web_fetch to produce comprehensive markdown reports.

## Research Workflow

- Round 1: GitHub API
- Round 2: Discovery
- Round 3: Deep Investigation
- Round 4: Deep Dive

## Core Methodology

### Query Strategy

**Broad to Narrow**: Start with the GitHub API, then run general queries and refine them based on findings.

```
Round 1: GitHub API
Round 2: "{topic} overview"
Round 3: "{topic} architecture", "{topic} vs alternatives"
Round 4: "{topic} issues", "{topic} roadmap", "site:github.com {topic}"
```

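The query templates above can be generated per round. The following is a minimal sketch, not part of the skill itself; the function name `build_round_queries` is hypothetical, and Round 1 is omitted because it uses the GitHub API rather than search queries.

```python
def build_round_queries(topic: str) -> dict[int, list[str]]:
    """Return the search queries for rounds 2-4, ordered broad to narrow.

    Hypothetical helper: the templates are copied from the round list above.
    """
    return {
        2: [f"{topic} overview"],
        3: [f"{topic} architecture", f"{topic} vs alternatives"],
        4: [f"{topic} issues", f"{topic} roadmap", f"site:github.com {topic}"],
    }
```

Each query string would then be passed to web_search, with later rounds refined by what earlier rounds turn up.
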
**Source Prioritization**:
1. Official docs/repos (highest weight)
2. Technical blogs (Medium, Dev.to)
3. News articles (verified outlets)
4. Community discussions (Reddit, HN)
5. Social media (lowest weight, for sentiment)

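One way to apply this ordering is to rank result URLs by tier. The sketch below is illustrative only: the domain-to-tier table and the fallback tier are assumptions, not part of the skill, and a real run would still need judgment for official docs hosted on arbitrary domains.

```python
from urllib.parse import urlparse

# Hypothetical mapping from domain to the priority tiers listed above (1 = highest).
SOURCE_TIERS = {
    "github.com": 1,
    "medium.com": 2,
    "dev.to": 2,
    "reddit.com": 4,
    "news.ycombinator.com": 4,
}
DEFAULT_TIER = 5  # assumption: unknown domains are treated as lowest weight


def source_tier(url: str) -> int:
    """Look up the tier for a URL's host, ignoring a leading 'www.'."""
    host = (urlparse(url).hostname or "").removeprefix("www.")
    return SOURCE_TIERS.get(host, DEFAULT_TIER)


def rank_sources(urls: list[str]) -> list[str]:
    """Sort URLs from highest to lowest priority; ties keep their input order."""
    return sorted(urls, key=source_tier)
```
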
### Research Rounds

**Round 1 - GitHub API**

Run `scripts/github_api.py` directly; there is no need to `read_file()` it first:

```bash
python /path/to/skill/scripts/github_api.py <owner> <repo> summary
python /path/to/skill/scripts/github_api.py <owner> <repo> readme
python /path/to/skill/scripts/github_api.py <owner> <repo> tree
```

**Available commands (the last argument of `github_api.py`):**
- summary
- info
- readme
- tree
- languages
- contributors
- commits
- issues
- prs
- releases

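When the script is invoked from Python rather than a shell, the command line can be assembled and validated first. This is a minimal sketch: `build_argv` is a hypothetical helper, and the argument order (`<owner> <repo> <command>`) is taken from the bash examples above.

```python
import sys

# Commands listed above as valid last arguments of github_api.py.
COMMANDS = (
    "summary", "info", "readme", "tree", "languages",
    "contributors", "commits", "issues", "prs", "releases",
)


def build_argv(script_path: str, owner: str, repo: str, command: str) -> list[str]:
    """Build the argument vector for one github_api.py call (hypothetical helper)."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    # sys.executable is the interpreter running this code, standing in for `python`.
    return [sys.executable, script_path, owner, repo, command]
```

The resulting list can be handed to `subprocess.run(argv, capture_output=True, text=True)` to capture the script's output.
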
**Round 2 - Discovery (3-5 web_search calls)**
- Get an overview and identify key terms
- Find the official website/repo
- Identify main players/competitors

**Round 3 - Deep Investigation (5-10 web_search calls + web_fetch)**
- Technical architecture details
- Timeline of key events
- Community sentiment
- Use web_fetch on valuable URLs for full content

**Round 4 - Deep Dive**
- Analyze commit history for timeline
- Review issues/PRs for feature evolution
- Check contributor activity

## Report Structure

Follow the template in `assets/report_template.md`:

1. **Metadata Block** - Date, confidence level, subject
2. **Executive Summary** - 2-3 sentence overview with key metrics
3. **Chronological Timeline** - Phased breakdown with dates
4. **Key Analysis Sections** - Topic-specific deep dives
5. **Metrics & Comparisons** - Tables, growth charts
6. **Strengths & Weaknesses** - Balanced assessment
7. **Sources** - Categorized references
8. **Confidence Assessment** - Claims by confidence level
9. **Methodology** - Research approach used

### Mermaid Diagrams

Include diagrams where helpful:

**Timeline (Gantt)**:
```mermaid
gantt
    title Project Timeline
    dateFormat YYYY-MM-DD
    section Phase 1
    Development :2025-01-01, 2025-03-01
    section Phase 2
    Launch :2025-03-01, 2025-04-01
```

**Architecture (Flowchart)**:
```mermaid
flowchart TD
    A[User] --> B[Coordinator]
    B --> C[Planner]
    C --> D[Research Team]
    D --> E[Reporter]
```

**Comparison (Pie/Bar)**:
```mermaid
pie title Market Share
    "Project A" : 45
    "Project B" : 30
    "Others" : 25
```

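Diagram bodies like the pie chart above can also be generated from collected metrics instead of written by hand. A minimal sketch, assuming the data is already a label-to-value mapping; `render_pie` is a hypothetical helper, and it returns only the diagram body, which the caller still wraps in a `mermaid` fenced block.

```python
def render_pie(title: str, shares: dict[str, float]) -> str:
    """Render the body of a Mermaid pie chart, one quoted label per slice."""
    lines = [f"pie title {title}"]
    lines += [f'    "{label}" : {value}' for label, value in shares.items()]
    return "\n".join(lines)
```
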
## Confidence Scoring

Assign confidence based on source quality:

| Confidence | Criteria |
|------------|----------|
| High (90%+) | Official docs, GitHub data, multiple corroborating sources |
| Medium (70-89%) | Single reliable source, recent articles |
| Low (50-69%) | Social media, unverified claims, outdated info |

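If confidence is tracked as a numeric percentage, the bands in the table map to labels as sketched below. The function name is hypothetical, and the "Unrated" label for scores under 50 is an assumption, since the table does not define that range.

```python
def confidence_label(score: float) -> str:
    """Map a 0-100 confidence score to the bands in the table above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "High"
    if score >= 70:
        return "Medium"
    if score >= 50:
        return "Low"
    return "Unrated"  # assumption: the table does not cover scores below 50
```
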
## Output

Save report as: `research_{topic}_{YYYYMMDD}.md`

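The filename pattern can be filled in as follows. This is a sketch only: `report_filename` is a hypothetical helper, and the lowercasing and replacement of non-alphanumeric characters with underscores are assumptions, because the pattern does not say how `{topic}` should be normalized.

```python
import re
from datetime import date
from typing import Optional


def report_filename(topic: str, on: Optional[date] = None) -> str:
    """Build research_{topic}_{YYYYMMDD}.md, defaulting to today's date."""
    day = on or date.today()
    # Assumption: collapse anything outside [A-Za-z0-9] into single underscores.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", topic).strip("_").lower()
    return f"research_{slug}_{day:%Y%m%d}.md"
```
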
### Formatting Rules

- Chinese content: Use full-width punctuation(,。:;!?)
- Technical terms: Provide Wiki/doc URL on first mention
- Tables: Use for metrics, comparisons
- Code blocks: For technical examples
- Mermaid: For architecture, timelines, flows

## Best Practices

1. **Start with official sources** - Repo, docs, company blog
2. **Verify dates from commits/PRs** - More reliable than articles
3. **Triangulate claims** - 2+ independent sources
4. **Note conflicting info** - Don't hide contradictions
5. **Distinguish fact vs opinion** - Label speculation clearly
6. **CRITICAL: Always include inline citations** - Use `[citation:Title](URL)` format immediately after each claim from external sources
7. **Extract URLs from search results** - web_search returns `{title, url, snippet}`; always use the `url` field
8. **Update as you go** - Don't wait until the end to synthesize

### Citation Examples

**Good - With inline citations:**
```markdown
The project gained 10,000 stars within 3 months of launch [citation:GitHub Stats](https://github.com/owner/repo).
The architecture uses LangGraph for workflow orchestration [citation:LangGraph Docs](https://langchain.com/langgraph).
```

**Bad - Without citations:**
```markdown
The project gained 10,000 stars within 3 months of launch.
The architecture uses LangGraph for workflow orchestration.
```
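
The citation format can be produced directly from a search result. A minimal sketch, assuming each result is a dict with the `title` and `url` fields that web_search is described as returning; the helper names are hypothetical.

```python
def format_citation(result: dict) -> str:
    """Render one search result as an inline citation, using its url field."""
    return f"[citation:{result['title']}]({result['url']})"


def cite(claim: str, result: dict) -> str:
    """Attach a citation to a claim, keeping the period after the citation."""
    return f"{claim.rstrip('.')} {format_citation(result)}."
```

Calling `cite` on the first "Bad" sentence with a result titled "GitHub Stats" yields the matching "Good" sentence.
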