Initial commit: hardened DeerFlow factory

Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection
hardening:

- New deerflow.security package: content_delimiter, html_cleaner,
  sanitizer (8 layers — invisible chars, control chars, symbols, NFC,
  PUA, tag chars, horizontal whitespace collapse with newline/tab
  preservation, length cap)
- New deerflow.community.searx package: web_search, web_fetch,
  image_search backed by a private SearX instance, every external
  string sanitized and wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>>
  delimiters
- All native community web providers (ddg_search, tavily, exa,
  firecrawl, jina_ai, infoquest, image_search) replaced with hard-fail
  stubs that raise NativeWebToolDisabledError at import time, so a
  misconfigured tool.use path fails loud rather than silently falling
  back to unsanitized output
- Native client back-doors (jina_client.py, infoquest_client.py)
  stubbed too
- Native-tool tests quarantined under tests/_disabled_native/
  (collect_ignore_glob via local conftest.py)
- Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserve
  newlines and tabs so list/table structure survives
- Hardened runtime config.yaml references only the searx-backed tools
- Factory overlay (backend/) kept in sync with deer-flow tree as a
  reference / source

See HARDENING.md for the full audit trail and verification steps.
2026-04-12 14:23:57 +02:00
commit 6de0bf9f5b
889 changed files with 173052 additions and 0 deletions


@@ -0,0 +1,235 @@
---
name: systematic-literature-review
description: Use this skill when the user wants a systematic literature review, survey, or synthesis across multiple academic papers on a topic. Also covers annotated bibliographies and cross-paper comparisons. Searches arXiv and outputs reports in APA, IEEE, or BibTeX format. Not for single-paper tasks — use academic-paper-review for reviewing one paper.
---
# Systematic Literature Review Skill
## Overview
This skill produces a structured **systematic literature review (SLR)** across multiple academic papers on a research topic. Given a topic query, it searches arXiv, extracts structured metadata (research question, methodology, key findings, limitations) from each paper in parallel, synthesizes themes across the full set, and emits a final report with consistent citations.
**Distinct from `academic-paper-review`:** that skill does deep peer review of a single paper. This skill does breadth-first synthesis across many papers. If the user hands you one paper URL and asks "review this paper", route to `academic-paper-review` instead.
## When to Use This Skill
Use this skill when the user wants any of the following:
- A literature survey on a topic ("survey transformer attention variants", "review the literature on diffusion models")
- A synthesis across multiple papers ("what do recent papers say about X", "compare methodologies across papers on Y")
- A systematic review with consistent citation format ("do an SLR on Z in APA format")
- An annotated bibliography on a topic
- An overview of research trends in a field over a time window
Do **not** use this skill when:
- The user provides exactly one paper and asks to review it (use `academic-paper-review`)
- The user asks a factual question that does not require synthesizing multiple sources (answer directly)
- The user wants general web research without academic rigor (use standard web search)
## Workflow
The workflow has five phases. Follow them in order.
### Phase 1: Plan
Before doing any retrieval, confirm the following with the user. If any of these are unclear, ask **one** clarifying question that covers the missing pieces. Do not ask one question at a time.
- **Topic**: the research area in plain English (e.g. "transformer attention variants").
- **Scope**: how many papers (default 20, hard upper bound 50), optional time window (e.g. "last 2 years"), optional arXiv category (e.g. `cs.CL`, `cs.CV`).
- **Citation format**: APA, IEEE, or BibTeX (default APA if the user does not specify and does not seem to be writing for a specific venue).
- **Output location**: where to save the final report (default `/mnt/user-data/outputs/`).
If the user says "50+ papers", politely cap it at 50 and explain that synthesis quality degrades quickly past that — for larger surveys they should split by sub-topic.
### Phase 2: Search arXiv
Call the bundled search script. Do **not** try to scrape arXiv by other means and do **not** write your own HTTP client — this script handles URL encoding, Atom XML parsing, and id normalization correctly.
```bash
python /mnt/skills/public/systematic-literature-review/scripts/arxiv_search.py \
"<topic>" \
--max-results <N> \
[--category <cat>] \
[--sort-by relevance] \
[--start-date YYYY-MM-DD] \
[--end-date YYYY-MM-DD]
```
**IMPORTANT — extract 2-3 core keywords before searching.** Do not pass the user's full topic description as the query. Before calling the script, mentally reduce the topic to its 2-3 most essential terms. Drop qualifiers like "in computer vision", "for NLP", "variants", "recent" — those belong in `--category` or `--start-date`, not in the query string.
**Query phrasing — keep it short.** The script wraps multi-word queries in double quotes for phrase matching on arXiv. This means:
- `"diffusion models"` → searches for the exact phrase → good, returns relevant papers.
- `"diffusion models in computer vision"` → searches for that exact 5-word phrase → **too specific, likely returns 0 results** because few papers contain that exact string.
Use **2-3 core keywords** as the query, and use `--category` to narrow the field instead of stuffing field names into the query. Examples:
| User says | Good query | Bad query |
|---|---|---|
| "diffusion models in computer vision" | `"diffusion models" --category cs.CV` | `"diffusion models in computer vision"` |
| "transformer attention variants" | `"transformer attention"` | `"transformer attention variants in NLP"` |
| "graph neural networks for molecules" | `"graph neural networks" --category cs.LG` | `"graph neural networks for molecular property prediction"` |
The script prints a JSON array to stdout. Each paper has: `id`, `title`, `authors`, `abstract`, `published`, `updated`, `categories`, `pdf_url`, `abs_url`.
**Sort strategy**:
- **Always use `relevance` sorting** — arXiv's BM25-style scoring ensures results are actually about the user's topic. `submittedDate` sorting returns the most recently submitted papers in the category regardless of topic relevance, which produces mostly off-topic results.
- When the user asks for "recent" papers or gives a time window, use `--sort-by relevance` **combined with `--start-date`** to constrain the time range while keeping results on-topic. For example, "recent diffusion model papers" → `--sort-by relevance --start-date 2024-01-01`, not `--sort-by submittedDate`.
- `submittedDate` sorting is only appropriate when the user explicitly asks for chronological order (e.g. "show me papers in the order they were published"). This is rare.
- `lastUpdatedDate` is rarely useful; ignore it unless the user asks.
**Run the search exactly once.** Do not retry with modified queries if the results seem imperfect — arXiv's relevance ranking is what it is. Retrying with different query phrasings wastes tool calls and risks hitting the recursion limit. If the results are genuinely empty (0 papers), tell the user and suggest they broaden their topic or remove the category filter.
**If the script returns fewer papers than requested**, that is the real size of the arXiv result set for the query. Do not pad the list — report the actual count to the user and proceed.
**If the script fails** (network error, non-200 from arXiv), tell the user which error and stop. Do not try to fabricate paper metadata.
**Do not save the search results to a file** — the JSON stays in your context for Phase 3. The only file saved during the entire workflow is the final report in Phase 5.
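The schema is easiest to see in a minimal consumption sketch (the sample record below is illustrative, not real script output):

```python
import json

# Illustrative stdout from arxiv_search.py, one paper shown.
stdout = """[
  {"id": "1706.03762", "title": "Attention Is All You Need",
   "authors": ["Ashish Vaswani", "Noam Shazeer"],
   "abstract": "The dominant sequence transduction models...",
   "published": "2017-06-12", "updated": "2017-12-06",
   "categories": ["cs.CL"],
   "pdf_url": "http://arxiv.org/pdf/1706.03762",
   "abs_url": "http://arxiv.org/abs/1706.03762"}
]"""

papers = json.loads(stdout)
# `id` is the bare arXiv id: no URL, no version suffix.
assert papers[0]["id"] == "1706.03762"
```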
### Phase 3: Extract metadata in parallel
**You MUST delegate extraction to subagents via the `task` tool — do not extract metadata yourself.** This is non-negotiable. Specifically, do NOT do any of the following:
- ❌ Write `python -c "papers = [...]"` or any Python/bash script to process papers
- ❌ Extract metadata inline in your own context by reading abstracts one by one
- ❌ Use any tool other than `task` for this phase
Instead, you MUST call the `task` tool to spawn subagents. The reason: extracting 10-50 papers in your own context consumes too many tokens and degrades synthesis quality in Phase 4. Each subagent runs in an isolated context with only its batch of papers, producing cleaner extractions.
Split papers into batches of ~5, then for each batch, call the `task` tool with `subagent_type: "general-purpose"`. Each subagent receives the paper abstracts as text and returns structured JSON.
**Concurrency limit: at most 3 subagents per turn.** The DeerFlow runtime enforces `MAX_CONCURRENT_SUBAGENTS = 3` and will silently drop any extra dispatches in the same turn — the LLM will not be told this happened, so strictly follow the round strategy below.
**Round strategy — use this decision table, do not compute the split yourself**:
| Paper count | Batches of ~5 papers | Rounds | Per-round subagent count |
|---|---|---|---|
| 1-5 | 1 batch | 1 round | 1 subagent |
| 6-10 | 2 batches | 1 round | 2 subagents |
| 11-15 | 3 batches | 1 round | 3 subagents |
| 16-20 | 4 batches | 2 rounds | 3 + 1 |
| 21-25 | 5 batches | 2 rounds | 3 + 2 |
| 26-30 | 6 batches | 2 rounds | 3 + 3 |
| 31-35 | 7 batches | 3 rounds | 3 + 3 + 1 |
| 36-40 | 8 batches | 3 rounds | 3 + 3 + 2 |
| 41-45 | 9 batches | 3 rounds | 3 + 3 + 3 |
| 46-50 | 10 batches | 4 rounds | 3 + 3 + 3 + 1 |
**Never dispatch more than 3 subagents in the same turn.** When a row says "2 rounds (3 + 1)", that means: first turn dispatches 3 subagents in parallel, wait for all 3 to complete, then second turn dispatches 1 subagent. Rounds are strictly sequential at the main-agent level.
If the paper count does not fill its row exactly (e.g. 23 papers, in the 21-25 row), use that row's layout but only dispatch as many batches as you actually need — the decision table gives you the shape, not a rigid prescription.
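The table above can also be derived mechanically. A sketch under the stated limits (the helper name `plan_rounds` is mine, not part of the runtime):

```python
def plan_rounds(paper_count: int, batch_size: int = 5, max_concurrent: int = 3):
    """Split papers into batches of ~batch_size, then group batches
    into sequential rounds of at most max_concurrent subagents."""
    ids = list(range(paper_count))
    batches = [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]
    return [batches[i:i + max_concurrent]
            for i in range(0, len(batches), max_concurrent)]

# 20 papers -> 4 batches -> 2 rounds (3 + 1), matching the table.
assert [len(r) for r in plan_rounds(20)] == [3, 1]
```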
**Do the batching at the main-agent level**: you already have every paper's abstract from Phase 2, so each subagent receives pure text input. Subagents should not need to access the network or the sandbox — their only job is to read text and return JSON. Do not ask subagents to re-run `arxiv_search.py`; that would waste tokens and risk rate-limiting.
**What each subagent receives**, as a structured prompt:
```
Execute this task: extract structured metadata and key findings from the
following arXiv papers.
Papers:
[Paper 1]
arxiv_id: 1706.03762
title: Attention Is All You Need
authors: Ashish Vaswani, Noam Shazeer, ...
published: 2017-06-12
abstract: <full abstract text>
[Paper 2]
arxiv_id: ...
...
For each paper, return a JSON object with these fields:
- arxiv_id (string)
- title (string)
- authors (list of strings)
- published_date (string, YYYY-MM-DD)
- research_question (1 sentence, what problem the paper tackles)
- methodology (1-2 sentences, how they tackle it)
- key_findings (3-5 bullet points, what they actually found)
- limitations (1-2 sentences, what they acknowledge or what is obviously missing)
Return the result as a JSON array, one object per paper, in the same
order as the input. Do not include any text outside the JSON — no
preamble, no markdown fences, just the array.
```
**Parsing subagent results**: the task tool returns strings with a fixed prefix like `Task Succeeded. Result: [...JSON...]`. Strip the `Task Succeeded. Result: ` prefix (or `Task failed.` / `Task timed out.` prefixes) before trying to parse JSON. If a batch fails or returns unparseable JSON, log it, note which papers were affected, and continue with the remaining batches — do not fail the whole synthesis on one bad batch.
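A sketch of that prefix handling (the helper name is mine; the prefixes are the ones documented above):

```python
import json

PREFIXES = ("Task Succeeded. Result: ", "Task failed.", "Task timed out.")

def parse_task_result(raw: str):
    """Strip the task tool's status prefix and parse the JSON payload.
    Returns None for failed, timed-out, or unparseable batches so the
    caller can log the affected papers and keep going."""
    for prefix in PREFIXES:
        if raw.startswith(prefix):
            payload = raw[len(prefix):].strip()
            if not payload:
                return None  # failed / timed-out batches carry no JSON
            try:
                return json.loads(payload)
            except json.JSONDecodeError:
                return None  # one bad batch must not abort the synthesis
    return None

batch = parse_task_result('Task Succeeded. Result: [{"arxiv_id": "1706.03762"}]')
assert batch[0]["arxiv_id"] == "1706.03762"
```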
After all rounds complete, flatten the per-batch arrays into a single list of paper metadata objects, preserving order.
### Phase 4: Synthesize and format
Now produce the final SLR report. Two things happen here: cross-paper synthesis (thematic analysis) and citation formatting.
**Cross-paper synthesis**: the report must do more than list papers. At minimum, identify:
- **Themes**: 3-6 recurring research directions, approaches, or problem framings across the set.
- **Convergences**: findings that multiple papers agree on.
- **Disagreements**: where papers reach different conclusions or use incompatible methodologies.
- **Gaps**: what the collective literature does not yet address (often stated explicitly in the "limitations" fields).
If the paper set is too small or too heterogeneous to support thematic synthesis (e.g. 5 papers on wildly different sub-topics), say so explicitly in the report — do not force themes that are not there.
**Citation formatting**: the exact format depends on user preference. Read **only** the template file that matches the user's requested format, not all three:
- [templates/apa.md](templates/apa.md) — APA 7th edition. Default for social sciences and most CS journals. Use when the user requests APA or does not specify a format.
- [templates/ieee.md](templates/ieee.md) — IEEE numeric citations. Use when the user targets an IEEE conference or journal, or explicitly asks for IEEE.
- [templates/bibtex.md](templates/bibtex.md) — BibTeX entries. Use when the user mentions BibTeX, LaTeX, or wants machine-readable references. **Important**: arXiv papers are cited as `@misc`, not `@article` — the BibTeX template covers this explicitly.
Each template contains both the citation rules and a full report structure (executive summary, themes, per-paper annotations, references, methodology section). Follow the template's structure verbatim for the report body, then fill in content from your Phase 3 metadata.
### Phase 5: Save and present
Save the full report to `/mnt/user-data/outputs/slr-<topic-slug>-<YYYYMMDD>.md` where `<topic-slug>` is a lowercased hyphenated version of the topic (e.g. `transformer-attention`). Then call the `present_files` tool with that path so the user can download it.
**In the chat message**, show a short preview so the user immediately sees value without opening the file:
1. **Executive summary** — the 3-5 sentence paragraph from the top of the report, verbatim.
2. **Themes list** — bullet list of the themes you identified in Phase 4 synthesis (just the theme names + one-line gloss, not the full theme sections).
3. **Paper count + a pointer to the file** — e.g. "Full report with 20 papers, per-paper annotations, and formatted references saved to `slr-transformer-attention-20260409.md`."
Do **not** dump the full 2000+ word report inline — per-paper annotations, references, and methodology belong in the file. The preview is there to let the user judge the report at a glance and decide whether to open it.
## Examples
**Example 1: Typical SLR request**
User: "Do a systematic literature review of recent transformer attention variants, 20 papers, APA format."
Your flow:
1. Phase 1: confirm topic (transformer attention variants), scope (20 papers, default time window), format (APA). Ask **one** clarification only if something is missing (e.g. "Any particular time window, or should I default to the last 3 years?").
2. Phase 2: `arxiv_search.py "transformer attention" --max-results 20 --sort-by relevance --start-date 2023-01-01`.
3. Phase 3: 20 papers → round 1 = 3 subagents × 5 papers = 15 covered, round 2 = 1 subagent × 5 papers = 5 covered. Aggregate.
4. Phase 4: read `templates/apa.md`, write the report using its structure, fill in themes + per-paper annotations from Phase 3 metadata.
5. Phase 5: save to `slr-transformer-attention-20260409.md`, call `present_files`.
**Example 2: Small-set request with ambiguity**
User: "Survey a few papers on diffusion models for me."
Your flow:
1. Phase 1: "a few" is ambiguous. Ask one question: "How many papers would you like — 10, 20, or 30? And any citation format preference (APA is the default)?"
2. User responds "10, BibTeX".
3. Phase 2: `arxiv_search.py "diffusion models" --max-results 10 --category cs.CV`.
4. Phase 3: 10 papers → single round, 2 subagents × 5 papers.
5. Phase 4: read `templates/bibtex.md`, format with `@misc` entries (not `@article`).
6. Phase 5: save and present.
**Example 3: Out-of-scope request**
User: "Here's one paper (https://arxiv.org/abs/1706.03762). Can you review it?"
This is a single-paper peer review, not a literature survey. Do not use this skill. Route to `academic-paper-review` instead.
## Notes
- **Prerequisite: `subagent_enabled` must be `true`**. Phase 3 requires the `task` tool for parallel metadata extraction. This tool is only loaded when `subagent_enabled` is set to `true` in the runtime config (`config.configurable.subagent_enabled`). Without it, the `task` tool will not appear in the available tools and Phase 3 cannot execute as designed.
- **arXiv only, by design**. This skill does not query Semantic Scholar, PubMed, or Google Scholar. arXiv covers the bulk of CS/ML/physics/math preprints, which is what DeerFlow users most often want to survey. Multi-source academic search belongs in a dedicated MCP server, not inside this skill.
- **Hard upper bound of 50 papers**. This is tied to the Phase 3 concurrency strategy (max 3 subagents per round, ~5 papers each, at most 4 rounds). Surveys larger than 50 papers degrade in synthesis quality and are better done by splitting into sub-topics.
- **Phase 3 requires subagents to be enabled**. This skill's parallel extraction step hard-requires the `task` tool, which is only available when `subagent_enabled=true` at runtime. If subagents are unavailable, do not claim to execute the Phase 3 parallel plan; instead, tell the user that subagents must be enabled for the full workflow, or offer to narrow/split the request into a smaller manual review.
- **Subagent results are strings, not objects**. Always strip the `Task Succeeded. Result: ` / `Task failed.` / `Task timed out.` prefixes before parsing the JSON payload.
- **The `id` field is a bare arXiv id** (e.g. `1706.03762`), not a URL and not with a version suffix. `abs_url` / `pdf_url` hold the full URLs if you need them.
- **Synthesis, not listing**. The final report must identify themes and compare findings across papers. A report that only lists papers one after another is a failure mode — if you cannot find themes, say so explicitly instead of faking them.


@@ -0,0 +1,79 @@
{
"skill_name": "systematic-literature-review",
"evals": [
{
"id": 1,
"prompt": "Do a systematic literature review on diffusion models in computer vision. 10 papers, last 2 years, category cs.CV, APA format. Save to default output location.",
"expected_output": "A structured SLR report saved to /mnt/user-data/outputs/ with APA citations, thematic synthesis across 10 papers, and per-paper annotations.",
"expectations": [
"The skill read SKILL.md for systematic-literature-review",
"The arxiv_search.py script was called with a short keyword query (2-3 words), not the full topic description",
"The search used --category cs.CV",
"The search used --sort-by relevance, not submittedDate",
"The search was executed only once without retries",
"Metadata extraction was delegated via the task tool to subagents, not done inline or via python -c",
"The APA template file (templates/apa.md) was read",
"The final report was saved to /mnt/user-data/outputs/ with a filename matching slr-<topic-slug>-<YYYYMMDD>.md",
"The present_files tool was called to make the report visible to the user",
"The report contains an Executive Summary section",
"The report identifies at least 3 themes with cross-paper analysis",
"The report contains a Convergences and Disagreements section",
"The report contains a Gaps and Open Questions section",
"The report contains per-paper annotations for each of the 10 papers",
"The references section uses APA 7th format with arXiv URLs"
]
},
{
"id": 2,
"prompt": "Survey recent papers on graph neural networks for drug discovery. 5 papers, BibTeX format.",
"expected_output": "A structured SLR report with BibTeX citations using @misc entries for arXiv preprints.",
"expectations": [
"The skill read SKILL.md for systematic-literature-review",
"The arxiv_search.py script was called with a short keyword query",
"Metadata extraction was delegated via the task tool to subagents",
"The BibTeX template file (templates/bibtex.md) was read, not apa.md or ieee.md",
"The final report was saved to /mnt/user-data/outputs/",
"The present_files tool was called",
"The report contains BibTeX entries using @misc, not @article",
"Each BibTeX entry includes eprint and primaryClass fields",
"The report contains thematic synthesis, not just a list of papers"
]
},
{
"id": 3,
"prompt": "Review the literature on retrieval-augmented generation — key findings, limitations, and open questions. 15 papers, IEEE format.",
"expected_output": "A structured SLR report with IEEE numeric citations and 15 papers extracted in parallel batches.",
"expectations": [
"The skill read SKILL.md for systematic-literature-review",
"The arxiv_search.py script was called with --max-results 15 or higher",
"Metadata extraction used the task tool with multiple subagent batches (15 papers requires 3 batches of 5)",
"The IEEE template file (templates/ieee.md) was read",
"The report uses IEEE numeric citations [1], [2], etc. in the text",
"The references section uses IEEE format with numbered entries",
"The report contains per-paper annotations for all papers",
"The report identifies themes across the papers"
]
},
{
"id": 4,
"prompt": "Review this paper: https://arxiv.org/abs/2310.06825",
"expected_output": "The SLR skill should NOT be triggered. The request should route to academic-paper-review instead.",
"expectations": [
"The systematic-literature-review skill was NOT triggered",
"The agent did not call arxiv_search.py",
"The agent recognized this as a single-paper review request"
]
},
{
"id": 5,
"prompt": "What does the literature say about RLHF?",
"expected_output": "The SLR skill should be triggered despite no explicit 'systematic' or 'survey' keyword, because 'the literature' implies multi-paper synthesis.",
"expectations": [
"The skill read SKILL.md for systematic-literature-review",
"The arxiv_search.py script was called",
"The agent asked a clarification question about scope (paper count, format) or used reasonable defaults",
"The final output is a multi-paper synthesis, not a single factual answer"
]
}
]
}


@@ -0,0 +1,102 @@
[
{
"query": "Survey transformer attention variants published in the last 2 years on arXiv cs.CL",
"should_trigger": true,
"rationale": "Explicit survey request with scope and category"
},
{
"query": "What methods do recent papers use for few-shot learning in vision-and-language? Give me 15 papers in BibTeX.",
"should_trigger": true,
"rationale": "Multi-paper synthesis with count and format spec"
},
{
"query": "Review the literature on retrieval-augmented generation — key findings, limitations, and open questions",
"should_trigger": true,
"rationale": "Classic SLR phrasing with explicit synthesis structure"
},
{
"query": "Compare evaluation frameworks used across LLM hallucination detection papers",
"should_trigger": true,
"rationale": "Cross-paper comparison implies multi-paper synthesis"
},
{
"query": "Summarize recent work on Monte Carlo methods for mortgage risk — last 3 years",
"should_trigger": true,
"rationale": "Domain-specific SLR with time window"
},
{
"query": "Annotated bibliography on agentic tool use, 20 papers, IEEE format",
"should_trigger": true,
"rationale": "Annotated bibliography is an SLR variant"
},
{
"query": "What does the literature say about RLHF?",
"should_trigger": true,
"rationale": "No 'systematic' keyword but 'the literature' clearly implies multi-paper synthesis"
},
{
"query": "Give me an overview of diffusion model papers since 2022",
"should_trigger": true,
"rationale": "Time range + 'papers' implies breadth-first survey"
},
{
"query": "Are there papers comparing RAG and fine-tuning?",
"should_trigger": true,
"rationale": "Comparison query across papers implies synthesis"
},
{
"query": "Do a systematic literature review on graph neural networks for drug discovery, APA format",
"should_trigger": true,
"rationale": "Explicit SLR request with format"
},
{
"query": "Review this paper: https://arxiv.org/abs/2310.06825",
"should_trigger": false,
"rationale": "Single paper URL -> should route to academic-paper-review"
},
{
"query": "What is attention in transformers?",
"should_trigger": false,
"rationale": "Factual question, no multi-paper synthesis needed"
},
{
"query": "Search for news about AI regulation",
"should_trigger": false,
"rationale": "General web search, not academic literature review"
},
{
"query": "Summarize this PDF [attached]",
"should_trigger": false,
"rationale": "Single document summary, not literature review"
},
{
"query": "Write me a Python function to parse BibTeX files",
"should_trigger": false,
"rationale": "Coding task, not research"
},
{
"query": "What is the capital of France?",
"should_trigger": false,
"rationale": "Factual question, no research needed"
},
{
"query": "Help me debug this error in my React app",
"should_trigger": false,
"rationale": "Debugging task, not literature review"
},
{
"query": "Translate this paragraph to Chinese",
"should_trigger": false,
"rationale": "Translation task"
},
{
"query": "Explain the difference between CNN and RNN",
"should_trigger": false,
"rationale": "Conceptual explanation, not multi-paper synthesis"
},
{
"query": "Find me the best paper on reinforcement learning",
"should_trigger": false,
"rationale": "Singular 'best paper' implies one result, not a survey across many"
}
]


@@ -0,0 +1,306 @@
#!/usr/bin/env python3
"""arXiv search client for the systematic-literature-review skill.
Queries the public arXiv API (http://export.arxiv.org/api/query) and
returns structured paper metadata as JSON. No API key required.
Design notes:
- No additional dependencies required. Uses `requests` when available,
falls back to `urllib` with a requests-compatible shim (same pattern as
../../github-deep-research/scripts/github_api.py).
- Query parameters are URL-encoded via `urllib.parse.urlencode` with
`quote_via=quote_plus`. Hand-rolled `k=v` joining would break on
multi-word topics like "transformer attention".
- Atom XML is parsed with `xml.etree.ElementTree` using an explicit
namespace map. Forgetting the namespace prefix is the #1 arXiv API
parsing bug, so we bake it into NS_MAP.
- The `<id>` field in arXiv responses is a full URL like
"http://arxiv.org/abs/1706.03762v5". Callers usually want the bare
id "1706.03762", so we normalise it.
- max_results is clamped to 50 to match the skill's documented upper
bound. Larger surveys are out of scope for the MVP.
"""
from __future__ import annotations
import argparse
import json
import sys
from typing import Any
# Namespace map for arXiv's Atom feed. arXiv extends Atom with its own
# elements (primary_category, comment, journal_ref) under the `arxiv:`
# prefix; the core entry fields live under `atom:`.
NS_MAP = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}
ARXIV_ENDPOINT = "http://export.arxiv.org/api/query"
MAX_RESULTS_UPPER_BOUND = 50
DEFAULT_TIMEOUT_SECONDS = 30
# --- HTTP client with requests -> urllib fallback --------------------------
try:
    import requests  # type: ignore
except ImportError:
    import urllib.error
    import urllib.parse
    import urllib.request

    class _UrllibResponse:
        def __init__(self, data: bytes, status: int) -> None:
            self._data = data
            self.status_code = status
            self.text = data.decode("utf-8", errors="replace")
            self.content = data

        def raise_for_status(self) -> None:
            if self.status_code >= 400:
                raise RuntimeError(f"HTTP {self.status_code}")

    class _UrllibRequestsShim:
        """Minimal requests-compatible shim using urllib.

        Only supports what arxiv_search needs: GET with query params.
        Params are encoded with quote_plus so multi-word queries work.
        """

        @staticmethod
        def get(
            url: str,
            params: dict | None = None,
            timeout: int = DEFAULT_TIMEOUT_SECONDS,
        ) -> _UrllibResponse:
            if params:
                query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote_plus)
                url = f"{url}?{query}"
            req = urllib.request.Request(url, headers={"User-Agent": "deerflow-slr-skill/0.1"})
            try:
                with urllib.request.urlopen(req, timeout=timeout) as resp:
                    return _UrllibResponse(resp.read(), resp.status)
            except urllib.error.HTTPError as e:
                return _UrllibResponse(e.read(), e.code)

    requests = _UrllibRequestsShim()  # type: ignore
# --- Core query + parsing --------------------------------------------------
def _build_search_query(
    query: str,
    category: str | None,
    start_date: str | None,
    end_date: str | None,
) -> str:
    """Build arXiv's `search_query` field.

    arXiv uses its own query grammar: `ti:`, `abs:`, `cat:`, `all:`, with
    `AND`/`OR`/`ANDNOT` combinators. We search `all:` for the user's
    topic (matches title + abstract + authors) and optionally AND it
    with a category filter and a submission date range.
    """
    # Wrap multi-word queries in double quotes so arXiv's Lucene parser
    # treats them as a phrase. Without quotes, `all:diffusion model` is
    # parsed as `all:diffusion OR model`, pulling in unrelated papers
    # that merely mention the word "model".
    if " " in query:
        parts = [f'all:"{query}"']
    else:
        parts = [f"all:{query}"]
    if category:
        parts.append(f"cat:{category}")
    if start_date or end_date:
        # arXiv date range format: [YYYYMMDDHHMM TO YYYYMMDDHHMM]
        lo = (start_date or "19910101").replace("-", "") + "0000"
        hi = (end_date or "29991231").replace("-", "") + "2359"
        parts.append(f"submittedDate:[{lo} TO {hi}]")
    return " AND ".join(parts)
def _normalise_arxiv_id(raw_id: str) -> str:
    """Convert a full arXiv URL to a bare id.

    Handles both modern and legacy arXiv ID formats:
    - Modern: "http://arxiv.org/abs/1706.03762v5" -> "1706.03762"
    - Legacy: "http://arxiv.org/abs/hep-th/9901001v1" -> "hep-th/9901001"
    """
    # Extract everything after /abs/ to preserve the legacy archive prefix.
    if "/abs/" in raw_id:
        tail = raw_id.split("/abs/", 1)[1]
    else:
        tail = raw_id.rsplit("/", 1)[-1]
    # Strip version suffix: "1706.03762v5" -> "1706.03762"
    if "v" in tail:
        base, _, suffix = tail.rpartition("v")
        if suffix.isdigit():
            return base
    return tail
def _parse_entry(entry: Any) -> dict:
    """Turn one Atom <entry> element into a paper dict."""

    def _text(path: str) -> str:
        node = entry.find(path, NS_MAP)
        return (node.text or "").strip() if node is not None and node.text else ""

    raw_id = _text("atom:id")
    arxiv_id = _normalise_arxiv_id(raw_id)
    authors = [
        (a.findtext("atom:name", default="", namespaces=NS_MAP) or "").strip()
        for a in entry.findall("atom:author", NS_MAP)
    ]
    authors = [a for a in authors if a]
    categories = [c.get("term", "") for c in entry.findall("atom:category", NS_MAP) if c.get("term")]
    pdf_url = ""
    abs_url = raw_id  # default
    for link in entry.findall("atom:link", NS_MAP):
        if link.get("title") == "pdf":
            pdf_url = link.get("href", "")
        elif link.get("rel") == "alternate":
            abs_url = link.get("href", abs_url)
    # Dates come as ISO 8601 (2017-06-12T17:57:34Z). Keep the date part.
    published_raw = _text("atom:published")
    updated_raw = _text("atom:updated")
    published = published_raw.split("T", 1)[0] if published_raw else ""
    updated = updated_raw.split("T", 1)[0] if updated_raw else ""
    # Abstract (<summary>) has ragged whitespace from arXiv's formatting.
    # Collapse internal whitespace to make downstream LLM consumption easier.
    abstract = " ".join(_text("atom:summary").split())
    return {
        "id": arxiv_id,
        "title": " ".join(_text("atom:title").split()),
        "authors": authors,
        "abstract": abstract,
        "published": published,
        "updated": updated,
        "categories": categories,
        "pdf_url": pdf_url,
        "abs_url": abs_url,
    }
def search(
query: str,
max_results: int = 20,
category: str | None = None,
sort_by: str = "relevance",
start_date: str | None = None,
end_date: str | None = None,
) -> list[dict]:
"""Query arXiv and return a list of paper dicts.
Args:
query: free-text topic, e.g. "transformer attention".
max_results: number of papers to return (clamped to 50).
category: optional arXiv category, e.g. "cs.CL".
sort_by: "relevance", "submittedDate", or "lastUpdatedDate".
start_date: YYYY-MM-DD or YYYYMMDD, inclusive.
end_date: YYYY-MM-DD or YYYYMMDD, inclusive.
Returns:
list of dicts, each matching the schema documented in SKILL.md.
"""
import xml.etree.ElementTree as ET
if max_results <= 0:
return []
max_results = min(max_results, MAX_RESULTS_UPPER_BOUND)
search_query = _build_search_query(query, category, start_date, end_date)
params = {
"search_query": search_query,
"start": 0,
"max_results": max_results,
"sortBy": sort_by,
"sortOrder": "descending",
}
resp = requests.get(ARXIV_ENDPOINT, params=params, timeout=DEFAULT_TIMEOUT_SECONDS)
resp.raise_for_status()
# arXiv returns Atom XML, not JSON.
root = ET.fromstring(resp.text)
entries = root.findall("atom:entry", NS_MAP)
return [_parse_entry(e) for e in entries]
# --- CLI -------------------------------------------------------------------
def _build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
description="Query the arXiv API and emit structured paper metadata as JSON.",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=(
"Examples:\n"
' python arxiv_search.py "transformer attention" --max-results 10\n'
' python arxiv_search.py "diffusion models" --category cs.CV --sort-by submittedDate\n'
' python arxiv_search.py "graph neural networks" --start-date 2023-01-01\n'
),
)
parser.add_argument("query", help="free-text search topic")
parser.add_argument(
"--max-results",
type=int,
default=20,
help=f"number of papers to return (default: 20, max: {MAX_RESULTS_UPPER_BOUND})",
)
parser.add_argument(
"--category",
default=None,
help="optional arXiv category filter, e.g. cs.CL, cs.CV, stat.ML",
)
parser.add_argument(
"--sort-by",
default="relevance",
choices=["relevance", "submittedDate", "lastUpdatedDate"],
help="sort order (default: relevance)",
)
parser.add_argument(
"--start-date",
default=None,
help="earliest submission date, YYYY-MM-DD (inclusive)",
)
parser.add_argument(
"--end-date",
default=None,
help="latest submission date, YYYY-MM-DD (inclusive)",
)
return parser
def main() -> int:
args = _build_parser().parse_args()
try:
papers = search(
query=args.query,
max_results=args.max_results,
category=args.category,
sort_by=args.sort_by,
start_date=args.start_date,
end_date=args.end_date,
)
except Exception as exc:
print(f"arxiv_search.py: {exc}", file=sys.stderr)
return 1
json.dump(papers, sys.stdout, ensure_ascii=False, indent=2)
sys.stdout.write("\n")
return 0
if __name__ == "__main__":
sys.exit(main())

# APA 7th Edition Citation Template
Use this template when the user requests APA format, or when they do not specify a format. APA 7th is the standard in the social sciences and a safe default when the user has not named a venue-specific style.
## Citation Format Rules
### In-text citations
- **Single author**: `(Vaswani, 2017)` or `Vaswani (2017) showed that...`
- **Two authors**: `(Vaswani & Shazeer, 2017)` — use `&` inside parentheses, "and" in running text.
- **Three or more authors**: `(Vaswani et al., 2017)` — use `et al.` from the first citation onward (APA 7th changed this from APA 6th).
- **Multiple citations**: `(Vaswani et al., 2017; Devlin et al., 2018)` — alphabetical order, separated by semicolons.
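The three in-text rules above are mechanical enough to sketch as a helper. `apa_in_text` is a hypothetical name; the sketch assumes you already have each paper's author last names and year (e.g. from the `arxiv_search.py` metadata):

```python
def apa_in_text(last_names: list[str], year: int, parenthetical: bool = True) -> str:
    """Format an APA 7th in-text citation from author last names and a year."""
    if len(last_names) == 1:
        names = last_names[0]
    elif len(last_names) == 2:
        # "&" inside parentheses, "and" in running text.
        names = (" & " if parenthetical else " and ").join(last_names)
    else:
        # Three or more authors: "et al." from the first citation onward.
        names = f"{last_names[0]} et al."
    return f"({names}, {year})" if parenthetical else f"{names} ({year})"
```

For example, `apa_in_text(["Vaswani", "Shazeer", "Parmar"], 2017)` yields `(Vaswani et al., 2017)`.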
### Reference list entry for arXiv preprints
arXiv papers are preprints, not formally published articles. Cite them as preprints with the arXiv identifier:
```
Author, A. A., Author, B. B., & Author, C. C. (Year). Title of the paper. arXiv. https://arxiv.org/abs/ARXIV_ID
```
**Real example** (from paper metadata `{id: "1706.03762", title: "Attention Is All You Need", authors: ["Ashish Vaswani", "Noam Shazeer", "Niki Parmar", "Jakob Uszkoreit", "Llion Jones", "Aidan N. Gomez", "Łukasz Kaiser", "Illia Polosukhin"], published: "2017-06-12"}`):
```
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. arXiv. https://arxiv.org/abs/1706.03762
```
Formatting rules:
- **Author names**: `LastName, FirstInitial.` (middle initial optional). Join with commas; last author gets an `&`.
- **Year**: the `published` field's year, in parentheses.
- **Title**: sentence case (only first word and proper nouns capitalized). Italicize titles in typeset output; in plain markdown, leave plain.
- **Source**: the literal word `arXiv`, then the full abs URL.
- **No DOI** unless the paper has also been published in a venue with a DOI. arXiv alone uses the URL.
### Special cases
- **Up to 20 authors**: list all of them separated by commas, with `&` before the last.
- **21 or more authors**: list the first 19, then `...`, then the final author.
- **No DOI and no URL**: not possible for arXiv papers; always use the `abs_url` from the paper metadata.
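One way to mechanize the author-list rules above is a small helper. `apa_authors` is an invented name, and the sketch assumes each entry in the paper metadata's `authors` list is a plain `"First [Middle] Last"` string (multi-part or hyphenated surnames would need extra handling):

```python
def apa_authors(names: list[str]) -> str:
    """Join 'First [Middle] Last' names into an APA 7th reference byline."""

    def initialise(full: str) -> str:
        parts = full.split()
        initials = " ".join(f"{p[0]}." for p in parts[:-1])
        return f"{parts[-1]}, {initials}" if initials else parts[-1]

    formatted = [initialise(n) for n in names]
    if len(formatted) == 1:
        return formatted[0]
    if len(formatted) <= 20:
        # Up to 20 authors: list all, "&" before the last.
        return ", ".join(formatted[:-1]) + ", & " + formatted[-1]
    # 21 or more: first 19, an ellipsis, then the final author (no ampersand).
    return ", ".join(formatted[:19]) + ", ... " + formatted[-1]
```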
## Report Structure
Follow this structure verbatim when writing the SLR report body. Fill in content from your Phase 3 extraction and Phase 4 synthesis.
```markdown
# Systematic Literature Review: <Topic>
**Date**: <YYYY-MM-DD>
**Papers surveyed**: <N>
**Scope**: <arXiv search query, category, time window>
**Citation format**: APA 7th edition
## Executive Summary
<3-5 sentences summarizing the state of the literature on this topic. What do the surveyed papers collectively tell us? What is the shape of the field? Avoid listing papers; synthesize.>
## Methodology
This review surveyed <N> arXiv papers retrieved on <YYYY-MM-DD> using the query `<query>`<, filtered to category <cat>><, published between <start_date> and <end_date>>. Papers were sorted by <relevance | submission date> and the top <N> were included. Metadata extraction (research question, methodology, key findings, limitations) was performed by language-model agents, with cross-paper synthesis performed by the lead agent.
**Limitations of this review**: arXiv preprints are not peer-reviewed; some included papers may not reflect their final published form. Coverage is limited to arXiv — papers published directly in venues without arXiv preprints are not represented.
## Themes
<3-6 thematic sections. Each theme is a recurring research direction, problem framing, or methodological approach across the surveyed papers.>
### Theme 1: <Theme name>
<2-4 paragraphs describing this theme. Cite papers inline as you discuss them, e.g. "Vaswani et al. (2017) introduced X, while subsequent work (Devlin et al., 2018; Liu et al., 2019) extended it to Y." Do not just list papers; describe the intellectual thread that connects them.>
### Theme 2: <Theme name>
<...>
## Convergences and Disagreements
**Convergences**: <findings that multiple papers agree on, e.g. "Most surveyed papers agree that X is necessary, citing evidence from Y and Z.">
**Disagreements**: <where papers reach different conclusions, e.g. "Vaswani et al. (2017) argue that X, while Dai et al. (2019) find the opposite under condition Y.">
## Gaps and Open Questions
<What the collective literature does not yet address. Pull from the "limitations" field of your Phase 3 extraction and identify patterns: if five papers all mention the same missing piece, that is a gap worth flagging.>
## Per-Paper Annotations
<One subsection per paper, ordered by year then first author. Each subsection is a mini-summary of that paper's contribution.>
### Vaswani et al. (2017)
**Research question**: <1 sentence from Phase 3 metadata>
**Methodology**: <1-2 sentences>
**Key findings**:
- <bullet>
- <bullet>
- <bullet>
**Limitations**: <1-2 sentences>
### <Next paper>
<...>
## References
<Alphabetical list by first author's last name, APA 7th format as described above.>
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv. https://arxiv.org/abs/1810.04805
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. arXiv. https://arxiv.org/abs/1706.03762
<... more entries, one per paper ...>
```
## Quality checks before finalizing
Before saving the report, verify:
- [ ] Every paper in the surveyed set appears **both** in "Per-Paper Annotations" **and** in "References".
- [ ] Every in-text citation matches a reference entry (no dangling citations).
- [ ] Authors are formatted `LastName, FirstInitial.` — not `FirstName LastName`.
- [ ] Years are in parentheses inline, and at the start of reference entries.
- [ ] Titles are in sentence case in references (only first word + proper nouns capitalized).
- [ ] arXiv URLs use the `abs_url` form (`https://arxiv.org/abs/...`), not `pdf_url`.
- [ ] References are alphabetized by first author's last name.

# BibTeX Citation Template
Use this template when the user mentions BibTeX, LaTeX, wants machine-readable references, or is writing a paper that will be typeset with a LaTeX citation style (natbib, biblatex, etc.).
## Critical: use `@misc`, not `@article`, for arXiv papers
**arXiv preprints must be cited as `@misc`, not `@article`.** This is the most common mistake when generating BibTeX for arXiv papers, and it matters:
- `@article` requires a `journal` field. arXiv is not a journal — it is a preprint server. Using `@article` with `journal = {arXiv}` is technically wrong and some bibliography styles will complain or render it inconsistently.
- `@misc` is the correct entry type for preprints, technical reports, and other non-journal publications. It accepts `howpublished` and `eprint` fields, which is exactly what arXiv citations need.
- Only switch to `@article` (or `@inproceedings`) when the paper has been **formally published** in a peer-reviewed venue and you have the venue metadata. In this workflow we only have arXiv metadata, so always emit `@misc`.
## Citation Format Rules
### Entry structure for arXiv preprints
```bibtex
@misc{citekey,
author = {LastName1, FirstName1 and LastName2, FirstName2 and ...},
title = {Title of the Paper},
year = {YYYY},
eprint = {ARXIV_ID},
archivePrefix = {arXiv},
primaryClass = {PRIMARY_CATEGORY},
url = {https://arxiv.org/abs/ARXIV_ID}
}
```
**Real example**:
```bibtex
@misc{vaswani2017attention,
author = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser, {\L}ukasz and Polosukhin, Illia},
title = {Attention Is All You Need},
year = {2017},
eprint = {1706.03762},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/1706.03762}
}
```
### Field rules
- **Cite key**: `<firstauthorlast><year><firstwordoftitle>`, all lowercase, no punctuation. Example: `vaswani2017attention`. Keys must be unique within the report.
- **`author`**: `LastName, FirstName and LastName, FirstName and ...` — note the literal word `and` between authors, not a comma. LaTeX requires this exact separator. LastName comes first, then a comma, then the given names.
- **Special characters**: escape or wrap LaTeX-sensitive characters. For example, `Łukasz` becomes `{\L}ukasz`, `é` becomes `{\'e}` (or wrap the whole name in braces to preserve casing: `{Łukasz}`). If unsure, wrap the problematic name in curly braces.
- **`title`**: preserve the paper's capitalization by wrapping it in double braces if it contains acronyms or proper nouns you need to keep capitalized: `title = {{BERT}: Pre-training of Deep Bidirectional Transformers}`. Otherwise plain braces are fine.
- **`year`**: the 4-digit year from the paper's `published` field.
- **`eprint`**: the **bare arXiv id** (e.g. `1706.03762`), **without** the `arXiv:` prefix and **without** the version suffix.
- **`archivePrefix`**: literal string `{arXiv}`.
- **`primaryClass`**: the first category from the paper's `categories` list (e.g. `cs.CL`, `cs.CV`, `stat.ML`). This is the paper's primary subject area.
- **`url`**: the full `abs_url` from paper metadata.
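Putting the field rules together, a generator might look like the sketch below. `bibtex_misc` is a hypothetical helper; it assumes the paper dict schema emitted by `arxiv_search.py` and does **not** escape LaTeX-special characters in names or titles, so those still need the brace-wrapping described above:

```python
import re


def bibtex_misc(paper: dict) -> str:
    """Render one arxiv_search.py paper dict as an @misc BibTeX entry."""
    first_author_last = paper["authors"][0].split()[-1].lower()
    first_title_word = re.sub(r"[^a-z0-9]", "", paper["title"].split()[0].lower())
    year = paper["published"][:4]
    key = f"{first_author_last}{year}{first_title_word}"
    # BibTeX wants "Last, First and Last, First" -- the literal word "and".
    authors = " and ".join(
        f"{name.split()[-1]}, {' '.join(name.split()[:-1])}" for name in paper["authors"]
    )
    return (
        f"@misc{{{key},\n"
        f"  author        = {{{authors}}},\n"
        f"  title         = {{{paper['title']}}},\n"
        f"  year          = {{{year}}},\n"
        f"  eprint        = {{{paper['id']}}},\n"
        f"  archivePrefix = {{arXiv}},\n"
        f"  primaryClass  = {{{paper['categories'][0]}}},\n"
        f"  url           = {{{paper['abs_url']}}}\n"
        "}"
    )
```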
## Report Structure
The BibTeX report is slightly different from APA / IEEE: the **bibliography is a separate `.bib` file**, and the main report uses LaTeX-style `\cite{key}` references that would resolve against that file. Since we are emitting markdown, we show `\cite{key}` verbatim in the prose and emit the BibTeX entries inside a fenced code block at the end.
```markdown
# Systematic Literature Review: <Topic>
**Date**: <YYYY-MM-DD>
**Papers surveyed**: <N>
**Scope**: <arXiv search query, category, time window>
**Citation format**: BibTeX
## Executive Summary
<3-5 sentences. Use \cite{key} form for citations, e.g. "Transformer architectures \cite{vaswani2017attention} have become the dominant approach.">
## Methodology
This review surveyed <N> arXiv papers retrieved on <YYYY-MM-DD> using the query `<query>`<, filtered to category <cat>><, published between <start_date> and <end_date>>. Metadata extraction was performed by language-model agents, with cross-paper synthesis performed by the lead agent. All citations in this report use BibTeX cite keys; the corresponding `.bib` entries are at the end of this document.
**Limitations of this review**: arXiv preprints are not peer-reviewed; coverage is limited to arXiv.
## Themes
### Theme 1: <Theme name>
<Paragraphs describing the theme. Cite with \cite{key} form: "The original transformer architecture \cite{vaswani2017attention} introduced self-attention, which was later extended in \cite{dai2019transformerxl}.">
### Theme 2: <Theme name>
<...>
## Convergences and Disagreements
**Convergences**: <e.g. "Multiple papers \cite{key1,key2,key3} agree that X is necessary.">
**Disagreements**: <...>
## Gaps and Open Questions
<...>
## Per-Paper Annotations
### \cite{vaswani2017attention} — "Attention Is All You Need" (2017)
**Research question**: <1 sentence>
**Methodology**: <1-2 sentences>
**Key findings**:
- <bullet>
- <bullet>
- <bullet>
**Limitations**: <1-2 sentences>
### \cite{devlin2018bert} — "BERT: Pre-training of Deep Bidirectional Transformers" (2018)
<...>
## BibTeX Bibliography
Save the entries below to a `.bib` file and reference them from your LaTeX document with `\bibliography{<bib-file-name>}` (the file name without its `.bib` extension).
\`\`\`bibtex
@misc{vaswani2017attention,
author = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser, {\L}ukasz and Polosukhin, Illia},
title = {Attention Is All You Need},
year = {2017},
eprint = {1706.03762},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/1706.03762}
}
@misc{devlin2018bert,
author = {Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
title = {{BERT}: Pre-training of Deep Bidirectional Transformers for Language Understanding},
year = {2018},
eprint = {1810.04805},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/1810.04805}
}
... more entries, one per paper ...
\`\`\`
```
(Note: in the actual saved report, use a real fenced code block `` ```bibtex `` — the backticks above are escaped only because this template file itself is inside a markdown code block when rendered.)
## Quality checks before finalizing
Before saving the report, verify:
- [ ] Every entry is `@misc`, not `@article` (this workflow only has arXiv metadata).
- [ ] Cite keys are unique within the report.
- [ ] Cite keys follow the `<firstauthorlast><year><firstword>` pattern, all lowercase.
- [ ] `author` field uses ` and ` (the literal word) between authors, not commas.
- [ ] LaTeX special characters in author names are escaped or brace-wrapped.
- [ ] `eprint` is the bare arXiv id (no `arXiv:` prefix, no version suffix).
- [ ] `primaryClass` is set from the paper's first category.
- [ ] Every `\cite{key}` in the text has a matching `@misc` entry in the bibliography.
- [ ] The bibliography is emitted inside a fenced ```` ```bibtex ```` code block so users can copy-paste directly into a `.bib` file.
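The dangling-citation check is easy to automate. A sketch (the function name and the `\cite`-scanning regex are ad hoc; the regex ignores starred or optional-argument `\cite` variants):

```python
import re


def dangling_cites(report_text: str) -> set[str]:
    """Cite keys used in \\cite{...} with no matching @misc entry."""
    used: set[str] = set()
    for group in re.findall(r"\\cite\{([^}]+)\}", report_text):
        # \cite{a,b} cites several keys at once.
        used.update(key.strip() for key in group.split(","))
    defined = set(re.findall(r"@misc\{([^,]+),", report_text))
    return used - defined
```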

# IEEE Citation Template
Use this template when the user targets an IEEE conference or journal, or explicitly asks for IEEE format. IEEE uses **numeric citations** — references are numbered in the order they first appear in the text, and in-text citations use bracketed numbers.
## Citation Format Rules
### In-text citations
- **Single reference**: `[1]` — use the number assigned in the References section.
- **Multiple references**: `[1], [3], [5]` or `[1]–[3]` for consecutive ranges.
- **Citation as a noun**: "As shown in [1], ..." or "Reference [1] demonstrated...".
- **Author attribution**: "Vaswani et al. [1] introduced..." — author names are optional in IEEE; use them when it improves readability, always followed by the bracketed number.
Numbers are assigned in **order of first appearance in the text**, not alphabetically. The first reference you cite is `[1]`, the second new reference is `[2]`, and so on.
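First-appearance numbering is easy to get wrong by hand. One approach (an invented convention: draft the report with `[@citekey]` placeholders, then map each key to its number) is:

```python
import re


def assign_ieee_numbers(draft_text: str) -> dict[str, int]:
    """Map placeholder cite keys to IEEE numbers, in order of first appearance."""
    numbers: dict[str, int] = {}
    for key in re.findall(r"\[@([A-Za-z0-9]+)\]", draft_text):
        if key not in numbers:
            numbers[key] = len(numbers) + 1
    return numbers
```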
### Reference list entry for arXiv preprints
IEEE format for arXiv preprints:
```
[N] A. A. Author, B. B. Author, and C. C. Author, "Title of the paper," arXiv:ARXIV_ID, Year.
```
**Real example**:
```
[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," arXiv:1706.03762, 2017.
```
Formatting rules:
- **Author names**: `FirstInitial. LastName` — initials before the last name, opposite of APA. Join with commas; last author gets `and` (no Oxford comma before it in strict IEEE, but accepted).
- **Title**: in double quotes, sentence case. No italics.
- **Source**: `arXiv:<id>` — the literal prefix `arXiv:` followed by the bare id (e.g. `arXiv:1706.03762`, not the full URL).
- **Year**: at the end, after a comma.
- **URL**: optional in IEEE. Include if the publication venue requires it; otherwise the `arXiv:<id>` identifier is sufficient and is the IEEE-preferred form.
### Special cases
- **More than 6 authors**: IEEE allows listing the first author followed by `et al.`: `A. Vaswani et al., "Attention is all you need," arXiv:1706.03762, 2017.` Use this for papers with many authors to keep reference entries readable.
- **If the paper has also been published at a venue**: prefer the venue citation format over arXiv. In this workflow we only have arXiv metadata, so always use the arXiv form.
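A reference-entry formatter following these rules might look like this sketch (hypothetical name, assuming the `arxiv_search.py` paper dict; sentence-casing the title is left to the author):

```python
def ieee_reference(number: int, paper: dict) -> str:
    """Render '[N] A. Author, ..., "Title," arXiv:ID, Year.' from a paper dict."""

    def initials(full_name: str) -> str:
        parts = full_name.split()
        # Initials before the last name, e.g. "Ashish Vaswani" -> "A. Vaswani".
        return " ".join(f"{p[0]}." for p in parts[:-1]) + " " + parts[-1]

    names = paper["authors"]
    if len(names) > 6:
        byline = f"{initials(names[0])} et al."  # long author lists
    elif len(names) == 1:
        byline = initials(names[0])
    elif len(names) == 2:
        byline = f"{initials(names[0])} and {initials(names[1])}"
    else:
        byline = ", ".join(initials(n) for n in names[:-1]) + f", and {initials(names[-1])}"
    year = paper["published"][:4]
    return f'[{number}] {byline}, "{paper["title"]}," arXiv:{paper["id"]}, {year}.'
```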
## Report Structure
Follow this structure verbatim. Note that IEEE reports use **numeric citations throughout**, so you need to assign a number to each paper **in order of first appearance** in the Themes section, then use those numbers consistently in per-paper annotations and the reference list.
```markdown
# Systematic Literature Review: <Topic>
**Date**: <YYYY-MM-DD>
**Papers surveyed**: <N>
**Scope**: <arXiv search query, category, time window>
**Citation format**: IEEE
## Executive Summary
<3-5 sentences summarizing the state of the literature. Cite papers with bracketed numbers as you first introduce them, e.g. "Transformer architectures [1] have become the dominant approach, with extensions focusing on efficiency [2], [3] and long-context handling [4].">
## Methodology
This review surveyed <N> arXiv papers retrieved on <YYYY-MM-DD> using the query `<query>`<, filtered to category <cat>><, published between <start_date> and <end_date>>. Papers were sorted by <relevance | submission date> and the top <N> were included. Metadata extraction was performed by language-model agents, with cross-paper synthesis performed by the lead agent.
**Limitations of this review**: arXiv preprints are not peer-reviewed; coverage is limited to arXiv.
## Themes
<3-6 thematic sections. First appearance of each paper gets a bracketed number; subsequent mentions reuse the same number. The number assignment order is: first paper mentioned in Theme 1 gets [1], next new paper gets [2], etc.>
### Theme 1: <Theme name>
<Paragraphs describing the theme. Cite with bracketed numbers: "The original transformer architecture [1] introduced self-attention, which was later extended in [2] and [3]. Comparative analyses [4] show that...">
### Theme 2: <Theme name>
<...>
## Convergences and Disagreements
**Convergences**: <e.g. "Multiple papers [1], [3], [5] agree that X is necessary.">
**Disagreements**: <e.g. "While [1] argues X, [2] finds the opposite under condition Y.">
## Gaps and Open Questions
<What the collective literature does not yet address, with citations to papers that explicitly mention these gaps.>
## Per-Paper Annotations
<One subsection per paper, ordered by their assigned reference number.>
### [1] Vaswani et al., "Attention is all you need" (2017)
**Research question**: <1 sentence>
**Methodology**: <1-2 sentences>
**Key findings**:
- <bullet>
- <bullet>
- <bullet>
**Limitations**: <1-2 sentences>
### [2] <Next paper>
<...>
## References
<Numbered list in order of first appearance in the text. The number must match the in-text citations above.>
[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," arXiv:1706.03762, 2017.
[2] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv:1810.04805, 2018.
<... more entries ...>
```
## Quality checks before finalizing
Before saving the report, verify:
- [ ] Every paper in the surveyed set has a unique reference number.
- [ ] Reference numbers are assigned in order of **first appearance in the text**, not alphabetically.
- [ ] Every bracketed number in the text has a matching entry in the References section.
- [ ] Every entry in References is cited at least once in the text.
- [ ] Author names use `FirstInitial. LastName` format (initials before last name).
- [ ] Titles are in double quotes and sentence case.
- [ ] arXiv identifiers use the `arXiv:<bare_id>` form, not the full URL.
- [ ] Per-paper annotations are ordered by reference number, matching the References section order.
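The first four checks can be automated by splitting the report at the References heading and comparing number sets. An ad hoc sketch, assuming the `## References` heading and `[N]` entry format above:

```python
import re


def ieee_citation_gaps(report_text: str) -> tuple[set[int], set[int]]:
    """Return (cited but not listed, listed but never cited) IEEE numbers."""
    body, _, refs = report_text.partition("## References")
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", body)}
    listed = {int(n) for n in re.findall(r"\[(\d+)\]", refs)}
    return cited - listed, listed - cited
```

Both returned sets should be empty before the report is finalized.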