Files
DATA 6de0bf9f5b Initial commit: hardened DeerFlow factory
Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection
hardening:

- New deerflow.security package: content_delimiter, html_cleaner,
  sanitizer (8 layers — invisible chars, control chars, symbols, NFC,
  PUA, tag chars, horizontal whitespace collapse with newline/tab
  preservation, length cap)
- New deerflow.community.searx package: web_search, web_fetch,
  image_search backed by a private SearX instance, every external
  string sanitized and wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>>
  delimiters
- All native community web providers (ddg_search, tavily, exa,
  firecrawl, jina_ai, infoquest, image_search) replaced with hard-fail
  stubs that raise NativeWebToolDisabledError at import time, so a
  misconfigured tool.use path fails loud rather than silently falling
  back to unsanitized output
- Native client back-doors (jina_client.py, infoquest_client.py)
  stubbed too
- Native-tool tests quarantined under tests/_disabled_native/
  (collect_ignore_glob via local conftest.py)
- Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserve
  newlines and tabs so list/table structure survives
- Hardened runtime config.yaml references only the searx-backed tools
- Factory overlay (backend/) kept in sync with deer-flow tree as a
  reference / source

See HARDENING.md for the full audit trail and verification steps.
2026-04-12 14:23:57 +02:00

7.2 KiB

BibTeX Citation Template

Use this template when the user mentions BibTeX, LaTeX, wants machine-readable references, or is writing a paper that will be typeset with a LaTeX citation style (natbib, biblatex, etc.).

Critical: use @misc, not @article, for arXiv papers

arXiv preprints must be cited as @misc, not @article. This is the most common mistake when generating BibTeX for arXiv papers, and it matters:

  • @article requires a journal field. arXiv is not a journal — it is a preprint server. Using @article with journal = {arXiv} is technically wrong and some bibliography styles will complain or render it inconsistently.
  • @misc is the correct entry type for preprints, technical reports, and other non-journal publications. It accepts howpublished and eprint fields, which is exactly what arXiv citations need.
  • Only switch to @article (or @inproceedings) when the paper has been formally published in a peer-reviewed venue and you have the venue metadata. In this workflow we only have arXiv metadata, so always emit @misc.

Citation Format Rules

Entry structure for arXiv preprints

@misc{citekey,
  author        = {LastName1, FirstName1 and LastName2, FirstName2 and ...},
  title         = {Title of the Paper},
  year          = {YYYY},
  eprint        = {ARXIV_ID},
  archivePrefix = {arXiv},
  primaryClass  = {PRIMARY_CATEGORY},
  url           = {https://arxiv.org/abs/ARXIV_ID}
}

Real example:

@misc{vaswani2017attention,
  author        = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser, {\L}ukasz and Polosukhin, Illia},
  title         = {Attention Is All You Need},
  year          = {2017},
  eprint        = {1706.03762},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/1706.03762}
}

Field rules

  • Cite key: <firstauthorlast><year><firstwordoftitle>, all lowercase, no punctuation. Example: vaswani2017attention. Keys must be unique within the report.
  • author: LastName, FirstName and LastName, FirstName and ... — note the literal word and between authors, not a comma. LaTeX requires this exact separator. LastName comes first, then a comma, then the given names.
  • Special characters: escape or wrap LaTeX-sensitive characters. For example, Łukasz becomes {\L}ukasz, é becomes {\'e} (or wrap the whole name in braces to preserve casing: {Łukasz}). If unsure, wrap the problematic name in curly braces.
  • title: preserve the paper's capitalization by wrapping it in double braces if it contains acronyms or proper nouns you need to keep capitalized: title = {{BERT}: Pre-training of Deep Bidirectional Transformers}. Otherwise plain braces are fine.
  • year: the 4-digit year from the paper's published field.
  • eprint: the bare arXiv id (e.g. 1706.03762), without the arXiv: prefix and without the version suffix.
  • archivePrefix: literal string {arXiv}.
  • primaryClass: the first category from the paper's categories list (e.g. cs.CL, cs.CV, stat.ML). This is the paper's primary subject area.
  • url: the full abs_url from paper metadata.

Report Structure

The BibTeX report is slightly different from APA / IEEE: the bibliography is a separate .bib file, and the main report uses LaTeX-style \cite{key} references that would resolve against that file. Since we are emitting markdown, we show \cite{key} verbatim in the prose and emit the BibTeX entries inside a fenced code block at the end.

# Systematic Literature Review: <Topic>

**Date**: <YYYY-MM-DD>
**Papers surveyed**: <N>
**Scope**: <arXiv search query, category, time window>
**Citation format**: BibTeX

## Executive Summary

<3-5 sentences. Use \cite{key} form for citations, e.g. "Transformer architectures \cite{vaswani2017attention} have become the dominant approach.">

## Methodology

This review surveyed <N> arXiv papers retrieved on <YYYY-MM-DD> using the query `<query>`<, filtered to category <cat>><, published between <start_date> and <end_date>>. Metadata extraction was performed by language-model agents, with cross-paper synthesis performed by the lead agent. All citations in this report use BibTeX cite keys; the corresponding `.bib` entries are at the end of this document.

**Limitations of this review**: arXiv preprints are not peer-reviewed; coverage is limited to arXiv.

## Themes

### Theme 1: <Theme name>

<Paragraphs describing the theme. Cite with \cite{key} form: "The original transformer architecture \cite{vaswani2017attention} introduced self-attention, which was later extended in \cite{dai2019transformerxl}.">

### Theme 2: <Theme name>

<...>

## Convergences and Disagreements

**Convergences**: <e.g. "Multiple papers \cite{key1,key2,key3} agree that X is necessary.">

**Disagreements**: <...>

## Gaps and Open Questions

<...>

## Per-Paper Annotations

### \cite{vaswani2017attention} — "Attention Is All You Need" (2017)

**Research question**: <1 sentence>
**Methodology**: <1-2 sentences>
**Key findings**:
- <bullet>
- <bullet>
- <bullet>
**Limitations**: <1-2 sentences>

### \cite{devlin2018bert} — "BERT: Pre-training of Deep Bidirectional Transformers" (2018)

<...>

## BibTeX Bibliography

Save the entries below to a `.bib` file and reference them from your LaTeX document with `\bibliography{filename}`.

\`\`\`bibtex
@misc{vaswani2017attention,
  author        = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser, {\L}ukasz and Polosukhin, Illia},
  title         = {Attention Is All You Need},
  year          = {2017},
  eprint        = {1706.03762},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/1706.03762}
}

@misc{devlin2018bert,
  author        = {Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  title         = {{BERT}: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  year          = {2018},
  eprint        = {1810.04805},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/1810.04805}
}

... more entries, one per paper ...
\`\`\`

(Note: in the actual saved report, use a real fenced code block ```bibtex — the backticks above are escaped only because this template file itself is inside a markdown code block when rendered.)

Quality checks before finalizing

Before saving the report, verify:

  • Every entry is @misc, not @article (this workflow only has arXiv metadata).
  • Cite keys are unique within the report.
  • Cite keys follow the <firstauthorlast><year><firstword> pattern, all lowercase.
  • author field uses and (the literal word) between authors, not commas.
  • LaTeX special characters in author names are escaped or brace-wrapped.
  • eprint is the bare arXiv id (no arXiv: prefix, no version suffix).
  • primaryClass is set from the paper's first category.
  • Every \cite{key} in the text has a matching @misc entry in the bibliography.
  • The bibliography is emitted inside a fenced ```bibtex code block so users can copy-paste directly into a .bib file.