Initial commit: hardened DeerFlow factory

Vendored deer-flow upstream (bytedance/deer-flow) plus prompt-injection
hardening:

- New deerflow.security package: content_delimiter, html_cleaner,
  sanitizer (8 layers — invisible chars, control chars, symbols, NFC,
  PUA, tag chars, horizontal whitespace collapse with newline/tab
  preservation, length cap)
- New deerflow.community.searx package: web_search, web_fetch,
  image_search backed by a private SearX instance, every external
  string sanitized and wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>>
  delimiters
- All native community web providers (ddg_search, tavily, exa,
  firecrawl, jina_ai, infoquest, image_search) replaced with hard-fail
  stubs that raise NativeWebToolDisabledError at import time, so a
  misconfigured tool.use path fails loud rather than silently falling
  back to unsanitized output
- Native client back-doors (jina_client.py, infoquest_client.py)
  stubbed too
- Native-tool tests quarantined under tests/_disabled_native/
  (collect_ignore_glob via local conftest.py)
- Sanitizer Layer 7 fix: only collapse horizontal whitespace, preserve
  newlines and tabs so list/table structure survives
- Hardened runtime config.yaml references only the searx-backed tools
- Factory overlay (backend/) kept in sync with deer-flow tree as a
  reference / source

See HARDENING.md for the full audit trail and verification steps.
2026-04-12 14:23:57 +02:00
commit 6de0bf9f5b
889 changed files with 173052 additions and 0 deletions


@@ -0,0 +1,185 @@
---
name: podcast-generation
description: Use this skill when the user requests to generate, create, or produce podcasts from text content. Converts written content into a two-host conversational podcast audio format with natural dialogue.
---
# Podcast Generation Skill
## Overview
This skill generates high-quality podcast audio from text content. The workflow includes creating a structured JSON script (conversational dialogue) and executing audio generation through text-to-speech synthesis.
## Core Capabilities
- Convert any text content (articles, reports, documentation) into podcast scripts
- Generate natural two-host conversational dialogue (male and female hosts)
- Synthesize speech audio using text-to-speech
- Mix audio chunks into a final podcast MP3 file
- Support both English and Chinese content
## Workflow
### Step 1: Understand Requirements
When a user requests podcast generation, identify:
- Source content: The text/article/report to convert into a podcast
- Language: English or Chinese (based on content)
- Output location: Where to save the generated podcast
  - You don't need to check the folder under `/mnt/user-data`
### Step 2: Create Structured Script JSON
Generate a structured JSON script file in `/mnt/user-data/workspace/` with naming pattern: `{descriptive-name}-script.json`
The JSON structure:
```json
{
  "locale": "en",
  "lines": [
    {"speaker": "male", "paragraph": "dialogue text"},
    {"speaker": "female", "paragraph": "dialogue text"}
  ]
}
```
### Step 3: Execute Generation
Call the Python script:
```bash
python /mnt/skills/public/podcast-generation/scripts/generate.py \
--script-file /mnt/user-data/workspace/script-file.json \
--output-file /mnt/user-data/outputs/generated-podcast.mp3 \
--transcript-file /mnt/user-data/outputs/generated-podcast-transcript.md
```
Parameters:
- `--script-file`: Absolute path to JSON script file (required)
- `--output-file`: Absolute path to output MP3 file (required)
- `--transcript-file`: Absolute path to output transcript markdown file (optional, but recommended)
> [!IMPORTANT]
> - Execute the script in one complete call. Do NOT split the workflow into separate steps.
> - The script handles all TTS API calls and audio generation internally.
> - Do NOT read the Python file, just call it with the parameters.
> - Always include `--transcript-file` to generate a readable transcript for the user.
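
The single-call rule above can be sketched from the caller's side. This is a minimal illustration, not part of the skill: `build_command` and `run_generation` are hypothetical helper names, and using `sys.executable` as the interpreter is an assumption. Only the script path and the three flags come from this document.

```python
import os
import subprocess
import sys
from typing import Optional

# Path taken from the command shown in Step 3.
GENERATE_SCRIPT = "/mnt/skills/public/podcast-generation/scripts/generate.py"


def build_command(script_file: str, output_file: str, transcript_file: Optional[str] = None) -> list[str]:
    """Assemble the argument list for one complete generate.py invocation."""
    cmd = [
        sys.executable,
        GENERATE_SCRIPT,
        "--script-file", script_file,
        "--output-file", output_file,
    ]
    if transcript_file:
        cmd += ["--transcript-file", transcript_file]
    return cmd


def run_generation(script_file: str, output_file: str, transcript_file: Optional[str] = None) -> bool:
    """Run the generator once and report whether a non-empty MP3 exists afterwards."""
    subprocess.run(build_command(script_file, output_file, transcript_file), check=False)
    # The exit status is ignored on purpose: judging by the except block in
    # generate.py, failures are printed rather than signalled via the return code.
    return os.path.isfile(output_file) and os.path.getsize(output_file) > 0
```

Checking the output file rather than the exit status is a design choice that follows from how the script reports errors. A stale MP3 left over from an earlier run would still pass this check, so a stricter caller might remove the old file first.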
## Script JSON Format
The script JSON file must follow this structure:
```json
{
  "title": "The History of Artificial Intelligence",
  "locale": "en",
  "lines": [
    {"speaker": "male", "paragraph": "Hello Deer! Welcome back to another episode."},
    {"speaker": "female", "paragraph": "Hey everyone! Today we have an exciting topic to discuss."},
    {"speaker": "male", "paragraph": "That's right! We're going to talk about..."}
  ]
}
```
Fields:
- `title`: Title of the podcast episode (optional, used as heading in transcript)
- `locale`: Language code - "en" for English or "zh" for Chinese
- `lines`: Array of dialogue lines
  - `speaker`: Either "male" or "female"
  - `paragraph`: The dialogue text for this speaker
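
The field rules above can be checked before calling the generator. The following is a rough sketch only: `validate_script` is a hypothetical helper, and it is deliberately stricter than `generate.py`, which falls back to defaults for a missing `locale`, `speaker`, or `paragraph`.

```python
import json

ALLOWED_LOCALES = ("en", "zh")
ALLOWED_SPEAKERS = ("male", "female")


def validate_script(data: dict) -> list[str]:
    """Return a list of problems found in a script dict (empty when it looks fine)."""
    problems = []
    if data.get("locale") not in ALLOWED_LOCALES:
        problems.append(f"locale must be 'en' or 'zh', got {data.get('locale')!r}")
    lines = data.get("lines")
    if not isinstance(lines, list) or not lines:
        problems.append("lines must be a non-empty array")
        return problems
    for i, line in enumerate(lines, start=1):
        if not isinstance(line, dict):
            problems.append(f"line {i}: expected an object")
            continue
        if line.get("speaker") not in ALLOWED_SPEAKERS:
            problems.append(f"line {i}: speaker must be 'male' or 'female'")
        paragraph = line.get("paragraph")
        if not isinstance(paragraph, str) or not paragraph.strip():
            problems.append(f"line {i}: paragraph must be non-empty text")
    return problems


sample = json.loads('{"locale": "en", "lines": [{"speaker": "male", "paragraph": "Hello Deer!"}]}')
print(validate_script(sample))
```

The optional `title` field is not checked, since the script only uses it as the transcript heading.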
## Script Writing Guidelines
When creating the script JSON, follow these guidelines:
### Format Requirements
- Only two hosts: male and female, alternating naturally
- Target runtime: approximately 10 minutes of dialogue (around 40-60 lines)
- Start with the male host saying a greeting that includes "Hello Deer"
### Tone & Style
- Natural, conversational dialogue - like two friends chatting
- Use casual expressions and conversational transitions
- Avoid overly formal language or academic tone
- Include reactions, follow-up questions, and natural interjections
### Content Guidelines
- Frequent back-and-forth between hosts
- Keep sentences short and easy to follow when spoken
- Plain text only - no markdown formatting in the output
- Translate technical concepts into accessible language
- No mathematical formulas, code, or complex notation
- Make content engaging and accessible for audio-only listeners
- Exclude meta information like dates, author names, or document structure
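
Some of the guidelines above are mechanical enough to check automatically. The sketch below covers only three of them (the "Hello Deer" opening, the 40-60 line target, and the plain-text rule). `lint_script` and the markdown heuristic are assumptions made for illustration, and the regular expression will miss some formatting and may flag harmless text.

```python
import re

# Heuristic for common markdown: bold, double underscores, backticks,
# ATX headings at line start, and inline links. Not exhaustive.
MARKDOWN_PATTERN = re.compile(r"(\*\*|__|`|^#{1,6}\s|\[[^\]]*\]\([^)]*\))", re.MULTILINE)


def lint_script(lines: list[dict], min_lines: int = 40, max_lines: int = 60) -> list[str]:
    """Return warnings for script lines that break the checkable guidelines."""
    if not lines:
        return ["script has no lines"]
    warnings = []
    first = lines[0]
    if first.get("speaker") != "male" or "hello deer" not in first.get("paragraph", "").lower():
        warnings.append('first line should be the male host greeting with "Hello Deer"')
    if not min_lines <= len(lines) <= max_lines:
        warnings.append(f"expected {min_lines}-{max_lines} lines, got {len(lines)}")
    for i, line in enumerate(lines, start=1):
        if MARKDOWN_PATTERN.search(line.get("paragraph", "")):
            warnings.append(f"line {i}: looks like markdown formatting")
    return warnings
```

Tone, pacing, and accessibility still need a human or model read-through. The function returns warnings rather than raising, because the line count is a target and not a hard limit.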
## Podcast Generation Example
User request: "Generate a podcast about the history of artificial intelligence"
Step 1: Create script file `/mnt/user-data/workspace/ai-history-script.json`:
```json
{
  "title": "The History of Artificial Intelligence",
  "locale": "en",
  "lines": [
    {"speaker": "male", "paragraph": "Hello Deer! Welcome back to another fascinating episode. Today we're diving into something that's literally shaping our future - the history of artificial intelligence."},
    {"speaker": "female", "paragraph": "Oh, I love this topic! You know, AI feels so modern, but it actually has roots going back over seventy years."},
    {"speaker": "male", "paragraph": "Exactly! It all started back in the 1950s. The term artificial intelligence was actually coined by John McCarthy in 1956 at a famous conference at Dartmouth."},
    {"speaker": "female", "paragraph": "Wait, so they were already thinking about machines that could think back then? That's incredible!"},
    {"speaker": "male", "paragraph": "Right? The early pioneers were so optimistic. They thought we'd have human-level AI within a generation."},
    {"speaker": "female", "paragraph": "But things didn't quite work out that way, did they?"},
    {"speaker": "male", "paragraph": "No, not at all. The 1970s brought what's called the first AI winter..."}
  ]
}
```
Step 2: Execute generation:
```bash
python /mnt/skills/public/podcast-generation/scripts/generate.py \
--script-file /mnt/user-data/workspace/ai-history-script.json \
--output-file /mnt/user-data/outputs/ai-history-podcast.mp3 \
--transcript-file /mnt/user-data/outputs/ai-history-transcript.md
```
This will generate:
- `ai-history-podcast.mp3`: The audio podcast file
- `ai-history-transcript.md`: A readable markdown transcript of the podcast
## Specific Templates
Read the following template file only when it matches the user's request.
- [Tech Explainer](templates/tech-explainer.md) - For converting technical documentation and tutorials
## Output Format
The generated podcast follows the "Hello Deer" format:
- Two hosts: one male, one female
- Natural conversational dialogue
- Starts with "Hello Deer" greeting
- Target duration: approximately 10 minutes
- Alternating speakers for engaging flow
## Output Handling
After generation:
- Podcasts and transcripts are saved in `/mnt/user-data/outputs/`
- Share both the podcast MP3 and the transcript MD with the user using the `present_files` tool
- Provide a brief description of the generation result (topic, duration, hosts)
- Offer to regenerate if adjustments are needed
## Requirements
The following environment variables must be set:
- `VOLCENGINE_TTS_APPID`: Volcengine TTS application ID
- `VOLCENGINE_TTS_ACCESS_TOKEN`: Volcengine TTS access token
- `VOLCENGINE_TTS_CLUSTER`: Volcengine TTS cluster (optional, defaults to "volcano_tts")
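
The variables above can be verified before any script is written, which avoids drafting a long dialogue only to fail at the TTS step. This is a small sketch: the helper names `missing_tts_vars` and `tts_cluster` are hypothetical, and `generate.py` performs its own equivalent check at run time.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = ("VOLCENGINE_TTS_APPID", "VOLCENGINE_TTS_ACCESS_TOKEN")


def missing_tts_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def tts_cluster(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured cluster, using the documented default when unset."""
    env = os.environ if env is None else env
    return env.get("VOLCENGINE_TTS_CLUSTER", "volcano_tts")


if __name__ == "__main__":
    missing = missing_tts_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print(f"TTS credentials present, cluster: {tts_cluster()}")
```

Passing the environment as a parameter keeps the check testable without touching the real process environment.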
## Notes
- **Always execute the full pipeline in one call** - no need to test individual steps or worry about timeouts
- The script JSON should match the content language (en or zh)
- Technical content should be simplified for audio accessibility in the script
- Complex notations (formulas, code) should be translated to plain language in the script
- Long content may result in longer podcasts


@@ -0,0 +1,284 @@
import argparse
import base64
import json
import logging
import os
import uuid
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Literal, Optional
import requests
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Types
class ScriptLine:
    def __init__(self, speaker: Literal["male", "female"] = "male", paragraph: str = ""):
        self.speaker = speaker
        self.paragraph = paragraph


class Script:
    def __init__(self, locale: Literal["en", "zh"] = "en", lines: Optional[list[ScriptLine]] = None):
        self.locale = locale
        self.lines = lines or []

    @classmethod
    def from_dict(cls, data: dict) -> "Script":
        script = cls(locale=data.get("locale", "en"))
        for line in data.get("lines", []):
            script.lines.append(
                ScriptLine(
                    speaker=line.get("speaker", "male"),
                    paragraph=line.get("paragraph", ""),
                )
            )
        return script

def text_to_speech(text: str, voice_type: str) -> Optional[bytes]:
    """Convert text to speech using Volcengine TTS."""
    app_id = os.getenv("VOLCENGINE_TTS_APPID")
    access_token = os.getenv("VOLCENGINE_TTS_ACCESS_TOKEN")
    cluster = os.getenv("VOLCENGINE_TTS_CLUSTER", "volcano_tts")
    if not app_id or not access_token:
        raise ValueError(
            "VOLCENGINE_TTS_APPID and VOLCENGINE_TTS_ACCESS_TOKEN environment variables must be set"
        )
    url = "https://openspeech.bytedance.com/api/v1/tts"
    # Authentication: Bearer token with semicolon separator
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer;{access_token}",
    }
    payload = {
        "app": {
            "appid": app_id,
            "token": "access_token",  # literal string, not the actual token
            "cluster": cluster,
        },
        "user": {"uid": "podcast-generator"},
        "audio": {
            "voice_type": voice_type,
            "encoding": "mp3",
            "speed_ratio": 1.2,
        },
        "request": {
            "reqid": str(uuid.uuid4()),  # must be unique UUID
            "text": text,
            "text_type": "plain",
            "operation": "query",
        },
    }
    try:
        response = requests.post(url, json=payload, headers=headers)
        if response.status_code != 200:
            logger.error(f"TTS API error: {response.status_code} - {response.text}")
            return None
        result = response.json()
        if result.get("code") != 3000:
            logger.error(f"TTS error: {result.get('message')} (code: {result.get('code')})")
            return None
        audio_data = result.get("data")
        if audio_data:
            return base64.b64decode(audio_data)
    except Exception as e:
        logger.error(f"TTS error: {str(e)}")
    return None

def _process_line(args: tuple[int, ScriptLine, int]) -> tuple[int, Optional[bytes]]:
    """Process a single script line for TTS. Returns (index, audio_bytes)."""
    i, line, total = args
    # Select voice based on speaker gender
    if line.speaker == "male":
        voice_type = "zh_male_yangguangqingnian_moon_bigtts"  # Male voice
    else:
        voice_type = "zh_female_sajiaonvyou_moon_bigtts"  # Female voice
    logger.info(f"Processing line {i + 1}/{total} ({line.speaker})")
    audio = text_to_speech(line.paragraph, voice_type)
    if not audio:
        logger.warning(f"Failed to generate audio for line {i + 1}")
    return (i, audio)


def tts_node(script: Script, max_workers: int = 4) -> list[bytes]:
    """Convert script lines to audio chunks using TTS with multi-threading."""
    logger.info(f"Converting script to audio using {max_workers} workers...")
    total = len(script.lines)
    # Handle empty script case
    if total == 0:
        raise ValueError("Script contains no lines to process")
    # Validate required environment variables before starting TTS
    if not os.getenv("VOLCENGINE_TTS_APPID") or not os.getenv("VOLCENGINE_TTS_ACCESS_TOKEN"):
        raise ValueError(
            "Missing required environment variables: VOLCENGINE_TTS_APPID and VOLCENGINE_TTS_ACCESS_TOKEN must be set"
        )
    tasks = [(i, line, total) for i, line in enumerate(script.lines)]
    # Use ThreadPoolExecutor for parallel TTS generation
    results: dict[int, Optional[bytes]] = {}
    failed_indices: list[int] = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(_process_line, task): task[0] for task in tasks}
        for future in as_completed(futures):
            idx, audio = future.result()
            results[idx] = audio
            # Use `not audio` to catch both None and empty bytes
            if not audio:
                failed_indices.append(idx)
    # Log failed lines with 1-based indices for user-friendly output
    if failed_indices:
        logger.warning(
            f"Failed to generate audio for {len(failed_indices)}/{total} lines: "
            f"line numbers {sorted(i + 1 for i in failed_indices)}"
        )
    # Collect results in order, skipping failed ones
    audio_chunks = []
    for i in range(total):
        audio = results.get(i)
        if audio:
            audio_chunks.append(audio)
    logger.info(f"Generated {len(audio_chunks)}/{total} audio chunks successfully")
    if not audio_chunks:
        raise ValueError(
            f"TTS generation failed for all {total} lines. "
            "Please check VOLCENGINE_TTS_APPID and VOLCENGINE_TTS_ACCESS_TOKEN environment variables."
        )
    return audio_chunks

def mix_audio(audio_chunks: list[bytes]) -> bytes:
    """Combine audio chunks into a single audio file."""
    logger.info("Mixing audio chunks...")
    if not audio_chunks:
        raise ValueError("No audio chunks to mix - TTS generation may have failed")
    output = b"".join(audio_chunks)
    if len(output) == 0:
        raise ValueError("Mixed audio is empty - TTS generation may have failed")
    logger.info(f"Audio mixing complete: {len(output)} bytes")
    return output


def generate_markdown(script: Script, title: str = "Podcast Script") -> str:
    """Generate a markdown script from the podcast script."""
    lines = [f"# {title}", ""]
    for line in script.lines:
        speaker_name = "**Host (Male)**" if line.speaker == "male" else "**Host (Female)**"
        lines.append(f"{speaker_name}: {line.paragraph}")
        lines.append("")
    return "\n".join(lines)

def generate_podcast(
    script_file: str,
    output_file: str,
    transcript_file: Optional[str] = None,
) -> str:
    """Generate a podcast from a script JSON file."""
    # Read script JSON
    with open(script_file, "r", encoding="utf-8") as f:
        script_json = json.load(f)
    if "lines" not in script_json:
        raise ValueError(f"Invalid script format: missing 'lines' key. Got keys: {list(script_json.keys())}")
    script = Script.from_dict(script_json)
    logger.info(f"Loaded script with {len(script.lines)} lines")
    # Generate transcript markdown if requested
    if transcript_file:
        title = script_json.get("title", "Podcast Script")
        markdown_content = generate_markdown(script, title)
        transcript_dir = os.path.dirname(transcript_file)
        if transcript_dir:
            os.makedirs(transcript_dir, exist_ok=True)
        with open(transcript_file, "w", encoding="utf-8") as f:
            f.write(markdown_content)
        logger.info(f"Generated transcript to {transcript_file}")
    # Convert to audio
    audio_chunks = tts_node(script)
    if not audio_chunks:
        raise Exception("Failed to generate any audio")
    # Mix audio
    output_audio = mix_audio(audio_chunks)
    # Save output
    output_dir = os.path.dirname(output_file)
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)
    with open(output_file, "wb") as f:
        f.write(output_audio)
    result = f"Successfully generated podcast to {output_file}"
    if transcript_file:
        result += f" and transcript to {transcript_file}"
    return result

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate podcast from script JSON file")
    parser.add_argument(
        "--script-file",
        required=True,
        help="Absolute path to script JSON file",
    )
    parser.add_argument(
        "--output-file",
        required=True,
        help="Output path for generated podcast MP3",
    )
    parser.add_argument(
        "--transcript-file",
        required=False,
        help="Output path for transcript markdown file (optional)",
    )
    args = parser.parse_args()
    try:
        result = generate_podcast(
            args.script_file,
            args.output_file,
            args.transcript_file,
        )
        print(result)
    except Exception as e:
        import traceback

        print(f"Error generating podcast: {e}")
        traceback.print_exc()


@@ -0,0 +1,63 @@
# Tech Explainer Podcast Template
Use this template when converting technical documentation, API guides, or developer tutorials into podcasts.
## Input Preparation
When the user wants to convert technical content to a podcast, help them structure the input:
1. **Simplify Code Examples**: Replace code snippets with plain language descriptions
   - Instead of showing actual code, describe what the code does
   - Focus on concepts rather than syntax
2. **Remove Complex Notation**:
   - Mathematical formulas should be explained in words
   - API endpoints should be described by function rather than by URL path
   - Configuration examples should be summarized as descriptions of the settings
3. **Add Context**:
   - Explain why the technology matters
   - Include real-world use cases
   - Add analogies for complex concepts
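
As a first pass over the source material, fenced code blocks can be located and swapped for a reminder to describe them in words. This is only a sketch under stated assumptions: `flag_code_blocks` and the placeholder text are hypothetical, the pattern handles only triple-backtick fences that start at the beginning of a line, and writing the plain-language description remains a manual step.

```python
import re

# The fence marker is built at run time so this example can itself sit
# inside a fenced block without closing it early.
FENCE_MARK = "`" * 3
FENCE = re.compile(rf"^{FENCE_MARK}.*?^{FENCE_MARK}[ \t]*$", re.MULTILINE | re.DOTALL)
PLACEHOLDER = "[describe in plain language what this code does]"


def flag_code_blocks(markdown_text: str) -> tuple[str, int]:
    """Replace each fenced code block with a placeholder.

    Returns the rewritten text and the number of blocks replaced.
    """
    return FENCE.subn(PLACEHOLDER, markdown_text)


if __name__ == "__main__":
    source = "Intro\n" + FENCE_MARK + "json\n{}\n" + FENCE_MARK + "\nOutro"
    text, count = flag_code_blocks(source)
    print(count)
    print(text)
```

The returned count gives a quick sense of how much rewriting a document needs before it is suitable for an audio script.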
## Example Transformation
### Original Technical Content:
```markdown
# Using the API
POST /api/v1/users
{
"name": "John",
"email": "john@example.com"
}
Response: 201 Created
```
### Podcast-Ready Content:
```markdown
# Creating Users with the API
The user creation feature allows applications to register new users in the system.
When you want to add a new user, you send their name and email address to the server.
If everything goes well, the server confirms the user was created successfully.
This is commonly used in signup flows, admin dashboards, or when importing users from other systems.
```
## Generation Command
```bash
python /mnt/skills/public/podcast-generation/scripts/generate.py \
--script-file /mnt/user-data/workspace/tech-explainer-script.json \
--output-file /mnt/user-data/outputs/tech-explainer-podcast.mp3 \
--transcript-file /mnt/user-data/outputs/tech-explainer-transcript.md
```
## Tips for Technical Podcasts
- Keep episodes focused on one main concept
- Use analogies to explain abstract concepts
- Include practical "why this matters" context
- Avoid jargon without explanation
- Make the dialogue accessible to beginners