- 📁 scripts/
- 📄 LICENSE.txt
- 📄 SKILL.md
Implement speech-to-text (ASR, automatic speech recognition) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build voice input features, or process audio recordings. Accepts base64-encoded audio files and returns accurate text transcriptions.
- 📁 core/
- 📁 models/
- 📁 references/
- 📄 .env.example
- 📄 pyproject.toml
- 📄 README.md
Generates images, videos, audio, speech, and music using MuleRouter or MuleRun multimodal APIs. Supported tasks: Text-to-Image, Image-to-Image, Text-to-Video, Image-to-Video, Reference-to-Video, Video-to-Video, video editing (VACE, keyframe interpolation), Text-to-Speech, and Text-to-Music. Use when the user wants to generate, edit, or transform images, videos, speech, or music using AI models such as Wan2.6, Veo3, Nano Banana Pro, Sora2, Midjourney, Kling V3, Kling V3 Omni, MiniMax Speech 2.8, or MiniMax Music 2.5.
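Daily paper digest for the speech field. Searches for the latest preprints on speech large models (Speech LLM, TTS, ASR, codec, speech generation) and speech front-ends (speech enhancement, noise suppression, beamforming, source separation, dereverberation), then closely reads each paper in the voice of a sharp-tongued but highly accurate senior reviewer, serving primarily researchers who work on speech large models and speech front-ends. Outputs the technical approach, experimental results, a brief summary, and a score out of 10, and writes the results to the 「每日论文速递」 (Daily Paper Digest) folder in Tencent Docs. Trigger scenarios: the user says something like "help me find the latest speech papers", "search for speech preprints", "speech paper digest", "what speech papers are out today", or "take a look at the latest TTS/ASR/speech enhancement papers".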