Kana & Mari’s SoundRepos
By: Kana & Mari
Language: ja
Categories: Technology
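Kana and Mari introduce, by voice, notable repositories they found on GitHub that relate to "sound": TTS, MIDI, audio, and more. They offer a light-footed tour of the open-source world where sound and code intersect.
Kana and Mari's profiles: Kana – Newbie Esports Caster; Mari – Newbie Esports Analyst
Note: The scripts for this show are generated automatically with generative AI. They may contain errors, so please enjoy them as reference information only.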
Episodes
pnnbao97/VieNeu-TTS
Jan 11, 2026
Vietnamese TTS with instant voice cloning • On-device • Real-time CPU inference • 24kHz audio quality
Duration: 00:01:57
eduardolat/kokoro-web
Jan 10, 2026
Kokoro Web: Free AI text-to-speech, online or self-hosted, OpenAI compatible!
Duration: 00:01:32
ekwek1/soprano
Jan 09, 2026
Soprano: Instant, Ultra-Realistic Text-to-Speech
Duration: 00:01:51
diodiogod/TTS-Audio-Suite
Jan 08, 2026
A ComfyUI custom node integration for multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual 23-lang), F5-TTS, Higgs Audio 2 and Microsoft VibeVoice with unlimited text length, SRT timing, Character support, and many audio tools
Duration: 00:01:43
ddPn08/rvc-webui
Jan 07, 2026
liujing04/Retrieval-based-Voice-Conversion-WebUI reconstruction project
Duration: 00:01:50
huakunyang/SummerTTS
Jan 06, 2026
SummerTTS is a standalone C++ speech synthesis (TTS) project for Chinese and English. It runs locally without a network connection, has no additional dependencies, and is ready to use after a one-step build.
Duration: 00:01:51
shibing624/parrots
Jan 05, 2026
Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engine. Chinese and English speech recognition, multi-character speech synthesis, multilingual support, high accuracy.
Duration: 00:01:42
modelscope/KAN-TTS
Jan 04, 2026
KAN-TTS is a speech-synthesis training framework, please try the demos we have posted at https://modelscope.cn/models?page=1&tasks=text-to-speech
Duration: 00:01:51
mbailey/voicemode
Jan 03, 2026
VoiceMode MCP brings natural conversations to Claude Code
Duration: 00:01:57
gotev/android-speech
Jan 02, 2026
Android speech recognition and text to speech made easy
Duration: 00:01:52
see2023/Bert-VITS2-ext
Jan 01, 2026
Facial expression and animation testing based on Bert-VITS2.
Duration: 00:01:44
google/tacotron
Dec 31, 2025
Audio samples accompanying publications related to Tacotron, an end-to-end speech synthesis model.
Duration: 00:01:45
StarmoonAI/Starmoon
Dec 30, 2025
A conversational, AI device + software framework for companionship, entertainment, education, healthcare, IoT applications, and DIY robotics. Built with Python, NextJS, Arduino, ESP32, LLMs (GPT-4o), Deepgram STT and Azure TTS
Duration: 00:01:37
p0p4k/vits2_pytorch
Dec 29, 2025
unofficial vits2-TTS implementation in pytorch
Duration: 00:01:55
wildminder/ComfyUI-VibeVoice
Dec 28, 2025
ComfyUI custom node for the VibeVoice TTS. Expressive, long-form, multi-speaker conversational audio
Duration: 00:01:52
daswer123/xtts-api-server
Dec 27, 2025
A simple FastAPI Server to run XTTSv2
Duration: 00:01:48
domesticatedviking/TextyMcSpeechy
Dec 26, 2025
Easily create Piper text-to-speech models in any voice. Make a text-to-speech model with your own voice recordings, or use thousands of RVC voices. Works offline on a Raspberry pi. Rapidly record custom datasets for any metadata.csv file and listen to your model as it is training.
Duration: 00:01:50
zuoban/tts
Dec 25, 2025
TTS service
Duration: 00:01:53
alan-ai/alan-sdk-reactnative
Dec 24, 2025
The Self-Coding System for Your App: Alan AI SDK for React Native
Duration: 00:01:51
vilassn/whisper_android
Dec 23, 2025
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
Duration: 00:01:56
oscie57/tiktok-voice
Dec 22, 2025
Simple Python script to interact with the TikTok TTS API
Duration: 00:01:46
gexgd0419/NaturalVoiceSAPIAdapter
Dec 21, 2025
Make Azure natural TTS voices accessible to any SAPI 5-compatible application.
Duration: 00:02:06
geekwenjie/SmartJavaAI
Dec 20, 2025
A free, offline AI algorithm toolkit for Java, supporting face recognition, liveness detection, facial expression recognition, object detection, instance segmentation, pedestrian detection, OCR text recognition, license plate recognition, table recognition, ASR+TTS, machine translation, and more; usable by adding a Maven dependency. Supports PyTorch and TensorFlow, and integrates mainstream models including Mtcnn, InsightFace, SeetaFace6, YOLOv8~v12, PaddleOCR (PPOCRv5), and Whisper.
Duration: 00:02:13
AIDC-AI/Pixelle-Video
Dec 19, 2025
AI Fully Automated Short Video Engine
Duration: 00:01:54
lucasnewman/f5-tts-mlx
Dec 17, 2025
Implementation of F5-TTS in MLX
Duration: 00:01:46
madroidmaq/mlx-omni-server
Dec 16, 2025
MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. It implements OpenAI-compatible API endpoints, enabling seamless integration with existing OpenAI SDK clients while leveraging the power of local ML inference.
Duration: 00:01:45
zai-org/GLM-TTS
Dec 15, 2025
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
Duration: 00:01:38
daniilrobnikov/vits2
Dec 14, 2025
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Duration: 00:01:37
superstarryeyes/lue
Dec 13, 2025
Terminal eBook Reader with Audiobook-Quality Text-to-Speech. Supports EPUB, PDF, DOCX, HTML, RTF, TXT, and MD.
Duration: 00:01:47
devnen/Chatterbox-TTS-Server
Dec 12, 2025
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and CPU.
Duration: 00:01:39
seungwonpark/melgan
Dec 11, 2025
MelGAN vocoder (compatible with NVIDIA/tacotron2)
Duration: 00:01:50
HA6Bots/Automatic-Youtube-Reddit-Text-To-Speech-Video-Generator-and-Uploader
Dec 10, 2025
A series of 3 programs that will automatically receive scripts from Reddit, allow the user to edit them, then be sent off to a video generator where they will be uploaded to YouTube automatically.
Duration: 00:02:02
lucasjinreal/Kokoros
Dec 09, 2025
Kokoro in Rust. https://huggingface.co/hexgrad/Kokoro-82M Insanely fast, realtime TTS with high quality you ever have.
Duration: 00:01:51
thorstenMueller/Thorsten-Voice
Dec 08, 2025
Thorsten-Voice: A free to use, offline working, high quality german TTS voice should be available for every project without any license struggling.
Duration: 00:01:54
google/voice-builder
Dec 07, 2025
An opensource text-to-speech (TTS) voice building tool
Duration: 00:02:00
soobinseo/Transformer-TTS
Dec 06, 2025
A Pytorch Implementation of "Neural Speech Synthesis with Transformer Network"
Duration: 00:01:58
lobehub/lobe-tts
Dec 05, 2025
Lobe TTS - A high-quality & reliable TTS/STT library for Server and Browser
Duration: 00:01:44
jaywalnut310/glow-tts
Dec 04, 2025
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Duration: 00:01:54
dlutton/flutter_tts
Dec 03, 2025
Flutter Text to Speech package
Duration: 00:01:51
C-Loftus/QuickPiperAudiobook
Dec 02, 2025
With one command, create a natural-sounding audiobook from a variety of input formats (epub, mobi, txt, PDF, HTML and more!)
Duration: 00:01:51
markovka17/dla
Dec 01, 2025
Deep learning for audio processing
Duration: 00:01:54
stepfun-ai/Step-Audio-EditX
Nov 30, 2025
A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech
Duration: 00:00:27
coqui-ai/TTS-papers
Nov 29, 2025
collection of TTS papers
Duration: 00:01:47
cboard-org/cboard
Nov 28, 2025
Augmentative and Alternative Communication (AAC) system with text-to-speech for the browser
Duration: 00:01:52
wangzongming/esp-ai
Nov 27, 2025
The simplest and lowest-cost AI integration solution. If you like this project, please give it a Star~
Duration: 00:02:02
VRCWizard/TTS-Voice-Wizard
Nov 26, 2025
Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) (VTuber TTS)
Duration: 00:01:57
NVIDIA-AI-Blueprints/pdf-to-podcast
Nov 25, 2025
Transform PDFs into AI podcasts for engaging on-the-go audio content.
Duration: 00:01:48
supertone-inc/supertonic
Nov 24, 2025
Lightning-fast, on-device TTS, running natively via ONNX.
Duration: 00:01:45
BandarLabs/gitpodcast
Nov 23, 2025
Convert any git repository into an engaging podcast
Duration: 00:01:46
hujingshuang/MTrans
Nov 22, 2025
Multi-source Translation
Duration: 00:01:49
Tomiinek/Multilingual_Text_to_Speech
Nov 21, 2025
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
Duration: 00:01:43
lobehub/lobe-vidol
Nov 20, 2025
Lobe Vidol - Making Virtual Idols Accessible for EveryOne
Duration: 00:01:50
OpenBMB/VoxCPM
Nov 19, 2025
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Duration: 00:02:03
PABannier/bark.cpp
Nov 17, 2025
Suno AI's Bark model in C/C++ for fast text-to-speech generation
Duration: 00:01:45
daswer123/xtts-webui
Nov 16, 2025
Webui for using XTTS and for finetuning it
Duration: 00:01:54
GetStream/Vision-Agents
Nov 15, 2025
Open Vision Agents by Stream. Build Vision Agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.
Duration: 00:01:36
Spr-Aachen/Easy-Voice-Toolkit
Nov 14, 2025
A simple AI voice toolkit: a user-friendly audio toolkit for voice recognition, voice transcription, voice conversion, etc.
Duration: 00:01:58
aedocw/epub2tts
Nov 13, 2025
Turn an epub or text file into an audiobook
Duration: 00:01:56
lmnt-com/diffwave
Nov 12, 2025
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Duration: 00:01:47
FireRedTeam/FireRedTTS
Nov 11, 2025
An Open-Sourced LLM-empowered Foundation TTS System
Duration: 00:01:52
gabrielmittag/NISQA
Nov 10, 2025
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
Duration: 00:01:50
High-Logic/Genie-TTS
Nov 09, 2025
GPT-SoVITS ONNX Inference Engine & Model Converter
Duration: 00:02:01
mosecorg/mosec
Nov 08, 2025
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
Duration: 00:01:35
nazdridoy/kokoro-tts
Nov 07, 2025
A CLI text-to-speech tool using the Kokoro model, supporting multiple languages, voices (with blending), and various input formats including EPUB books and PDF documents.
Duration: 00:01:38
joey-zhou/xiaozhi-esp32-server-java
Nov 06, 2025
A Java enterprise-grade management platform for Xiaozhi ESP32, providing an integrated frontend, backend, and server solution for device monitoring, voice customization, role switching, and conversation history management.
Duration: 00:01:42
athena-team/athena
Nov 05, 2025
an open-source implementation of sequence-to-sequence based speech processing engine
Duration: 00:01:48
wxxxcxx/ms-ra-forwarder
Nov 04, 2025
Free online text-to-speech API
Duration: 00:02:01
ardha27/AI-Waifu-Vtuber
Nov 03, 2025
AI Vtuber for Streaming on Youtube/Twitch
Duration: 00:02:03
Azure-Samples/Cognitive-Speech-TTS
Nov 02, 2025
Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.
Duration: 00:01:41
NATSpeech/NATSpeech
Nov 01, 2025
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
Duration: 00:01:46
ttop32/MouseTooltipTranslator
Oct 31, 2025
Mouseover Translate Any Language At Once - Chrome Extension: PDF Translator, EBOOK, EPUB, OCR, TTS, NETFLIX, YOUTUBE DUAL SUBTITLES, GOOGLE DOCS, AI, VIEWER, GMAIL, WRITING, IMAGE, DUAL SUBS, MANGA, HOVER, DICTIONARY, WEBTOON, EDGE, JAPANESE, ENGLISH
Duration: 00:01:55
Artrajz/vits-simple-api
Oct 30, 2025
A simple VITS HTTP API, developed by extending Moegoe with additional features.
Duration: 00:01:47
Edresson/YourTTS
Oct 29, 2025
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Duration: 00:01:46
janvarev/Irene-Voice-Assistant
Oct 28, 2025
Irene (Ирина): a Russian voice assistant that works offline. Supports skills via plugins.
Duration: 00:01:49
Enemyx-net/VibeVoice-ComfyUI
Oct 27, 2025
A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
Duration: 00:01:59
PriesiaMioShirakana/DragonianVoice
Oct 26, 2025
A C++ inference library for multiple SVC/TTS models
Duration: 00:01:46
semperai/amica
Oct 25, 2025
Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.
Duration: 00:02:11
Stypox/dicio-android
Oct 24, 2025
Dicio assistant app for Android
Duration: 00:01:50
Henry-23/VideoChat
Oct 23, 2025
Real-time voice-interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice - THG) and cascaded solutions (ASR-LLM-TTS-THG). Customizable appearance and voice with no training required, voice cloning support, and first-packet latency as low as 3s.
Duration: 00:01:48
spring-media/TransformerTTS
Oct 21, 2025
Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.
Duration: 00:01:38
Kyubyong/dc_tts
Oct 20, 2025
A TensorFlow Implementation of DC-TTS: yet another text-to-speech model
Duration: 00:01:54
ictnlp/StreamSpeech
Oct 19, 2025
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Duration: 00:02:01
bawangxx/XZVoice
Oct 18, 2025
Free and open source text-to-speech software
Duration: 00:01:53
hgneng/ekho
Oct 17, 2025
Chinese text-to-speech engine
Duration: 00:01:52
gitmylo/audio-webui
Oct 15, 2025
A webui for different audio related Neural Networks
Duration: 00:01:49
PlayVoice/vits_chinese
Oct 14, 2025
Best practice TTS based on BERT and VITS with some Natural Speech Features Of Microsoft; Support ONNX streaming out!
Duration: 00:01:37
haoheliu/voicefixer
Oct 13, 2025
General Speech Restoration
Duration: 00:02:07
sauravpanda/BrowserAI
Oct 12, 2025
Run local LLMs like llama, deepseek-distill, kokoro and more inside your browser
Duration: 00:02:04
R3gm/SoniTranslate
Oct 11, 2025
Synchronized Translation for Videos. Video dubbing
Duration: 00:01:50
travisvn/openai-edge-tts
Oct 10, 2025
Free, high-quality text-to-speech API endpoint to replace OpenAI, Azure, or ElevenLabs
Duration: 00:02:07
lenML/Speech-AI-Forge
Oct 09, 2025
Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
Duration: 00:01:54
coqui-ai/open-speech-corpora
Oct 08, 2025
A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Duration: 00:01:42
LuckyHookin/edge-TTS-record
Oct 07, 2025
A Windows tool that records the text-to-speech (TTS) output of the Microsoft Edge browser and saves it as .wav audio.
Duration: 00:01:56
edwko/OuteTTS
Oct 06, 2025
Interface for OuteTTS models.
Duration: 00:01:43
voice-cloning-app/Voice-Cloning-App
Oct 05, 2025
A Python/Pytorch app for easily synthesising human voices
Duration: 00:01:52
wwbin2017/bailing
Oct 04, 2025
Bailing (百聆) is a GPT-4o-style voice conversation bot built on ASR+LLM+TTS. It integrates large models such as DeepSeek R1, achieves latency as low as 800ms, runs on low-spec machines such as a Mac, and supports interruption.
Duration: 00:01:47
neural-maze/ava-whatsapp-agent-course
Oct 03, 2025
Meet Ava, the WhatsApp Agent
Duration: 00:01:46
cosin2077/easyVoice
Oct 02, 2025
Open-source text-to-speech tool supporting very long texts and multi-character voice-over
Duration: 00:01:52
kan-bayashi/ParallelWaveGAN
Oct 01, 2025
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
Duration: 00:01:48
AlexxIT/YandexStation
Sep 30, 2025
Control Yandex.Station and other smart home devices with Alice from Home Assistant
Duration: 00:01:44