Kana & Mari’s SoundRepos

By: Kana & Mari

Language: ja

Categories: Technology

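Kana and Mari use their voices to introduce notable GitHub repositories related to "sound", such as TTS, MIDI, and audio, lightly navigating the open-source world where sound and code intersect. Kana and Mari's profiles: Kana – Newbie Esports Caster; Mari – Newbie Esports Analyst. ※ The scripts for this show are generated automatically with generative AI. They may contain errors, so please enjoy them as reference information only.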

Episodes

pnnbao97/VieNeu-TTS
Jan 11, 2026

Vietnamese TTS with instant voice cloning • On-device • Real-time CPU inference • 24kHz audio quality

Duration: 00:01:57
eduardolat/kokoro-web
Jan 10, 2026

Kokoro Web: Free AI text-to-speech, online or self-hosted, OpenAI compatible!

Duration: 00:01:32
ekwek1/soprano
Jan 09, 2026

Soprano: Instant, Ultra-Realistic Text-to-Speech

Duration: 00:01:51
diodiogod/TTS-Audio-Suite
Jan 08, 2026

A ComfyUI custom node integration for multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual 23-lang), F5-TTS, Higgs Audio 2 and Microsoft VibeVoice with unlimited text length, SRT timing, Character support, and many audio tools

Duration: 00:01:43
ddPn08/rvc-webui
Jan 07, 2026

liujing04/Retrieval-based-Voice-Conversion-WebUI reconstruction project

Duration: 00:01:50
huakunyang/SummerTTS
Jan 06, 2026

SummerTTS is a standalone, C++-based Chinese and English speech synthesis (TTS) project. It runs locally without a network connection and has no extra dependencies; a single one-step build makes it ready to use for Chinese and English speech synthesis.

Duration: 00:01:51
shibing624/parrots
Jan 05, 2026

Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engine. Chinese and English speech recognition and multi-speaker speech synthesis, with multilingual support and high accuracy.

Duration: 00:01:42
modelscope/KAN-TTS
Jan 04, 2026

KAN-TTS is a speech-synthesis training framework; please try the demos we have posted at https://modelscope.cn/models?page=1&tasks=text-to-speech

Duration: 00:01:51
mbailey/voicemode
Jan 03, 2026

VoiceMode MCP brings natural conversations to Claude Code

Duration: 00:01:57
gotev/android-speech
Jan 02, 2026

Android speech recognition and text to speech made easy

Duration: 00:01:52
see2023/Bert-VITS2-ext
Jan 01, 2026

Facial expression and animation testing based on Bert-VITS2.

Duration: 00:01:44
google/tacotron
Dec 31, 2025

Audio samples accompanying publications related to Tacotron, an end-to-end speech synthesis model.

Duration: 00:01:45
StarmoonAI/Starmoon
Dec 30, 2025

A conversational AI device and software framework for companionship, entertainment, education, healthcare, IoT applications, and DIY robotics. Built with Python, NextJS, Arduino, ESP32, LLMs (GPT-4o), Deepgram STT and Azure TTS

Duration: 00:01:37
p0p4k/vits2_pytorch
Dec 29, 2025

Unofficial VITS2 TTS implementation in PyTorch

Duration: 00:01:55
wildminder/ComfyUI-VibeVoice
Dec 28, 2025

ComfyUI custom node for the VibeVoice TTS. Expressive, long-form, multi-speaker conversational audio

Duration: 00:01:52
daswer123/xtts-api-server
Dec 27, 2025

A simple FastAPI Server to run XTTSv2

Duration: 00:01:48
domesticatedviking/TextyMcSpeechy
Dec 26, 2025

Easily create Piper text-to-speech models in any voice. Make a text-to-speech model with your own voice recordings, or use thousands of RVC voices. Works offline on a Raspberry Pi. Rapidly record custom datasets for any metadata.csv file and listen to your model as it is training.

Duration: 00:01:50
zuoban/tts
Dec 25, 2025

TTS service

Duration: 00:01:53
alan-ai/alan-sdk-reactnative
Dec 24, 2025

The Self-Coding System for Your App — Alan AI SDK for React Native

Duration: 00:01:51
vilassn/whisper_android
Dec 23, 2025

Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android

Duration: 00:01:56
oscie57/tiktok-voice
Dec 22, 2025

Simple Python script to interact with the TikTok TTS API

Duration: 00:01:46
gexgd0419/NaturalVoiceSAPIAdapter
Dec 21, 2025

Make Azure natural TTS voices accessible to any SAPI 5-compatible application.

Duration: 00:02:06
geekwenjie/SmartJavaAI
Dec 20, 2025

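A free, offline AI algorithm toolbox for Java, supporting face recognition, liveness detection, facial expression recognition, object detection, instance segmentation, pedestrian detection, OCR text recognition, license plate recognition, table recognition, ASR+TTS, machine translation, and more; it can be used simply by adding a Maven dependency. Supports PyTorch and TensorFlow, and integrates mainstream models such as Mtcnn, InsightFace, SeetaFace6, YOLOv8 to v12, PaddleOCR (PPOCRv5), and Whisper.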

Duration: 00:02:13
AIDC-AI/Pixelle-Video
Dec 19, 2025

AI Fully Automated Short Video Engine

Duration: 00:01:54
lucasnewman/f5-tts-mlx
Dec 17, 2025

Implementation of F5-TTS in MLX

Duration: 00:01:46
madroidmaq/mlx-omni-server
Dec 16, 2025

MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. It implements OpenAI-compatible API endpoints, enabling seamless integration with existing OpenAI SDK clients while leveraging the power of local ML inference.

Duration: 00:01:45
zai-org/GLM-TTS
Dec 15, 2025

GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning

Duration: 00:01:38
daniilrobnikov/vits2
Dec 14, 2025

VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design

Duration: 00:01:37
superstarryeyes/lue
Dec 13, 2025

Terminal eBook Reader with Audiobook-Quality Text-to-Speech — Supports EPUB, PDF, DOCX, HTML, RTF, TXT, and MD.

Duration: 00:01:47
devnen/Chatterbox-TTS-Server
Dec 12, 2025

Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and CPU.

Duration: 00:01:39
seungwonpark/melgan
Dec 11, 2025

MelGAN vocoder (compatible with NVIDIA/tacotron2)

Duration: 00:01:50
HA6Bots/Automatic-Youtube-Reddit-Text-To-Speech-Video-Generator-and-Uploader
Dec 10, 2025

A series of three programs that automatically fetch scripts from Reddit, let the user edit them, and then send them to a video generator, after which the resulting videos are uploaded to YouTube automatically.

Duration: 00:02:02
lucasjinreal/Kokoros
Dec 09, 2025

Kokoro in Rust. https://huggingface.co/hexgrad/Kokoro-82M Insanely fast, real-time, high-quality TTS.

Duration: 00:01:51
thorstenMueller/Thorsten-Voice
Dec 08, 2025

Thorsten-Voice: a free-to-use, offline, high-quality German TTS voice that should be available to every project without any licensing struggles.

Duration: 00:01:54
google/voice-builder
Dec 07, 2025

An open-source text-to-speech (TTS) voice building tool

Duration: 00:02:00
soobinseo/Transformer-TTS
Dec 06, 2025

A PyTorch Implementation of "Neural Speech Synthesis with Transformer Network"

Duration: 00:01:58
lobehub/lobe-tts
Dec 05, 2025

Lobe TTS - A high-quality & reliable TTS/STT library for Server and Browser

Duration: 00:01:44
jaywalnut310/glow-tts
Dec 04, 2025

A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Duration: 00:01:54
dlutton/flutter_tts
Dec 03, 2025

Flutter Text to Speech package

Duration: 00:01:51
C-Loftus/QuickPiperAudiobook
Dec 02, 2025

With one command, create a natural-sounding audiobook from a variety of input formats (epub, mobi, txt, PDF, HTML and more!)

Duration: 00:01:51
markovka17/dla
Dec 01, 2025

Deep learning for audio processing

Duration: 00:01:54
stepfun-ai/Step-Audio-EditX
Nov 30, 2025

A powerful 3B-parameter, LLM-based audio editing model trained with reinforcement learning that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech

Duration: 00:00:27
coqui-ai/TTS-papers
Nov 29, 2025

A collection of TTS papers

Duration: 00:01:47
cboard-org/cboard
Nov 28, 2025

Augmentative and Alternative Communication (AAC) system with text-to-speech for the browser

Duration: 00:01:52
wangzongming/esp-ai
Nov 27, 2025

The simplest and lowest-cost AI integration solution. If you like this project, please give it a Star~

Duration: 00:02:02
VRCWizard/TTS-Voice-Wizard
Nov 26, 2025

Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) (VTuber TTS)

Duration: 00:01:57
NVIDIA-AI-Blueprints/pdf-to-podcast
Nov 25, 2025

Transform PDFs into AI podcasts for engaging on-the-go audio content.

Duration: 00:01:48
supertone-inc/supertonic
Nov 24, 2025

Lightning-fast, on-device TTS — running natively via ONNX.

Duration: 00:01:45
BandarLabs/gitpodcast
Nov 23, 2025

Convert any git repository into an engaging podcast

Duration: 00:01:46
hujingshuang/MTrans
Nov 22, 2025

Multi-source Translation

Duration: 00:01:49
Tomiinek/Multilingual_Text_to_Speech
Nov 21, 2025

An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.

Duration: 00:01:43
lobehub/lobe-vidol
Nov 20, 2025

Lobe Vidol - Making Virtual Idols Accessible for Everyone

Duration: 00:01:50
OpenBMB/VoxCPM
Nov 19, 2025

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

Duration: 00:02:03
PABannier/bark.cpp
Nov 17, 2025

Suno AI's Bark model in C/C++ for fast text-to-speech generation

Duration: 00:01:45
daswer123/xtts-webui
Nov 16, 2025

WebUI for using and fine-tuning XTTS

Duration: 00:01:54
GetStream/Vision-Agents
Nov 15, 2025

Open Vision Agents by Stream. Build Vision Agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.

Duration: 00:01:36
Spr-Aachen/Easy-Voice-Toolkit
Nov 14, 2025

A simple, user-friendly AI audio toolkit for voice recognition, voice transcription, voice conversion, etc.

Duration: 00:01:58
aedocw/epub2tts
Nov 13, 2025

Turn an epub or text file into an audiobook

Duration: 00:01:56
lmnt-com/diffwave
Nov 12, 2025

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.

Duration: 00:01:47
FireRedTeam/FireRedTTS
Nov 11, 2025

An Open-Sourced LLM-empowered Foundation TTS System

Duration: 00:01:52
gabrielmittag/NISQA
Nov 10, 2025

NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment

Duration: 00:01:50
High-Logic/Genie-TTS
Nov 09, 2025

GPT-SoVITS ONNX Inference Engine & Model Converter

Duration: 00:02:01
mosecorg/mosec
Nov 08, 2025

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

Duration: 00:01:35
nazdridoy/kokoro-tts
Nov 07, 2025

A CLI text-to-speech tool using the Kokoro model, supporting multiple languages, voices (with blending), and various input formats including EPUB books and PDF documents.

Duration: 00:01:38
joey-zhou/xiaozhi-esp32-server-java
Nov 06, 2025

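An enterprise-grade Java management platform for Xiaozhi ESP32, providing an integrated frontend, backend, and server-side solution for device monitoring, voice timbre customization, role switching, and conversation log management.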

Duration: 00:01:42
athena-team/athena
Nov 05, 2025

An open-source implementation of a sequence-to-sequence-based speech processing engine

Duration: 00:01:48
wxxxcxx/ms-ra-forwarder
Nov 04, 2025

Free online text-to-speech API

Duration: 00:02:01
ardha27/AI-Waifu-Vtuber
Nov 03, 2025

AI Vtuber for Streaming on Youtube/Twitch

Duration: 00:02:03
Azure-Samples/Cognitive-Speech-TTS
Nov 02, 2025

Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.

Duration: 00:01:41
NATSpeech/NATSpeech
Nov 01, 2025

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

Duration: 00:01:46
ttop32/MouseTooltipTranslator
Oct 31, 2025

Mouseover Translate Any Language At Once - Chrome Extension: PDF Translator, EBOOK, EPUB, OCR, TTS, NETFLIX, YOUTUBE DUAL SUBTITLES, GOOGLE DOCS, AI, VIEWER, GMAIL, WRITING, IMAGE, DUAL SUBS, MANGA, HOVER, DICTIONARY, WEBTOON, EDGE, JAPANESE, ENGLISH

Duration: 00:01:55
Artrajz/vits-simple-api
Oct 30, 2025

A simple VITS HTTP API, developed by extending Moegoe with additional features.

Duration: 00:01:47
Edresson/YourTTS
Oct 29, 2025

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

Duration: 00:01:46
janvarev/Irene-Voice-Assistant
Oct 28, 2025

Irina (Ирина): a Russian-language voice assistant that works offline. Supports skills via plugins.

Duration: 00:01:49
Enemyx-net/VibeVoice-ComfyUI
Oct 27, 2025

A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.

Duration: 00:01:59
PriesiaMioShirakana/DragonianVoice
Oct 26, 2025

A C++ inference library for multiple SVC/TTS models

Duration: 00:01:46
semperai/amica
Oct 25, 2025

Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.

Duration: 00:02:11
Stypox/dicio-android
Oct 24, 2025

Dicio assistant app for Android

Duration: 00:01:50
Henry-23/VideoChat
Oct 23, 2025

Real-time voice-interactive digital human, supporting an end-to-end voice pipeline (GLM-4-Voice - THG) and a cascaded pipeline (ASR-LLM-TTS-THG). Appearance and voice are customizable without any training, voice cloning is supported, and first-packet latency is as low as 3 s.

Duration: 00:01:48
spring-media/TransformerTTS
Oct 21, 2025

Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.

Duration: 00:01:38
Kyubyong/dc_tts
Oct 20, 2025

A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

Duration: 00:01:54
ictnlp/StreamSpeech
Oct 19, 2025

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

Duration: 00:02:01
bawangxx/XZVoice
Oct 18, 2025

Free and open source text-to-speech software

Duration: 00:01:53
hgneng/ekho
Oct 17, 2025

Chinese text-to-speech engine

Duration: 00:01:52
gitmylo/audio-webui
Oct 15, 2025

A webui for different audio related Neural Networks

Duration: 00:01:49
PlayVoice/vits_chinese
Oct 14, 2025

Best-practice TTS based on BERT and VITS, with some features from Microsoft's NaturalSpeech; supports ONNX streaming output!

Duration: 00:01:37
haoheliu/voicefixer
Oct 13, 2025

General Speech Restoration

Duration: 00:02:07
sauravpanda/BrowserAI
Oct 12, 2025

Run local LLMs like llama, deepseek-distill, kokoro and more inside your browser

Duration: 00:02:04
R3gm/SoniTranslate
Oct 11, 2025

Synchronized Translation for Videos. Video dubbing

Duration: 00:01:50
travisvn/openai-edge-tts
Oct 10, 2025

Free, high-quality text-to-speech API endpoint to replace OpenAI, Azure, or ElevenLabs

Duration: 00:02:07
lenML/Speech-AI-Forge
Oct 09, 2025

Speech-AI-Forge is a project built around TTS generation models, implementing an API server and a Gradio-based WebUI.

Duration: 00:01:54
coqui-ai/open-speech-corpora
Oct 08, 2025

A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

Duration: 00:01:42
LuckyHookin/edge-TTS-record
Oct 07, 2025

A Windows tool that records the synthesized (TTS) speech of the Microsoft Edge browser and outputs it as .wav audio.

Duration: 00:01:56
edwko/OuteTTS
Oct 06, 2025

Interface for OuteTTS models.

Duration: 00:01:43
voice-cloning-app/Voice-Cloning-App
Oct 05, 2025

A Python/Pytorch app for easily synthesising human voices

Duration: 00:01:52
wwbin2017/bailing
Oct 04, 2025

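Bailing (百聆) is a GPT-4o-like voice dialogue bot built on ASR+LLM+TTS. It integrates strong large models such as DeepSeek R1, achieves latency as low as 800 ms, runs even on low-spec machines such as a Mac, and supports interruption.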

Duration: 00:01:47
neural-maze/ava-whatsapp-agent-course
Oct 03, 2025

Meet Ava, the WhatsApp Agent

Duration: 00:01:46
cosin2077/easyVoice
Oct 02, 2025

Open-source text-to-speech tool supporting very long texts and multi-character voice-over

Duration: 00:01:52
kan-bayashi/ParallelWaveGAN
Oct 01, 2025

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch

Duration: 00:01:48
AlexxIT/YandexStation
Sep 30, 2025

Control Yandex Station and other smart home devices with Alice from Home Assistant

Duration: 00:01:44