Undergraduate student at HUST, research intern at CMU WAVLab, member of Fish Audio.

Hi, my name is Yifan Cheng (程一帆). I’m a third-year undergraduate student at HUST. I have the privilege of being a research intern at CMU WAVLab, advised by Prof. Shinji Watanabe and senior PhD student Jiatong Shi. I’m also fortunate to be a member of the Fish Audio team, where I help develop Fish-Speech, a state-of-the-art (SOTA) open-source TTS system. Before this, I worked at Meituan as a speech synthesis intern for 4 months.

My research interests include controllable TTS systems, voice agents, and AI mates. Our project Fish-Speech has gained wide recognition in the open-source community, with over 21k stars on GitHub.

I’m currently seeking PhD positions for Fall 2026 as well as job opportunities. Feel free to contact me at whaledolphin666@gmail.com!

🔥 News

2025.09: 🎉🎉 Our paper ARECHO, on which I am the second author, has been accepted to NeurIPS 2025 as a spotlight!!! I’m truly grateful to my advisor Prof. Shinji Watanabe, senior PhD student Jiatong Shi, and my co-authors for their guidance and support.
2025.07: 🎉🎉 I officially joined Hanabi AI (Fish Audio) as a speech AI engineer, where I’m responsible for developing audio understanding, audio question answering, and next-generation dialogue models.
2025.05: 🎉🎉 OpenAudio-S1 is now open-sourced! It supports emotion control and natural-language control, and an even better-performing model is available at Fish Audio.
2025.05: 🎉🎉 Our paper MIKU-PAL has been accepted to INTERSPEECH 2025!!!
2024.12: 🎉🎉 It’s my great honor to join WAVLab as a research intern, advised by Prof. Shinji Watanabe.
2024.12: 🎉🎉 Fish-Speech v1.5 is now available! Our open-source TTS system has garnered over 19k stars on GitHub and ranks No. 2 on TTS Arena.

📝 Publications

NeurIPS2025 SPOTLIGHT

ARECHO: Autoregressive Evaluation via Chain-Based Hypothesis Optimization for Speech Multi-Metric Estimation

Jiatong Shi, Yifan Cheng, Bo-Hao Su, Hye-jin Shim, Jinchuan Tian, Samuele Cornell, Yiwen Zhao, Siddhant Arora, Shinji Watanabe

Project

INTERSPEECH2025

MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling

Yifan Cheng, Ruoyi Zhang, Jiatong Shi

Dataset

arXiv

Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis

Shijia Liao, Yuxuan Wang, Tianyu Li, Yifan Cheng, Ruoyi Zhang, Rongzhi Zhou, Yijin Xing

Project

A SOTA open-source TTS system built on a dual autoregressive (dual-AR) architecture and an FF-GAN vocoder. It has garnered over 21k stars on GitHub.

Website

📖 Education

2024.12 - now, Research intern at CMU WAVLab, advised by Prof. Shinji Watanabe and senior PhD student Jiatong Shi.
2022.09 - now, Majoring in Software Engineering at Huazhong University of Science and Technology (HUST).

💻 Internships

2025.07 - now, Hanabi AI (Fish Audio), Remote.
2024.05 - 2024.09, Meituan, Beijing, China.