Whiting School of Engineering,
Johns Hopkins University, Baltimore
Email: hwang258@jhu.edu
Phone: +1 667-354-9059
I am a PhD candidate at Johns Hopkins University and expect to graduate in 2026. Before that, I received my Bachelor's Degree from Tsinghua University in 2019 and my Master's Degree from Peking University in 2022. My research interests lie mainly in AI for speech and audio signal processing, encompassing audio generation tasks such as source separation and text-to-speech synthesis, as well as audio understanding tasks such as audio classification and captioning. I have also conducted research or held internship positions at leading companies including Meta, Microsoft, Tencent, Amazon, and Zoom.
(Note: I am actively looking for industry opportunities. Please feel free to contact me.)
December, 2025: SAM Audio launched!
May, 2025: I have reached 1,000 citations on Google Scholar!
Dec, 2024: Our TASLP paper "Diffsound: Discrete Diffusion Model for Text-to-Sound Generation" has been selected for the 2024 IEEE SPS Young Author Best Paper Award!
Sep, 2024: Our INTERSPEECH paper "Noise-robust Speech Separation with Fast Generative Correction" has been nominated for the Best Student Paper Award and the Best Paper Award (5 out of 1,030 accepted papers)!
May, 2022: I received the Outstanding Graduate Student & Thesis Award of Peking University!
April, 2022: I have reached 100 citations on Google Scholar!
Machine Learning, Audio Processing, Speech Processing, Artificial Intelligence
May 2025 - December 2025
Meta FAIR, AudioBox Team, New York, USA, Research Scientist Intern.
Supervisors: Wei-Ning Hsu and Bowen Shi
August 2022 - Now
Johns Hopkins University, Center for Language and Speech Processing (CLSP), Baltimore, USA, Research Assistant.
Supervisors: Najim Dehak, Laureano Moro-Velázquez and Jesús Villalba
May 2024 - August 2024
Tencent AI Lab, Speech Group, Bellevue, USA, Intern.
Supervisors: Meng Yu and Dong Yu
December 2022 - December 2023
Amazon General Intelligence (AGI), Speech Team, Baltimore, USA, Student Researcher.
Supervisors: Venkatesh Ravichandran and Milind Rao
February 2022 - May 2022
Microsoft STCA, NLP Group, Beijing, China, Intern.
Supervisors: Linjun Shou and Ming Gong
May 2020 - November 2021
Tencent AI Lab, Speech Group, Shenzhen, China, Intern.
Supervisors: Bo Wu and Chao Weng
August 2019 - July 2022
Peking University, ADSP Lab, Shenzhen, China, Master's Student.
Supervisor: Yuexian Zou
Co-author: Wenwu Wang
July 2019 - September 2019
Ubtech Robotics Inc., Speech Group, Shenzhen, China, Intern.
Supervisor: Dongyan Huang
July 2018 - September 2018
University of California Berkeley, California Path Lab, Berkeley, USA, Summer Research.
Supervisor: Masayoshi Tomizuka
Bowen Shi*, Andros Tjandra*, John Hoffman*, Helin Wang*, Yi-Chiao Wu*, Luya Gao*, Julius Richter, Matt Le, Apoorv Vyas, Sanyuan Chen, Christoph Feichtenhofer, Piotr Dollár, Wei-Ning Hsu, Ann Lee
SAM Audio: Segment Anything in Audio
Preprint, 2025.
[Code]
Helin Wang*, Jiarui Hai*, Dading Chong, Karan Thakkar, Tiantian Feng, Dongchao Yang, Junhyeok Lee, Thomas Thebaud, Laureano Moro Velazquez, Jesus Villalba, Zengyi Qin, Shrikanth Narayanan, Mounya Elhilali, Najim Dehak
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
Preprint, 2025.
[Code]
Helin Wang, Jiarui Hai, Dongchao Yang, Chen Chen, Kai Li, Junyi Peng, Thomas Thebaud, Laureano Moro Velazquez, Jesus Villalba, Najim Dehak
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline
Preprint, 2025.
[Code]
Helin Wang, Meng Yu, Jiarui Hai, Chen Chen, Yuchen Hu, Rilin Chen, Najim Dehak, Dong Yu
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
ICASSP, 2025.
[Code]
Helin Wang*, Jiarui Hai*, Yen-Ju Lu, Karan Thakkar, Mounya Elhilali, Najim Dehak
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer
ICASSP, 2025.
[Code]
Helin Wang, Jesus Villalba, Laureano Moro-Velazquez, Jiarui Hai, Thomas Thebaud, Najim Dehak
Noise-robust Speech Separation with Fast Generative Correction
Interspeech, 2024.
[Code]
Dading Chong*, Helin Wang*, Peilin Zhou, Qingcheng Zeng
Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training
ICASSP, 2023.
[Code]
2025/02 - 2025/05, Teaching Assistant, Johns Hopkins University, Baltimore, USA:
EN.520.439/659: Machine Learning for Medical Applications in Spring 2025 at the Department of Electrical and Computer Engineering
2024/02 - 2024/05, Teaching Assistant, Johns Hopkins University, Baltimore, USA:
EN.520.123: Computational Modeling for Electrical and Computer Engineering in Spring 2024 at the Department of Electrical and Computer Engineering