
audio/audio generation (tts)

[Paper Review] WaveGlow: A Flow-Based Generative Network for Speech Synthesis (ICASSP19) Title: WaveGlow: A Flow-Based Generative Network for Speech Synthesis Authors: Ryan Prenger, Rafael Valle, Bryan Catanzaro Affiliation: NVIDIA Corporation Venue: ICASSP 2019 Paper: https://ieeexplore.ieee.org/document/8683143 Audio samples: https://nv-adlr.github.io/WaveGlow Code: https://github.com/NVIDIA/waveglow - WaveGlow - A flow-based vocoder (generates audio from a mel spectrogram) - Builds on the Glow architecture and combines it with ideas from WaveNet. Because it is flow-based, generation is very fast..
[Paper Review] FastSpeech: Fast, Robust and Controllable Text to Speech (NeurIPS19) Title: FastSpeech: Fast, Robust and Controllable Text to Speech Authors: Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu Affiliation: Zhejiang University, Microsoft Research, Microsoft STC Asia Venue: NeurIPS 2019 Paper: https://arxiv.org/abs/1905.09263 Audio samples: https://speechresearch.github.io/fastspeech/ - FastSpeech - First, a basic Transformer [Li19] model is trained to serve as the teacher model. Using this model, attention a..
[Paper Review] Neural Speech Synthesis with Transformer Network (AAAI19) Title: Neural Speech Synthesis with Transformer Network Authors: Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, Ming Liu, Ming Zhou Affiliation: University of Electronic Science and Technology of China, Microsoft Research Asia, Microsoft STC Asia, CETC Big Data Research Institute Venue: AAAI 2019 Paper: https://arxiv.org/abs/1809.08895 Audio samples: https://neuraltts.github.io/transformertts/ - Transformer TTS - Tacotron2[S..
[Paper Review] Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (ICASSP18) Title: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions Authors: Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu Affiliation: Google, University of California, Berkeley Venue: ICASSP 2018 Paper: https://arxiv.org/abs/1712.05884 Audio samples: https://goog..
[Paper Review] Char2Wav: End-to-End Speech Synthesis (ICLR17 Workshop) Title: Char2Wav: End-to-End Speech Synthesis Authors: Jose Sotelo, Soroush Mehri, Kundan Kumar, Joao Felipe Santos, Kyle Kastner, Aaron Courville, Yoshua Bengio Affiliation: Universite de Montreal, IIT Kanpur, INRS-EMT Venue: ICLR 2017 Workshop Code: https://github.com/sotelo/parrot Audio samples: http://josesotelo.com/speechsynthesis/ - A paper proposing an end-to-end speech synthesis system, something that was still rare at the time - The overall system is an encoder using attention + RNN..
[Paper Review] Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning (ICLR18) Title: Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning Authors: Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan Ö. Arık, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller Affiliation: Baidu Research Venue: ICLR 2018 Paper: https://arxiv.org/abs/1710.07654 - The third version in the Deep Voice series - A fully-convolutional attention-based TTS system, and therefore much faster than earlier models, while reaching comparable audio quality. - LibriSpeech ..
[Paper Review] Deep Voice 2: Multi-Speaker Neural Text-to-Speech (NeurIPS17) Title: Deep Voice 2: Multi-Speaker Neural Text-to-Speech Authors: Sercan Ö. Arık, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou Affiliation: Baidu Research Venue: NeurIPS 2017 Paper: https://arxiv.org/abs/1705.08947 - Deep Voice 2 - A multi-speaker version of Deep Voice 1 [Arik17], which is a single-speaker TTS model. - Trains speaker embeddings and injects them at several points in the model. - Beyond that, several parts of Deep Voice 1 were also reworked to improve audio quality..
[Paper Review] Tacotron: Towards End-to-End Speech Synthesis (INTERSPEECH17) Title: TACOTRON: Towards End-to-End Speech Synthesis Authors: Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous Affiliation: Google Venue: INTERSPEECH 2017 Paper: https://arxiv.org/abs/1703.10135 Audio samples: https://google.github.io/tacotron/ - Tacotron - Truly does everything from start to finish in one go..