
audio/audio generation (tts)

(29 posts)
[Paper Review] High Fidelity Speech Synthesis with Adversarial Networks (ICLR20)
Paper title: High Fidelity Speech Synthesis with Adversarial Networks
Authors: Mikołaj Binkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan
Affiliation: Imperial College London, DeepMind
Venue: ICLR 2020
Paper: https://arxiv.org/abs/1909.11646
Code: https://github.com/mbinkowski/DeepSpeechDistances (Frechet DeepSpeech Distance)
- GAN-TTS: as the name says, GAN-based TTS (Text-..

[Paper Review] MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis (NeurIPS19)
Paper title: MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
Authors: Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville
Affiliation: Lyrebird AI, Mila, University of Montreal
Venue: NeurIPS 2019
Paper: https://arxiv.org/abs/1910.06711
Code: https://github.com/descriptinc/melgan-neurips
Audio samples: https:/..

[Paper Review] Diff-TTS: A Denoising Diffusion Model for Text-to-Speech (INTERSPEECH21)
Title: Diff-TTS: A Denoising Diffusion Model for Text-to-Speech
Authors: Myeonghun Jeong, Hyeongju Kim, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim
Affiliation: Seoul National University, Neosapience
Venue: INTERSPEECH 2021
Paper: https://arxiv.org/abs/2104.01409
Web page: https://jmhxxi.github.io/Diff-TTS-demo/index.html
- Audio generation methods based on diffusion models [Chen21][Kong21] have recently been introduced. However, these papers only went as far as generation conditioned on something as simple as digits..

[Paper Review] DiffWave: A Versatile Diffusion Model for Audio Synthesis (ICLR21)
Title: DIFFWAVE: A Versatile Diffusion Model for Audio Synthesis
Authors: Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro
Affiliation: UCSD, NVIDIA, Baidu Research
Venue: ICLR 2021
Paper: https://arxiv.org/abs/2009.09761
Web page: https://diffwave-demo.github.io/
- A paper that generates audio with a diffusion model. It was published at ICLR21 at the same time as [Chen21]. Both papers use a similar methodology but differ in small details, so reading them side by side is fun.
- For audio generation, this paper draws on WaveNet[O..

[Paper Review] WaveGrad: Estimating Gradients for Waveform Generation (ICLR21)
Title: WAVEGRAD: Estimating Gradients for Waveform Generation
Authors: Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan
Affiliation: Johns Hopkins University, Google Research, Brain Team
Venue: ICLR 2021
Paper and web page: https://wavegrad.github.io/
- Generates audio based on score matching and diffusion probabilistic models, the generative techniques that are hot in this field right now.
- Tries both the discrete refinement step index proposed in [Ho20] and the noise le..
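Diff-TTS, DiffWave and WaveGrad all build on the diffusion forward process of [Ho20], in which clean audio x_0 is noised in closed form as x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε, with ᾱ_t the cumulative product of (1 − β_s). The sketch below illustrates that step together with the two conditioning signals mentioned in the WaveGrad preview: a discrete step index and a continuous noise level. It is a minimal, stdlib-only illustration, not code from any of these papers. The linear schedule (T = 50, β from 1e-4 to 0.05), the toy sine-wave input, and all function names are assumptions chosen for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.05):
    """Illustrative (assumed) linear noise schedule beta_1..beta_T."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s=1..t} (1 - beta_s), one value per step."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in one shot.

    Returns the noisy signal and the Gaussian noise eps that was added,
    which is the target an eps-prediction network would be trained on.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal_scale * x + noise_scale * e for x, e in zip(x0, eps)]
    return x_t, eps


# Toy "waveform": 256 samples of a 440 Hz sine at 22.05 kHz (illustrative only).
x0 = [math.sin(2 * math.pi * 440 * n / 22050) for n in range(256)]
betas = linear_beta_schedule(50)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

t = len(betas) - 1  # last (noisiest) step, 0-indexed
x_t, eps = q_sample(x0, t, alpha_bars, rng)

# Two ways to tell the denoiser how noisy x_t is:
step_index_condition = t                          # discrete step index, as in [Ho20]
noise_level_condition = math.sqrt(alpha_bars[t])  # continuous noise level sqrt(alpha_bar_t)
print(step_index_condition, round(noise_level_condition, 4))
```

Conditioning on the continuous noise level instead of the integer index means the network never depends on a fixed step count, which is the usual motivation given for letting WaveGrad run with a different number of refinement steps at inference than during training; the paper itself has the exact details.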