
audio/audio generation (tts)

(29)
[Paper Review] Parallel Tacotron: Non-Autoregressive and Controllable TTS (ICASSP21) Title: Parallel Tacotron: Non-Autoregressive and Controllable TTS Authors: Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, Ron J. Weiss, Yonghui Wu Affiliation: Google Venue: ICASSP 2021 Audio samples: https://google.github.io/tacotron/publications/parallel_tacotron/ - Parallel Tacotron - As the name suggests, a neural TTS model based on Tacotron, but non-autoregressive. - Uses a variational autoencoder-based residual encoder for the subtle characteristics of speech ..
[Paper Review] Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech (ICML21) Title: Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech Authors: Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, Mikhail Kudinov Affiliation: Huawei Noah's Ark Lab, Higher School of Economics Venue: ICML 2021 Paper: https://arxiv.org/abs/2105.06337 Code: https://github.com/huawei-noah/speech-backbones Audio samples: https://grad-tts.github.io/ - Grad-TTS - A model whose mel-spectrogram decoder is a diffusion model. -..
[Paper Review] Non-Autoregressive Neural Text-to-Speech (ICML20) Title: Non-Autoregressive Neural Text-to-Speech Authors: Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao Affiliation: Baidu Research Venue: ICML 2020 Paper: https://arxiv.org/abs/1905.08459 Audio samples: https://parallel-neural-tts-demo.github.io/ - ParaNet + WaveVAE - A model Baidu Research built to improve speed after developing DeepVoice3 [Ping18] and ClariNet [Ping19], so its overall structure has much in common with those two papers. - ParaNet, the text -> spectrogram part, from DeepVoice3 ..
[Paper Review] FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (ICLR21) Title: FastSpeech 2: Fast and High-Quality End-to-End Text to Speech Authors: Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu Affiliation: Zhejiang University, Microsoft Research Asia, Microsoft Azure Speech Venue: ICLR 2021 Paper: https://arxiv.org/abs/2006.04558 Audio samples: https://speechresearch.github.io/fastspeech2/ - FastSpeech 2 and FastSpeech 2s (what is this, an iPhone?) - FastSpeech [Ren19] is certainly a fast, good model, but ..
[Paper Review] End-to-End Adversarial Text-to-Speech (ICLR21) Title: End-to-End Adversarial Text-to-Speech Authors: Jeff Donahue, Sander Dieleman, Mikolaj Binkowski, Erich Elsen, Karen Simonyan Affiliation: DeepMind Venue: ICLR 2021 Paper: https://arxiv.org/abs/2006.03575 Audio samples: https://www.deepmind.com/publications/end-to-end-adversarial-text-to-speech - EATS (End-to-end Adversarial Text-to-Speech) - A TTS system that works end to end, from text input all the way to audio generation. - To begin with, the vocoder built by their own team, GAN-TTS [Binko..
[Paper Review] Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search (NeurIPS20) Title: Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search Authors: Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon Affiliation: Kakao Enterprise, Seoul National University Venue: NeurIPS 2020 Paper: https://arxiv.org/abs/2005.11129 Audio samples: https://jaywalnut310.github.io/glow-tts-demo/index.html Code: https://github.com/jaywalnut310/glow-tts - Glow-TTS - Let's do TTS fast using a flow. - But a separate a..
[Paper Review] WaveFlow: A Compact Flow-based Model for Raw Audio (ICML20) Title: WaveFlow: A Compact Flow-based Model for Raw Audio Authors: Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song Affiliation: Baidu Research Venue: ICML 2020 Paper: https://proceedings.mlr.press/v119/ping20a.html Audio samples: https://waveflow-demo.github.io/ Code: https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/waveflow - WaveFlow - Considers existing flow-based vocoders inefficient because they only handle the data by squeezing it in 1D. Represents it in 2D and dilated 2D conv..
[Paper Review] FloWaveNet: A Generative Flow for Raw Audio (ICML19) Title: FloWaveNet: A Generative Flow for Raw Audio Authors: Sungwon Kim, Sang-gil Lee, Jongyoon Song, Jaehyeon Kim, Sungroh Yoon Affiliation: Seoul National University, Kakao Corporation Venue: ICML 2019 Paper: https://proceedings.mlr.press/v97/kim19b.html Audio samples: https://ksw0306.github.io/flowavenet-demo/ Code: https://github.com/ksw0306/FloWaveNet - FloWaveNet - A flow-based vocoder, so it can be trained with just a single network and a single loss, and can generate quickly in parallel..