
audio (39)
[Paper Review] Deep Voice 2: Multi-Speaker Neural Text-to-Speech (NeurIPS17)
Title: Deep Voice 2: Multi-Speaker Neural Text-to-Speech
Authors: Sercan Ö. Arık, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
Affiliation: Baidu Research
Venue: NeurIPS 2017
Paper: https://arxiv.org/abs/1705.08947
- Deep Voice 2
- A multi-speaker version of Deep Voice 1 [Arik17], which is a single-speaker TTS model.
- Speaker embeddings are trained and fed into multiple places throughout the model.
- That is not all: it also reworks several parts of Deep Voice 1, which further improves audio quality..
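To make "train speaker embeddings and feed them into multiple places in the model" more concrete, here is a minimal sketch under stated assumptions: one shared speaker embedding table, a separate affine projection plus softsign for each injection site, and additive injection into a hidden state. The class name, the site names (`duration_rnn`, `vocoder`), the dimensions, the softsign choice, and the additive injection are all hypothetical illustration choices, not a reproduction of the paper's exact architecture; weights are random rather than trained.

```python
import random


def softsign(x: float) -> float:
    """softsign(x) = x / (1 + |x|), which squashes values into (-1, 1)."""
    return x / (1.0 + abs(x))


class SpeakerConditioner:
    """Hypothetical sketch of site-specific speaker conditioning.

    A shared speaker embedding table is mapped to each injection site
    through that site's own affine projection followed by softsign.
    Weights are randomly initialized here; a real model would train them.
    """

    def __init__(self, num_speakers: int, emb_dim: int, site_dims: dict, seed: int = 0):
        rng = random.Random(seed)
        # Shared speaker embedding table: one emb_dim vector per speaker.
        self.table = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)]
                      for _ in range(num_speakers)]
        # Per-site affine projection (weight matrix, bias) from emb_dim to the site's size.
        self.sites = {}
        for name, dim in site_dims.items():
            weight = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)] for _ in range(dim)]
            bias = [0.0] * dim
            self.sites[name] = (weight, bias)

    def site_embedding(self, speaker_id: int, site: str) -> list:
        """Project the shared embedding of `speaker_id` into `site`'s space."""
        emb = self.table[speaker_id]
        weight, bias = self.sites[site]
        return [softsign(sum(w * e for w, e in zip(row, emb)) + b)
                for row, b in zip(weight, bias)]

    def condition(self, hidden: list, speaker_id: int, site: str) -> list:
        """Inject speaker information by adding the site-specific vector to `hidden`."""
        s = self.site_embedding(speaker_id, site)
        if len(s) != len(hidden):
            raise ValueError("hidden size must match the site's projection size")
        return [h + v for h, v in zip(hidden, s)]


if __name__ == "__main__":
    cond = SpeakerConditioner(num_speakers=4, emb_dim=16,
                              site_dims={"duration_rnn": 8, "vocoder": 32})
    print(cond.condition([0.0] * 8, speaker_id=2, site="duration_rnn"))
```

In this sketch every site reads from the same table, so a new speaker needs only one new embedding row, while each site still receives a vector sized for its own layer.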
[Paper Review] Tacotron: Towards End-to-End Speech Synthesis (INTERSPEECH17)
Title: TACOTRON: Towards End-to-End Speech Synthesis
Authors: Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous
Affiliation: Google
Venue: INTERSPEECH 2017
Paper: https://arxiv.org/abs/1703.10135
Audio samples: https://google.github.io/tacotron/
- Tacotron
- Truly does everything from start to finish in a single model, and..
[Paper Review] Deep Voice: Real-time Neural Text-to-Speech (ICML17)
Title: Deep Voice: Real-time Neural Text-to-Speech
Authors: Sercan O Arık, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Andrew Ng, Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi
Affiliation: Baidu Research
Paper: https://arxiv.org/abs/1702.07825
Venue: ICML 2017
- Deep Voice, first version
- Breaks TTS down into five building blocks and implements every one of them with a neural network. Here, phoneme boundary segmentati..
[Paper Review] Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech (ICML21)
Title: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Authors: Jaehyeon Kim, Jungil Kong, Juhee Son
Affiliation: Kakao Enterprise, KAIST
Venue: ICML 2021
Paper: https://arxiv.org/abs/2106.06103
Code: https://github.com/jaywalnut310/vits
Audio samples: https://jaywalnut310.github.io/vits-demo/index.html
- VITS (Variational Inference with adversarial learning for end-to-end Text-to-Sp..
[Paper Review] Parallel WaveGAN: A Fast Waveform Generation Model based on GANs with Multi-Resolution Spectrogram (ICASSP20)
Title: Parallel WaveGAN: A Fast Waveform Generation Model based on GANs with Multi-Resolution Spectrogram
Authors: Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim
Affiliation: LINE Corp, NAVER Corp
Venue: ICASSP 2020
Paper: https://arxiv.org/abs/1910.11480
Audio samples: https://r9y9.github.io/projects/pwg/
- Parallel WaveGAN
- Yet another way to make WaveNet [Oord16] fast.
- Trains a fully non-autoregressive WaveNet with a GAN.
- The basic adversarial..
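The title refers to a multi-resolution spectrogram objective. Below is a minimal pure-Python sketch of one common form of such a loss, assuming it combines spectral convergence with an L1 distance between log STFT magnitudes and averages the result over several STFT settings. It uses a naive DFT, so it is only practical for short toy signals. The function names and the small `(fft_size, hop, win_len)` triples are hypothetical demo choices, not the paper's configuration.

```python
import cmath
import math


def stft_magnitude(x, fft_size, hop, win_len):
    """Magnitude STFT (Hann window, zero-padded to fft_size) via a naive DFT.

    Cost is O(frames * fft_size^2): fine for toy signals, far too slow for real audio.
    """
    if fft_size < win_len:
        raise ValueError("fft_size must be >= win_len")
    if len(x) < win_len:
        raise ValueError("signal is shorter than the analysis window")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(win_len)]
        seg += [0.0] * (fft_size - win_len)  # zero-pad up to the FFT size
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / fft_size)
                    for n in range(fft_size)))
            for k in range(fft_size // 2 + 1)  # non-negative frequency bins only
        ])
    return frames


def single_resolution_loss(ref, gen, fft_size, hop, win_len, eps=1e-7):
    """Spectral convergence + mean absolute log-magnitude difference."""
    R = stft_magnitude(ref, fft_size, hop, win_len)
    G = stft_magnitude(gen, fft_size, hop, win_len)
    pairs = [(r, g) for fr, fg in zip(R, G) for r, g in zip(fr, fg)]
    # Spectral convergence: norm of the magnitude difference, normalized by the reference.
    diff_norm = math.sqrt(sum((r - g) ** 2 for r, g in pairs))
    ref_norm = math.sqrt(sum(r * r for r, _ in pairs))
    spectral_convergence = diff_norm / (ref_norm + eps)
    # Log STFT magnitude loss: L1 distance between log magnitudes, averaged over bins.
    log_mag = sum(abs(math.log(r + eps) - math.log(g + eps)) for r, g in pairs) / len(pairs)
    return spectral_convergence + log_mag


def multi_resolution_stft_loss(ref, gen, resolutions):
    """Average the single-resolution loss over several (fft_size, hop, win_len) settings."""
    if len(ref) != len(gen):
        raise ValueError("signals must have the same length")
    return sum(single_resolution_loss(ref, gen, *r) for r in resolutions) / len(resolutions)


if __name__ == "__main__":
    # Toy 440 Hz tone at an assumed 8 kHz sample rate, and a quieter copy of it.
    x = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(256)]
    y = [0.5 * v for v in x]
    res = [(64, 16, 32), (128, 32, 64), (32, 8, 16)]  # small, demo-only settings
    print(multi_resolution_stft_loss(x, x, res))  # prints 0.0
    print(multi_resolution_stft_loss(x, y, res))
```

Comparing spectra at several window sizes trades time resolution against frequency resolution, so no single STFT setting dominates what the generator is penalized for.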
[Paper Review] ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech (ICLR19)
Title: ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech
Authors: Wei Ping, Kainan Peng, Jitong Chen
Affiliation: Baidu Research
Venue: ICLR 2019
Paper: https://arxiv.org/abs/1807.07281
Audio samples: https://clarinet-demo.github.io/
- ClariNet
- Shows that WaveNet [Oord16] generation can be sped up with a simpler approach than [Oord18].
- Modeling the outputs of both WaveNet and the IAF as simple Gaussians turned out to work fine. This lets the distillation be computed in closed form, without sampling.
- In fact ..
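Distillation needs no sampling when both output distributions are Gaussian because the per-time-step KL divergence has a closed form: KL(N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2)) = log(sigma_p / sigma_q) + (sigma_q^2 + (mu_q - mu_p)^2) / (2 sigma_p^2) - 1/2. The sketch below computes it directly, sums it over time steps, and checks it against a Monte Carlo estimate that does sample. Treating q as the student (IAF) and p as the teacher (WaveNet) at each step is an assumption about how the terms are assigned here, the function names are hypothetical, and the actual training objective may include additional terms not shown.

```python
import math
import random


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) for q = N(mu_q, sigma_q^2), p = N(mu_p, sigma_p^2)."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
            - 0.5)


def sequence_kl(student, teacher):
    """Sum the per-time-step KL; each argument is a list of (mu, sigma) pairs."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher must cover the same time steps")
    return sum(gaussian_kl(mq, sq, mp, sp)
               for (mq, sq), (mp, sp) in zip(student, teacher))


def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def monte_carlo_kl(mu_q, sigma_q, mu_p, sigma_p, n=100_000, seed=0):
    """Sampling-based estimate of the same KL, for comparison only."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu_q, sigma_q)  # sample from q
        total += log_normal_pdf(x, mu_q, sigma_q) - log_normal_pdf(x, mu_p, sigma_p)
    return total / n


if __name__ == "__main__":
    print(f"{gaussian_kl(0.5, 0.8, 0.0, 1.0):.4f}")  # prints 0.1681
    print(f"{monte_carlo_kl(0.5, 0.8, 0.0, 1.0):.4f}")  # close to the closed form, but noisy
```

The closed-form value is exact and deterministic, while the sampled estimate only converges to it as `n` grows, which is the practical advantage the review points to.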
[Paper Review] Parallel WaveNet: Fast High-Fidelity Speech Synthesis (ICML18)
Title: Parallel WaveNet: Fast High-Fidelity Speech Synthesis
Authors: Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Has..
[Paper Review] GAN Vocoder: Multi-Resolution Discriminator Is All You Need (INTERSPEECH21)
Title: GAN Vocoder: Multi-Resolution Discriminator Is All You Need
Authors: Jaeseong You, Dalhyun Kim, Gyuhyeon Nam, Geumbyeol Hwang, Gyeongsu Chae
Affiliation: MoneyBrain Inc
Venue: INTERSPEECH 2021
Paper: https://arxiv.org/abs/2103.05236
Audio samples: https://deepbrainai-research.github.io/gan-vocoder/
- GAN-based vocoders work remarkably well these days. Why is that?
- Could it be because they use a multi-resolution discriminator?
- Built a variety of generators and experiment..