audio (39)
[Paper Review] FloWaveNet: A Generative Flow for Raw Audio (ICML19)
Title: FloWaveNet: A Generative Flow for Raw Audio
Authors: Sungwon Kim, Sang-gil Lee, Jongyoon Song, Jaehyeon Kim, Sungroh Yoon
Affiliations: Seoul National University, Kakao Corporation
Venue: ICML 2019
Paper: https://proceedings.mlr.press/v97/kim19b.html
Audio samples: https://ksw0306.github.io/flowavenet-demo/
Code: https://github.com/ksw0306/FloWaveNet
- FloWaveNet
- A flow-based vocoder. As a result, it can be trained with a single network and a single loss, and it can generate audio quickly in parallel..
[Paper Review] WaveGlow: A Flow-Based Generative Network for Speech Synthesis (ICASSP19)
Title: WaveGlow: A Flow-Based Generative Network for Speech Synthesis
Authors: Ryan Prenger, Rafael Valle, Bryan Catanzaro
Affiliation: NVIDIA Corporation
Venue: ICASSP 2019
Paper: https://ieeexplore.ieee.org/document/8683143
Audio samples: https://nv-adlr.github.io/WaveGlow
Code: https://github.com/NVIDIA/waveglow
- WaveGlow
- A flow-based vocoder (it generates audio from a mel spectrogram)
- It builds on the Glow architecture and combines it with ideas from WaveNet. Because it is flow-based, generation is very fast..
[Paper Review] FastSpeech: Fast, Robust and Controllable Text to Speech (NeurIPS19)
Title: FastSpeech: Fast, Robust and Controllable Text to Speech
Authors: Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu
Affiliations: Zhejiang University, Microsoft Research, Microsoft STC Asia
Venue: NeurIPS 2019
Paper: https://arxiv.org/abs/1905.09263
Audio samples: https://speechresearch.github.io/fastspeech/
- FastSpeech
- First, a basic Transformer [Li19] model is trained to serve as the teacher model. This model is then used to attention a..
[Paper Review] Neural Speech Synthesis with Transformer Network (AAAI19)
Title: Neural Speech Synthesis with Transformer Network
Authors: Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, Ming Liu, Ming Zhou
Affiliations: University of Electronic Science and Technology of China, Microsoft Research Asia, Microsoft STC Asia, CETC Big Data Research Institute
Venue: AAAI 2019
Paper: https://arxiv.org/abs/1809.08895
Audio samples: https://neuraltts.github.io/transformertts/
- Transformer TTS
- Tacotron2[S..
[Paper Review] Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (ICASSP18)
Title: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Authors: Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu
Affiliations: Google, University of California, Berkeley
Venue: ICASSP 2018
Paper: https://arxiv.org/abs/1712.05884
Audio samples: https://goog..
[Paper Review] Char2Wav: End-to-End Speech Synthesis (ICLR17 Workshop)
Title: Char2Wav: End-to-End Speech Synthesis
Authors: Jose Sotelo, Soroush Mehri, Kundan Kumar, Joao Felipe Santos, Kyle Kastner, Aaron Courville, Yoshua Bengio
Affiliations: Universite de Montreal, IIT Kanpur, INRS-EMT
Venue: ICLR 2017 Workshop
Code: https://github.com/sotelo/parrot
Audio samples: http://josesotelo.com/speechsynthesis/
- This paper proposes an end-to-end speech synthesis system, something that was still rare at the time
- The overall system is an encoder that uses attention + RNN..
[Paper Review] SampleRNN: An Unconditional End-to-End Neural Audio Generation Model (ICLR17)
Title: SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
Authors: Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, Yoshua Bengio
Affiliations: University of Montreal, IIT Kanpur, SSNCE
Venue: ICLR 2017
Paper: https://arxiv.org/abs/1612.07837
Code: https://github.com/soroushmehr/sampleRNN_ICLR2017
Audio samples: https://soundcloud.com/samplernn/sets
- S..
[Paper Review] Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning (ICLR18)
Title: Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Authors: Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arık, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller
Affiliation: Baidu Research
Venue: ICLR 2018
Paper: https://arxiv.org/abs/1710.07654
- The third version in the Deep Voice series
- A fully-convolutional, attention-based TTS system. As a result, it is much faster than the earlier models while reaching comparable audio quality.
- LibriSpeech ..