
All posts (41)
[Paper Review] WaveFlow: A Compact Flow-based Model for Raw Audio (ICML20)
Paper: WaveFlow: A Compact Flow-based Model for Raw Audio
Authors: Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song
Affiliation: Baidu Research
Venue: ICML 2020
Paper: https://proceedings.mlr.press/v119/ping20a.html
Audio samples: https://waveflow-demo.github.io/
Code: https://github.com/PaddlePaddle/Parakeet/tree/develop/examples/waveflow
- WaveFlow
- Considers existing flow-based vocoders inefficient because they squeeze the data only along the 1D time axis. Instead, it represents the audio in 2D and applies dilated 2D conv..
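The 2D squeeze mentioned in this excerpt can be pictured in a few lines of PyTorch. This is only a minimal sketch of the idea, not the Parakeet implementation; the squeeze height h=16, the audio length, and the conv sizes are made-up values for illustration.

```python
# Minimal sketch (assumed sizes, not WaveFlow's actual network): fold a 1D waveform
# into a 2D matrix of shape (h, T/h) and process it with dilated 2D convolutions.
import torch
import torch.nn as nn

h = 16                          # squeeze height (illustrative hyperparameter)
wav = torch.randn(1, 64000)     # dummy batch of raw audio, length T = 64000

# 1D -> 2D squeeze: (B, T) -> (B, 1, h, T/h); consecutive samples go down a column
x = wav.view(1, 1, -1, h).transpose(2, 3)   # (1, 1, 16, 4000)

# A small dilated 2D conv stack over the squeezed representation (illustrative only)
conv = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=3, dilation=1, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2),
    nn.ReLU(),
)
print(conv(x).shape)   # torch.Size([1, 64, 16, 4000])
```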
[Paper Review] FloWaveNet: A Generative Flow for Raw Audio (ICML19)
Paper title: FloWaveNet: A Generative Flow for Raw Audio
Authors: Sungwon Kim, Sang-gil Lee, Jongyoon Song, Jaehyeon Kim, Sungroh Yoon
Affiliation: Seoul National University, Kakao Corporation
Venue: ICML 2019
Paper: https://proceedings.mlr.press/v97/kim19b.html
Audio samples: https://ksw0306.github.io/flowavenet-demo/
Code: https://github.com/ksw0306/FloWaveNet
- FloWaveNet
- A flow-based vocoder. It can therefore be trained with a single network and a single loss, and fast parallel generation is possible..
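To make "a single network and a single loss, with parallel generation" concrete, below is a minimal sketch of an affine coupling layer, the kind of invertible block flow-based vocoders are built from. The tiny conv net and sizes here are placeholders, not FloWaveNet's actual WaveNet-style, mel-conditioned layers.

```python
# Minimal affine coupling sketch: invertible, with a tractable log-det term that
# feeds a single likelihood loss; the inverse pass runs over all time steps at once.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Predicts per-element scale and shift for the second half from the first half.
        self.net = nn.Sequential(
            nn.Conv1d(channels // 2, 64, 3, padding=1), nn.ReLU(),
            nn.Conv1d(64, channels, 3, padding=1),
        )

    def forward(self, x):
        xa, xb = x.chunk(2, dim=1)
        log_s, t = self.net(xa).chunk(2, dim=1)
        yb = xb * torch.exp(log_s) + t          # element-wise, hence invertible
        logdet = log_s.sum(dim=(1, 2))          # contributes to the single NLL loss
        return torch.cat([xa, yb], dim=1), logdet

    def inverse(self, y):
        ya, yb = y.chunk(2, dim=1)
        log_s, t = self.net(ya).chunk(2, dim=1)
        xb = (yb - t) * torch.exp(-log_s)       # all time steps inverted in parallel
        return torch.cat([ya, xb], dim=1)

x = torch.randn(2, 64, 1000)                    # (batch, channels, time), toy shapes
layer = AffineCoupling()
y, logdet = layer(x)
print(torch.allclose(layer.inverse(y), x, atol=1e-4))  # True
```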
[Paper Review] WaveGlow: A Flow-Based Generative Network for Speech Synthesis (ICASSP19)
Title: WaveGlow: A Flow-Based Generative Network for Speech Synthesis
Authors: Ryan Prenger, Rafael Valle, Bryan Catanzaro
Affiliation: NVIDIA Corporation
Venue: ICASSP 2019
Paper: https://ieeexplore.ieee.org/document/8683143
Audio samples: https://nv-adlr.github.io/WaveGlow
Code: https://github.com/NVIDIA/waveglow
- WaveGlow
- A flow-based vocoder (generates audio from a mel spectrogram)
- Takes Glow's architecture as its base and combines it with ideas from WaveNet. Being flow-based, it generates extremely fast..
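One of the Glow components this family of models reuses is the invertible 1x1 convolution (the other being affine coupling layers, whose scale/shift network follows WaveNet's dilated-conv design). Below is a minimal sketch of that 1x1 conv under assumed toy shapes; it is illustration only, not the NVIDIA implementation.

```python
# Minimal sketch of Glow's invertible 1x1 convolution on channel-grouped audio.
import torch

channels, T = 8, 1000
x = torch.randn(2, channels, T)                       # grouped audio samples as channels

# Initialize the 1x1 conv weight as a random orthogonal (hence invertible) matrix.
W = torch.linalg.qr(torch.randn(channels, channels))[0]

z = torch.einsum('ij,bjt->bit', W, x)                 # forward: mix channels at every time step
logdet = T * torch.slogdet(W)[1]                      # log|det W| enters the likelihood once per time step
x_rec = torch.einsum('ij,bjt->bit', torch.inverse(W), z)   # inverse pass, used at synthesis time
print(torch.allclose(x_rec, x, atol=1e-4))            # True
```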
[Paper Review] FastSpeech: Fast, Robust and Controllable Text to Speech (NeurIPS19)
Title: FastSpeech: Fast, Robust and Controllable Text to Speech
Authors: Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu
Affiliation: Zhejiang University, Microsoft Research, Microsoft STC Asia
Venue: NeurIPS 2019
Paper: https://arxiv.org/abs/1905.09263
Audio samples: https://speechresearch.github.io/fastspeech/
- FastSpeech
- First trains a standard Transformer [Li19] model to serve as a teacher. Using this model, the attention a..
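The truncated sentence is pointing at reading phoneme durations off the teacher's attention and expanding encoder states with a length regulator. Below is a minimal sketch of that idea, with a random attention matrix standing in for the teacher's alignment; all sizes and the peak-counting rule are illustrative.

```python
# Minimal sketch: durations from a (stand-in) teacher attention matrix, then a
# length regulator that repeats each phoneme-level state by its duration.
import torch

num_phonemes, num_mel_frames, hidden = 5, 20, 8
attn = torch.softmax(torch.randn(num_mel_frames, num_phonemes), dim=-1)  # stand-in alignment

# Duration of phoneme i = number of mel frames whose attention peaks at i.
durations = torch.bincount(attn.argmax(dim=-1), minlength=num_phonemes)

# Length regulator: expand phoneme-level hidden states to frame level.
phoneme_hidden = torch.randn(num_phonemes, hidden)
expanded = torch.repeat_interleave(phoneme_hidden, durations, dim=0)
print(durations.sum().item(), expanded.shape)   # 20 torch.Size([20, 8])
```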
[Paper Review] Neural Speech Synthesis with Transformer Network (AAAI19)
Title: Neural Speech Synthesis with Transformer Network
Authors: Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, Ming Liu, Ming Zhou
Affiliation: University of Electronic Science and Technology of China, Microsoft Research Asia, Microsoft STC Asia, CETC Big Data Research Institute
Venue: AAAI 2019
Paper: https://arxiv.org/abs/1809.08895
Audio samples: https://neuraltts.github.io/transformertts/
- Transformer TTS
- Tacotron2[S..
[Paper Review] Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (ICASSP18)
Paper: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Authors: Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu
Affiliation: Google, University of California, Berkeley
Venue: ICASSP 2018
Paper: https://arxiv.org/abs/1712.05884
Audio samples: https://goog..
[Paper Review] Char2Wav: End-to-End Speech Synthesis (ICLR17 Workshop)
Paper title: Char2Wav: End-to-End Speech Synthesis
Authors: Jose Sotelo, Soroush Mehri, Kundan Kumar, Joao Felipe Santos, Kyle Kastner, Aaron Courville, Yoshua Bengio
Affiliation: Universite de Montreal, IIT Kanpur, INRS-EMT
Venue: ICLR 2017 Workshop
Code: https://github.com/sotelo/parrot
Audio samples: http://josesotelo.com/speechsynthesis/
- Proposes an end-to-end speech synthesis system, something that was still hard to find at the time
- The overall system is an encoder using attention + RNN..
[Paper Review] SampleRNN: An Unconditional End-to-End Neural Audio Generation Model (ICLR17)
Paper title: SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
Authors: Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, Yoshua Bengio
Affiliation: University of Montreal, IIT Kanpur, SSNCE
Venue: ICLR 2017
Paper: https://arxiv.org/abs/1612.07837
Code: https://github.com/soroushmehr/sampleRNN_ICLR2017
Audio samples: https://soundcloud.com/samplernn/sets
- S..