Language |
English |
Publication date |
2021/06/15 |
Type |
Academic research paper |
Peer review |
Peer-reviewed |
Title |
Full-band LPCNet: A real-time neural vocoder for 48 kHz audio with a CPU |
Authorship |
Co-authored |
Journal |
IEEE Access |
Publication region |
International |
Publisher |
IEEE |
Volume and pages |
Vol. 9, pp. 94923-94933 |
Authors |
K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, Y. Shiga and H. Kawai |
Abstract |
This paper investigates a real-time neural speech synthesis system running on CPUs that can synthesize high-fidelity 48 kHz speech waveforms covering the entire frequency range audible to humans. Most previous studies on 48 kHz speech synthesis have used either traditional source-filter vocoders or a WaveNet vocoder for waveform generation, but these approaches have drawbacks in synthesis quality or inference speed, respectively. LPCNet was proposed as a neural vocoder that runs in real time on a mobile CPU, but its sampling frequency is limited to 16 kHz. In this paper, we propose Full-band LPCNet, which synthesizes high-fidelity 48 kHz speech waveforms on a CPU by introducing simple but effective modifications to the conventional LPCNet. We then evaluate its synthesis quality on both normal speech and singing voice. |