SHIGA Yoshinori
志賀 芳則
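Affiliation Tokyo Denki University, School of Engineering, Department of Information and Communication Engineering
Tokyo Denki University Graduate School, Graduate School of Advanced Science and Technology, Information, Communication and Media Engineering Program
Tokyo Denki University Graduate School, Graduate School of Engineering, Information and Communication Engineering Program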
Position Professor
Language English
Date of publication/presentation 2021/06/15
Type Academic research paper
Peer review Peer-reviewed
Title Full-band LPCNet: A real-time neural vocoder for 48 kHz audio with a CPU
Authorship Co-authored
Journal IEEE Access
Publication region International
Publisher IEEE
Volume, issue, pages 9, 94923-94933
Authors and co-authors K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, Y. Shiga and H. Kawai
Abstract This paper investigates a real-time neural speech synthesis system on CPUs that can synthesize high-fidelity 48 kHz speech waveforms to cover the entire frequency range audible by human beings. Although most previous studies on 48 kHz speech synthesis have used traditional source-filter vocoders or a WaveNet vocoder for waveform generation, they have some drawbacks regarding synthesis quality or inference speed. LPCNet was proposed as a real-time neural vocoder with a mobile CPU but its sampling frequency is still only 16 kHz. In this paper, we propose a Full-band LPCNet to synthesize high-fidelity 48 kHz speech waveforms with a CPU by introducing some simple but effective modifications to the conventional LPCNet. We then evaluate the synthesis quality using both normal speech and a singing voice.