Language |
English |
Date of publication/presentation |
2011 |
Publication type |
International conference paper |
Peer review |
Peer-reviewed |
Title |
A Study on Automatic Chinese Text Classification |
Authorship |
Co-authored |
Published in |
11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011) |
Publisher |
IEEE COMPUTER SOC |
Volume, issue, pages |
920-924 |
Authors and co-authors |
Xi Luo, Wataru Ohyama, Tetsushi Wakabayashi, Fumitaka Kimura |
Abstract |
In this paper, we perform Chinese text classification using N-gram (uni-gram, bi-gram, and mixed uni-gram/bi-gram) frequency features instead of word frequency features to represent documents, and we propose the use of mixed uni-gram/bi-gram features after feature transformation. We further propose a serial approach based on feature transformation and dimension reduction techniques to improve performance. Experimental results show that the proposed approach is efficient and effective for improving the performance of Chinese text classification. Furthermore, we present several experiments evaluating feature selection based on part-of-speech analysis; the results show that a suitable combination of parts of speech can lead to better classification performance. |
DOI |
10.1109/ICDAR.2011.187 |
ISSN |
1520-5363 |
DBLP ID |
conf/icdar/LuoOWK11 |
PermalinkURL |
http://dblp.uni-trier.de/db/conf/icdar/icdar2011.html#conf/icdar/LuoOWK11 |
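
The abstract describes representing documents by character N-gram frequencies (uni-gram, bi-gram, and a mix of both) rather than word frequencies. The following is a minimal sketch of that representation only, not the authors' implementation: the function names (`char_ngrams`, `mixed_ngram_counts`, `relative_frequencies`) and the sample string are hypothetical, and the feature transformation and dimension reduction steps are omitted because the abstract does not specify them.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every contiguous character n-gram of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def mixed_ngram_counts(text):
    """Count uni-grams and bi-grams together in a single feature table.

    Whitespace is dropped first (an assumption of this sketch), so no
    word segmentation of the Chinese text is needed.
    """
    compact = "".join(ch for ch in text if not ch.isspace())
    return Counter(char_ngrams(compact, 1) + char_ngrams(compact, 2))


def relative_frequencies(counts):
    """Normalize raw counts so the feature values sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical sample document: "Chinese text classification".
counts = mixed_ngram_counts("中文文本分类")
features = relative_frequencies(counts)
```

Here `features` maps each uni-gram and bi-gram to its relative frequency in the document; a classifier would consume such vectors after whatever transformation and dimension reduction is chosen.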