Design of a Chinese Opinion Mining System

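Chinese grammar differs from English, and Chinese text has no spaces between words, so using a part-of-speech (POS) tagger or a parser to locate opinion words is error-prone. This thesis therefore extracts opinion words with a lexicon-based approach and combines it with a proposed exclusion-word method to improve extraction precision. Every domain has its own idiomatic expressions (opinion words and exclusion words), so a general-purpose dictionary can hardly cover all the opinion words of a specific domain. We argue, however, that within a specific domain, given enough training data, most out-of-dictionary opinion words and exclusion words can be captured. Those that never appear in the training set are few, their number stays stable, and they are usually rare, infrequently used words. We support this claim with experimental data from two different but related Mobile01 domains: telecommunications and broadband Internet. Because extraction is lexicon-based, every opinion word and exclusion word already in the dictionary can be extracted. Out-of-dictionary ones, by contrast, can be found only through manual annotation, which costs a great deal of time and labor. Building on the stability of the newly added out-of-dictionary opinion words and exclusion words, we designed a two-stage lexicon training method to reduce this cost. In the first stage, human-assisted semi-automatic annotation extracts the opinion words and exclusion words from the training data. In the second stage, once the system is online, the dictionary is used directly to extract opinion words and exclusion words from articles, and annotators then check whether the extracted words are correct. The experimental data show that the second-stage training process, compared with the first month of training, saves a large amount of manual annotation and checking time at only a small cost in precision and recall.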
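The abstract does not specify how exclusion words interact with dictionary matching. The Python sketch below shows one plausible reading, offered as an assumption and not as the thesis's actual procedure. Both opinion words and exclusion words are matched as substrings of the unsegmented text, and an opinion-word match is dropped if it lies inside an exclusion-word match (for example, 好 "good" inside 好像 "seems"). The function names `find_spans` and `extract_opinion_words` and the small lexicons are hypothetical.

```python
from __future__ import annotations


def find_spans(text: str, words: set[str]) -> list[tuple[int, int, str]]:
    """Return (start, end, word) for every occurrence of every lexicon word.

    Plain substring search: no word segmentation is needed for Chinese text.
    """
    spans = []
    for word in words:
        start = text.find(word)
        while start != -1:
            spans.append((start, start + len(word), word))
            start = text.find(word, start + 1)  # allow overlapping occurrences
    return sorted(spans)  # order matches by position in the text


def extract_opinion_words(text: str, opinion_lexicon: set[str],
                          exclusion_lexicon: set[str]) -> list[str]:
    """Lexicon-based opinion-word extraction with exclusion words.

    ASSUMPTION: an opinion-word match is discarded when any of its characters
    is covered by an exclusion-word match.
    """
    excluded = set()  # character positions covered by exclusion words
    for start, end, _ in find_spans(text, exclusion_lexicon):
        excluded.update(range(start, end))
    return [word for start, end, word in find_spans(text, opinion_lexicon)
            if not excluded.intersection(range(start, end))]


if __name__ == "__main__":
    opinions = {"好", "不錯", "很慢"}   # hypothetical opinion lexicon
    exclusions = {"好像", "還好嗎"}     # hypothetical exclusion lexicon
    post = "這家網路好像很慢，但客服不錯"
    print(extract_opinion_words(post, opinions, exclusions))
```

Running the script prints `['很慢', '不錯']`: 好 is suppressed because it sits inside 好像. Overlapping opinion words (such as 好 inside a longer opinion word) are all kept in this simplified version.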

Keywords

Opinion mining; unpopular words; popular words; exclusion words; lexicon stability

Parallel Abstract

Because Chinese grammatical structure differs from that of English and Chinese text has no spaces between words, using a part-of-speech (POS) tagger or a parser to find opinion words easily leads to errors. Therefore, while extracting opinion words with a thesaurus (lexicon) approach, this study applies the proposed exclusion-word method to improve opinion-word extraction precision. Because each domain has its own terminology and idioms (opinion words and exclusion words), ordinary dictionaries can hardly cover all the opinion words in a specific domain. However, within a specific domain, as long as the training data are sufficient, most of the opinion words and exclusion words outside the dictionaries can be captured. The out-of-dictionary opinion words and exclusion words that do not appear in the training set are few, their number remains stable, and they are usually infrequently used words. This paper uses experimental data from two different but related Mobile01 domains, telecommunications and broadband Internet, to support this view. Because this paper uses the thesaurus (lexicon) approach to extract opinion words and exclusion words, all of the opinion words and exclusion words in the dictionaries can be extracted, whereas those outside the dictionaries can be identified only by manual tagging, which is time- and labor-consuming. Therefore, based on the stability of the newly added out-of-dictionary opinion words and exclusion words, this study designs a two-stage lexicon training method to solve this problem. In the first stage, the opinion words and exclusion words of the training data are extracted by semi-automatic manual tagging. In the second stage, once the system is online, the dictionaries are used directly to extract the opinion words and exclusion words from articles, after which annotators manually check whether the extracted words are correct. According to the experimental data, compared with the first month of training, the second-stage training procedure saves a great deal of manual tagging and checking time while sacrificing only a little precision and recall.
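The two-stage procedure rests on the observation that the number of new out-of-dictionary words levels off as training proceeds. The Python sketch below illustrates that idea under stated assumptions. Each batch (for example, one month of posts) yields a set of human-confirmed words, with opinion words and exclusion words pooled into one set for simplicity. Training switches from stage 1 (semi-automatic manual tagging) to stage 2 (lexicon extraction followed by manual checking) once the last `window` batches each add at most `threshold` new words. The stopping rule, its parameters, the function names, and the sample words are all hypothetical; the abstract does not state how the switch point is chosen.

```python
from __future__ import annotations

from collections.abc import Iterable


def new_word_counts(base_lexicon: set[str], batches: Iterable[set[str]]) -> list[int]:
    """Count, per batch, the human-confirmed words not yet in the lexicon,
    then merge them so later batches are compared against the grown lexicon."""
    known = set(base_lexicon)  # copy: do not mutate the caller's lexicon
    counts = []
    for batch in batches:
        fresh = batch - known  # out-of-dictionary words found in this batch
        counts.append(len(fresh))
        known |= fresh
    return counts


def is_stable(counts: list[int], threshold: int, window: int) -> bool:
    """Hypothetical rule: the last `window` batches each added <= `threshold` new words."""
    return len(counts) >= window and all(c <= threshold for c in counts[-window:])


def choose_stage(counts: list[int], threshold: int = 1, window: int = 2) -> str:
    """Stay in stage 1 until the lexicon looks stable, then move to stage 2."""
    if is_stable(counts, threshold, window):
        return "stage 2: extract with the lexicon, then manually check the output"
    return "stage 1: semi-automatic manual tagging of new training data"


if __name__ == "__main__":
    base = {"好", "不錯"}  # hypothetical seed dictionary
    # Hypothetical monthly batches of human-confirmed opinion/exclusion words.
    months = [{"好", "卡卡", "爛"}, {"卡卡", "給力", "很穩"},
              {"很穩", "秒斷"}, {"秒斷", "爛"}]
    counts = new_word_counts(base, months)
    print(counts)
    print(choose_stage(counts))
```

With these toy batches the counts are `[2, 2, 1, 0]`, so the default rule selects stage 2. In stage 2 any word outside the lexicon goes unextracted, which is one reason recall can drop slightly compared with full manual tagging.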

Parallel Keywords

Opinion mining; unpopular words; popular words; exclusion words; lexicon stability


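Cited By

許喬安 (2013). 結合意見探勘系統應用於口碑行銷-以電信為例 [Applying an opinion mining system to word-of-mouth marketing: the case of telecommunications] (Master's thesis, Tamkang University). Airiti Library. https://doi.org/10.6846/TKU.2013.00470

陳子龍 (2012). 中文意見探勘系統之句法分析 [Syntactic analysis for a Chinese opinion mining system] (Master's thesis, Tamkang University). Airiti Library. https://doi.org/10.6846/TKU.2012.01149

瞿怡正 (2012). MOBILE01和PTT兩個不同論壇相同面向習慣用詞之探討-以電信、寬頻版為例 [A study of habitual terms for the same aspects on two different forums, MOBILE01 and PTT: the telecommunications and broadband boards] (Master's thesis, Tamkang University). Airiti Library. https://doi.org/10.6846/TKU.2012.00766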
