
Intelligent Fusion of Structural and Citation-Based Evidence for Text Classification

Zhang, Baoping and Goncalves, Marcos Andre and Fan, Weiguo and Chen, Yuxin and Fox, Edward A. and Calado, Pavel and Cristo, Marco (2004) Intelligent Fusion of Structural and Citation-Based Evidence for Text Classification. Technical Report TR-04-16, Computer Science, Virginia Tech.


Abstract

This paper investigates how citation-based information and structural content (e.g., title, abstract) can be combined to improve the classification of text documents into predefined categories. We evaluate eight similarity measures, five derived from the citation structure of the collection and three derived from the structural content, and determine how they can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our experiments, using documents from the ACM Digital Library and the ACM classification scheme, show that we can discover similarity functions that work better than any single type of evidence in isolation and whose combined performance through simple majority voting is comparable to that of Support Vector Machine classifiers.
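
To make the two ideas in the abstract concrete, the following is a minimal Python sketch of (a) a fused similarity function that combines several evidence scores in one arithmetic expression, as a GP individual might, and (b) simple majority voting over the labels predicted by several classifiers. The evidence names (`title_cosine`, `abstract_cosine`, `co_citation`), the particular expression, and the nearest-neighbor rule in `classify` are assumptions chosen for illustration; they are not the measures, functions, or classifier reported by the authors.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Similarity scores between a query document and one training document,
# keyed by evidence name. The names used below are hypothetical.
Evidence = Dict[str, float]


def fused_similarity(ev: Evidence) -> float:
    """Hand-written stand-in for an expression tree that GP might evolve.

    Illustrative only: this is not a function taken from the report.
    """
    return ev["title_cosine"] * ev["co_citation"] + ev["abstract_cosine"]


def classify(
    candidates: List[Tuple[str, Evidence]],
    sim: Callable[[Evidence], float],
) -> str:
    """Assign the label of the training document with the highest fused score.

    A nearest-neighbor rule is assumed here purely to show how a
    similarity function can drive a classifier.
    """
    best_label, _ = max(candidates, key=lambda pair: sim(pair[1]))
    return best_label


def majority_vote(predictions: List[str]) -> str:
    """Return the most frequent label among the classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]
```

In this sketch each classifier would be driven by its own discovered similarity function, and `majority_vote` would then pick the category that most of them agree on for a given document.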

Item Type: Departmental Technical Report
Keywords: Classification, document similarity, citation analysis, Genetic Programming
Subjects: Computer Science > Information Retrieval
          Computer Science > Digital Libraries
ID Code: 693
Deposited By: Administrator, Eprints
Deposited On: 09 September 2005