Computer Science Technical Reports

A Case Study of Using Domain Analysis for the Conflation Algorithms Domain

Yilmaz, Okan and Frakes, William (2007) A Case Study of Using Domain Analysis for the Conflation Algorithms Domain. Technical Report TR-07-32, Computer Science, Virginia Tech.

Abstract

This paper documents the domain engineering process for much of the conflation algorithms domain. Empirical data on the process and products of domain engineering were collected. Six conflation algorithms of four different types were analyzed: three affix removal, one successor variety, one table lookup, and one n-gram. Products of the analysis include a generic architecture, reusable components, a little language, and an application generator that extends the scope of the domain analysis beyond previous generators. The application generator produces source code not only for affix removal stemmers but also for successor variety, table lookup, and n-gram stemmers. The automatically generated stemmers were compared with manually developed stemmers in terms of stem similarity, source and executable sizes, and development and execution times. For all five stemmers produced by the application generator, more than 99.9% of the stems were identical to those of the manually developed stemmers. Some of the generated stemmers were as efficient as their manual equivalents and some were not.
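
The stem similarity comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the report's code: the two toy stemmers, the lookup table, the word list, and the function names (`strip_suffix_stemmer`, `table_lookup_stemmer`, `stem_agreement`) are all hypothetical, and the measure shown (percentage of words for which two stemmers return the same stem) is only one plausible reading of "identical stems".

```python
from typing import Callable, Iterable

Stemmer = Callable[[str], str]


def strip_suffix_stemmer(word: str) -> str:
    """Toy affix-removal stemmer (hypothetical; not one of the report's algorithms)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Strip the first matching suffix, keeping at least three characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Toy table-lookup stemmer: a hand-built word-to-stem table (hypothetical data).
STEM_TABLE = {
    "engineering": "engineer",
    "engineered": "engineer",
    "engineers": "engineer",
    "tables": "table",
}


def table_lookup_stemmer(word: str) -> str:
    """Return the stem stored in the table, or the word itself if absent."""
    return STEM_TABLE.get(word, word)


def stem_agreement(words: Iterable[str], first: Stemmer, second: Stemmer) -> float:
    """Percentage of words for which both stemmers produce the same stem."""
    word_list = list(words)
    if not word_list:
        raise ValueError("word list must not be empty")
    same = sum(1 for w in word_list if first(w) == second(w))
    return 100.0 * same / len(word_list)


sample = ["engineering", "engineered", "engineers", "tables"]
agreement = stem_agreement(sample, strip_suffix_stemmer, table_lookup_stemmer)
print(f"{agreement:.1f}% identical stems")
```

On this four-word sample the two toy stemmers disagree only on "tables" ("tabl" versus "table"), so the sketch reports 75.0% agreement. A comparison of a generated stemmer against its manual counterpart would presumably use the same kind of measure over a much larger vocabulary.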

Item Type: Departmental Technical Report
Subjects: Computer Science > Algorithms and Data Structure
ID Code: 993
Deposited By: Administrator, Eprints
Deposited On: 17 September 2007