Computer Science Technical Reports
CS at VT

An O(n log n) Algorithm for Finding Minimal Perfect Hash Functions

Fox, Edward A. and Heath, Lenwood S. and Chen, Qi-Fan (1989) An O(n log n) Algorithm for Finding Minimal Perfect Hash Functions. Technical Report TR-89-10, Computer Science, Virginia Polytechnic Institute and State University.

Full text available as:
PDF - Requires Adobe Acrobat Reader or other PDF viewer.
TR-89-10.pdf (2102403)

Abstract

We describe the first practical algorithm for finding minimal perfect hash functions that can be used to access very large databases. This method extends earlier work wherein an O(n3) algorithm was devised, building upon prior work by Sager that described an O(n4) algorithm. Our new O(n log n) expected time algorithm makes use of three key insights: applying randomness whereever possible, ordering our search for hash functions based on the degree of the vertices in a graph that represents word dependencies, and viewing hash value assignment in terms of adding circular patterns of related words to a partially filled disk. While ultimately applicable to a wide variety of data and file access needs, this algorithm has already proven useful in aiding our work in improving performance of CD-ROM systems and our construction of a Large External Network Database (LEND) for semantic networks and hypertext/hypermedia collections. Virginia Disc One includes a demonstration of a minimal perfect hash function running on a PC to access a 130,198 word list on that CD-ROM, and several other microcomputer, minicomputer, and parallel processor versions and applications of our algorithm have also been developed. Tests with a French word list of 420,878 entries and a library catalog key set with over 1.2 million keys have shown that our methods work with very large databases.

Item Type:Departmental Technical Report
Subjects:Computer Science > Historical Collection(Till Dec 2001)
ID Code:147
Deposited By:User autouser
Deposited On:18 October 2005