October 8, 2014
Speaker: Ernesto Rodriguez Reina, Ontario Tech University graduate student
Title: An Index Structure for Fast Range Search in Hamming Space
Abstract: This thesis addresses the problem of indexing and querying very large databases of binary vectors. Such databases of binary vectors are a common occurrence in domains that produce or analyze large numbers of high-dimensional binary vectors, such as information retrieval and computer vision. We propose an indexing structure consisting of a compressed trie and a hash table for supporting range queries in Hamming space. The index structure, which can be updated incrementally, is able to solve the range queries for any radius. Our indexing structure outperforms the existing techniques in terms of query processing times. The increased performance is due to the fact that our method is able to prune unnecessary look-ups that necessarily arise in range queries of binary vectors.