November 10, 2010
Title: Online Annotation of Text Streams With Structured Entities
Abstract: We propose a framework and algorithm for annotating unbounded text streams with entities from a structured database. The algorithm allows one to correlate unstructured and dirty text streams, from sources such as emails, chats and blogs, with entities stored in structured databases. In contrast to previous work on entity extraction, our emphasis is on performing entity annotation in a completely online fashion. The algorithm continuously extracts important phrases and assigns to each of them the top-$k$ relevant entities. It does so with a guarantee of constant time and space complexity for each additional word in the text stream, so infinite text streams can be annotated. Our framework allows the online annotation algorithm to adapt to changing stream rates by self-adjusting multiple run-time parameters, reducing the quality of annotation for fast streams and improving it for slow streams. The framework also allows the online annotation algorithm to incorporate query feedback to learn user preferences and personalize the annotation for individual users.
Biography: Dr. Pu is an assistant professor at the Faculty of Science, Ontario Tech. His research interests include keyword search for relational databases, stream data processing, and applications of specific algorithms from machine learning, statistics, string matching and combinatorics to solve database problems. More recently, Dr. Pu has become interested in the storage and analysis of image streams obtained from biological phenomena, and in non-classical user interfaces to large data repositories.