November 10, 2010

Speaker: Dr. Ken Pu, Faculty of Science, Ontario Tech University

Title: Online Annotation of Text Streams With Structured Entities

Abstract: We propose a framework and algorithm for annotating unbounded text streams with entities from a structured database. The algorithm correlates unstructured and dirty text streams from sources such as emails, chats and blogs with entities stored in structured databases. In contrast to previous work on entity extraction, our emphasis is on performing entity annotation in a completely online fashion. The algorithm continuously extracts important phrases and assigns to each of them the top-$k$ relevant entities. It does so with a guarantee of constant time and space complexity for each additional word in the text stream, so unbounded text streams can be annotated. The framework lets the online annotation algorithm adapt to a changing stream rate by self-adjusting multiple run-time parameters, reducing annotation quality for fast streams and improving it for slow ones. The framework also allows the algorithm to incorporate query feedback to learn user preferences and personalize the annotation for individual users.
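
To make the idea of online annotation with bounded per-word cost concrete, the following is a minimal illustrative sketch in Python. It is not the algorithm presented in the talk. It assumes a simple token-overlap score over a fixed-size sliding window, and the names `WindowAnnotator`, `push`, `window` and `k` are hypothetical. The work per word stays bounded only under the added assumption that each token maps to a bounded number of entities.

```python
from collections import Counter, deque
import heapq


class WindowAnnotator:
    """Toy online annotator (hypothetical, not the speaker's method).

    Scores each entity by how many of its name tokens occur in the
    last `window` words of the stream, and reports the top-k entities.
    """

    def __init__(self, entities, window=4, k=2):
        # entities: mapping of entity id -> entity name string
        self.k = k
        self.window = deque(maxlen=window)
        self.scores = Counter()
        # Inverted index: token -> list of entity ids whose name contains it.
        self.index = {}
        for eid, name in entities.items():
            for tok in set(name.lower().split()):
                self.index.setdefault(tok, []).append(eid)

    def _apply(self, word, delta):
        # Adjust the score of every entity that contains this token.
        for eid in self.index.get(word, ()):
            self.scores[eid] += delta
            if self.scores[eid] == 0:
                del self.scores[eid]  # keep the score table small

    def push(self, word):
        """Consume one word and return the current top-k (entity, score) pairs."""
        word = word.lower()
        if len(self.window) == self.window.maxlen:
            # The oldest word is about to leave the window: undo its votes.
            self._apply(self.window[0], -1)
        self.window.append(word)  # deque drops the oldest word automatically
        self._apply(word, +1)
        return heapq.nlargest(
            self.k, self.scores.items(), key=lambda kv: (kv[1], kv[0])
        )


if __name__ == "__main__":
    annotator = WindowAnnotator(
        {"e1": "Ontario Tech University", "e2": "University of Toronto"},
        window=3,
        k=1,
    )
    for w in "visit ontario tech university today".split():
        print(w, annotator.push(w))
```

Because the window holds at most `window` words and the score table only contains entities that share a token with the window, the memory used does not grow with the length of the stream, which is the property the abstract emphasizes.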

Biography: Dr. Pu is an assistant professor in the Faculty of Science at Ontario Tech. His research interests include keyword search for relational databases, stream data processing, and applications of specific algorithms from machine learning, statistics, string matching and combinatorics to solve database problems. More recently, Dr. Pu has become interested in the storage and analysis of image streams obtained from biological phenomena, and in non-classical user interfaces to large data repositories.