March 31, 2010

Speaker: Dr. Gerald Penn, Chief Scientist, Knowledge Media Design Institute, and Associate Professor of Computer Science, University of Toronto

Title: Summarizing Speech

Abstract: Speech is arguably the most basic, most natural form of communication that we engage in, so it should come as no surprise that there has been consistent pressure to deliver spoken audio content on web pages in a form that, in principle, can be searched. Even once the search problem is solved, however, the low-bandwidth, non-visual, traditional delivery of spoken audio makes it much more difficult to browse. This makes the automated summarization of speech particularly attractive: given a number N, prepare a summary of a spoken "document" that is N seconds long, or N utterances long, or N percent of the original document's length, and that contains the most important or salient content.

This talk will present a (human-prepared) summary of our research on summarizing speech. We'll talk about how speech summarization is usually evaluated, including some of the appropriate baselines in this area; the dependence of summarizer performance and tuning on genre; the role of automated speech transcription in summarization; and the usefulness of some of the acoustic, untranscribed features of the speech signal.

Biography: Gerald Penn is an Associate Professor of Computer Science at the University of Toronto. He received his PhD in 2000 from the School of Computer Science at Carnegie Mellon University. From 1998 to 2001, he worked in the Multimedia Communications Research Laboratory at Bell Labs in the United States. His other research interests include mathematical linguistics, parsing in freer word-order languages, spoken language processing and programming language theory.