October 14, 2015

Speaker: Mennatallah El-Assady, PhD candidate

Title: Incremental Hierarchical Topic Modeling for Multi-Party Conversation Analysis

Abstract: Contributions to political debates or online forum discussions usually differ in structure, noisiness, and length. Because of this heterogeneity, the fully automatic analysis of conversational text data remains a challenge for natural language processing.

In recent years, a broad spectrum of topic modeling approaches has emerged that capture the thematic structure of parallel document collections with high accuracy. However, studies show that most of these techniques struggle with linearly structured heterogeneous texts. Consequently, when applied to transcribed real-world conversations, these models tend to be less accurate.

In this talk, I will present my work on an incremental hierarchical topic modeling algorithm that analyzes the topic distribution of linearly structured conversation data from multi-party political debates. The algorithm is tailored to noisy, heterogeneous texts and has been successfully applied to several text corpora, yielding accurate topic structures.