Date

2013

Document Type

Dissertation

Degree

Doctor of Philosophy

Department

Computer Science

First Adviser

Davison, Brian D.

Other advisers/committee members

Heflin, Jeff D.; Nagel, Roger N.; Bandyopadhyay, Soutir

Abstract

The social network-enhanced Web has become increasingly important. With a wide spectrum of social services such as blogs, wikis, online forums, social network services and community question answering portals, individuals can produce, consume and share information through rich user interactions. These interactions include conversations, annotations and resource sharing, enabling faster and wider dissemination and development of information at a large scale. In addition, recently popularized micro-blogging services such as Twitter and Tumblr have transformed the Web into a more synchronized world, opening opportunities for users around the world with various cultural backgrounds to generate and propagate information in "real-time". Conversation has been studied as a scientific subject for decades. Traditional research on conversations spans a variety of disciplines including linguistics, sociology, anthropology, psychology, communication studies and translation studies, each of which has its own assumptions, dimensions of analysis, and methodologies. One major characteristic of this traditional research is that most classic studies were based on surveys, field research and small-scale datasets, and sometimes depended on a detailed inspection of tape recordings or transcriptions made from such recordings. While some theories and methodologies developed in these areas became the foundations for modern analysis of conversations (e.g., those based on computational linguistics and information retrieval), most of them cannot be directly applied to online settings, both because of their qualitative nature and because the case-by-case style of some studies cannot scale to the amount of data online. In addition, since such research was conducted before the Internet became popular, its conclusions and results also need to be re-verified in the new era.
Although a large amount of research has been conducted on mining and understanding online conversational media, some practical problems remain unanswered. First, when facing a large amount of socially generated content, users simply cannot consume it effectively and efficiently, leading to the problem of information overload. On the other hand, it is difficult for users to obtain information distributed outside of their social circles, even though it might match their interests, leading to the problem of information shortage. Users may spend a significant amount of time filtering and searching for relevant information on such platforms. In general, the problem can be considered as information filtering in online conversational media. One of the central challenges of information filtering is tracking users' interests. The assumption is that if we can understand them perfectly, the most relevant and fresh information can be selected from the ocean of items and presented to users. The key ingredient of tracking users' interests in online conversational media is understanding the content generated by users, usually modeled as topical distributions, as well as rich interaction data.
In this dissertation, we discuss both information filtering and topic/interest tracking in a principled way, as they are two important problems in online conversational media. On one hand, we demonstrate how we develop new approaches to achieve state-of-the-art performance in each direction. On the other hand, we discuss the relationships between these two directions and show how they link with each other. We frame these two directions of mining and understanding online conversational media as a dual relationship in data analysis and demonstrate how each benefits from the development of the other. This dissertation can be used as a guideline for readers who are interested in data analysis in social media in general.
