Highlights

  • Research interest
  • Projects
  • Work overview

Research Interest

  • Privacy, Deviance, User Influence and Retention in the Social Web
  • Social data mining
  • Semantic web
  • Software Quality & Testing

Research Projects

  • The Good, the Bad and the Deviant in Community Question Answering
  • We focus on abuse reports to characterize the behavior of the good guys vs. the bad guys in a popular community-based question answering (CQA) platform, Yahoo Answers. We define and validate metrics that reflect the extent to which a user deviates from typical behavior. The analysis of more than a year of sampled user activity shows that deviant users are not necessarily all bad. Some are (so much so that their accounts are eventually suspended), while others add value to the community rather than subtract from it. We also attempt to understand the association between users' privacy concerns, as manifested by their account settings, and their activity in the CQA platform.

  • Retention in Community Blogs
  • Turnover in community blogging is very high: most people who join and start contributing to the community fail to contribute in the long run. In this project, we ask what factors cause a blogger to continue participating in the community by contributing content (e.g., posts, comments). We crawled a sample of blogger profiles (who together contributed 91% of the community's posts) from a popular community blogging platform, "Blogster", and built a predictive model for retention. Our results show that male and senior bloggers, who face fewer constraints and have more opportunities in the community, are retained more often than others. Other bloggers pay a high degree of attention to these retained bloggers through implicit (reading posts) and explicit (writing comments) interactions. We also found that a blogger has higher retention if her friends also have higher retention, and that a strong social tie reduces the retention imbalance between two blogger friends.

  • Aegis: A Semantic Implementation of Privacy as Contextual Integrity in Social Ecosystems
  • Combining and incorporating the rich semantics of user social data, which is currently fragmented and managed by proprietary applications, has the potential to represent a user's social ecosystem more accurately. However, social ecosystems raise serious privacy concerns. This project proposes to model privacy as contextual integrity using semantic web tools and focuses on defining default privacy policies, as they have the highest impact on today's systems.

  • Influential Bloggers in a Blogging Community
  • Blogging is a popular activity with high impact on marketing, shaping public opinion, and informing the world about major events from a grassroots point of view. Influential bloggers are recognized by businesses as significant forces for product promotion or demotion, and by oppressive political regimes as serious threats to their power. This work studies the problem of identifying influential bloggers in a blogging community, BlogCatalog, by using network centrality metrics. Our analysis shows that bloggers are connected in a core-periphery network structure, with the highly influential bloggers well connected with each other, forming the core, and the non-influential bloggers at the periphery. The six node centrality metrics we analyzed are highly correlated, showing that an aggregate centrality score used as a measure of influence is robust to the choice of centrality metric.
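
The correlation check described above can be illustrated with a minimal sketch. It is not the project's actual analysis: the toy graph, the two metrics shown (degree and closeness, out of the six mentioned), the function names, and the use of Pearson correlation are all assumptions made for illustration.

```python
from collections import deque
from math import sqrt

# Hypothetical toy core-periphery graph (undirected adjacency lists):
# a, b, c form the core; p1..p4 are periphery nodes.
TOY_GRAPH = {
    "a": ["b", "c", "p1", "p2"],
    "b": ["a", "c", "p3"],
    "c": ["a", "b", "p4"],
    "p1": ["a"],
    "p2": ["a"],
    "p3": ["b"],
    "p4": ["c"],
}


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


def closeness_centrality(adj):
    """Reachable node count divided by the sum of shortest-path distances."""
    scores = {}
    for source in adj:
        # Breadth-first search gives shortest hop distances from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in adj[current]:
                if neighbor not in dist:
                    dist[neighbor] = dist[current] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def metric_correlation(adj):
    """Correlation between degree and closeness centrality over all nodes."""
    nodes = sorted(adj)
    degree = degree_centrality(adj)
    closeness = closeness_centrality(adj)
    return pearson([degree[v] for v in nodes], [closeness[v] for v in nodes])
```

On a graph like this one, the core nodes score highest on both metrics, which is the kind of agreement that makes an aggregate score insensitive to which metric is chosen.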

  • Agile Testing: PRAT as a Metric of Testing Quality in Scrum
  • This project presents a metric that measures the quality of the testing process in Scrum. As product quality and process quality correlate, improved test quality can help ensure high-quality products. We propose a metric, Product Backlog Rating (PRAT), to assess the testing process in Scrum. PRAT considers the complexity of the features to be developed in a Scrum iteration, assesses test ratings, and offers a numerical score for the testing process.

  • Test Case Prioritization for Regression Testing
  • Test case prioritization techniques schedule test cases for regression testing in an order that increases their effectiveness at meeting some performance goal. Using information obtained from previous test case executions, prioritization techniques order the test cases so that the most beneficial ones are executed first, improving the effectiveness of testing. We leverage fault dependencies and propose techniques and algorithms for regression test suite prioritization. We also propose metrics to evaluate the effectiveness of a dependency-based prioritization technique.
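
As a rough illustration of dependency-aware prioritization, the sketch below orders tests greedily by the dependency-weighted faults they newly expose, then scores the order with APFD (Average Percentage of Faults Detected), a standard metric from the prioritization literature. This is not the project's algorithm or its proposed metrics: the weighting scheme, the data, and all names (`HISTORY`, `DEPENDENCIES`, `fault_weights`, `prioritize`, `apfd`) are hypothetical.

```python
# Hypothetical history: which faults each test exposed in earlier runs.
HISTORY = {
    "t1": {"f1"},
    "t2": {"f2", "f3"},
    "t3": {"f4"},
    "t4": set(),
}

# Hypothetical fault dependencies: fault -> faults that depend on it.
DEPENDENCIES = {
    "f4": ["f1", "f2"],
    "f1": ["f3"],
}


def fault_weights(dependencies):
    """Weight each fault as 1 plus the number of faults that depend on it
    directly or transitively (an assumed weighting scheme)."""
    def dependents(fault):
        seen = set()
        stack = list(dependencies.get(fault, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(dependencies.get(current, ()))
        seen.discard(fault)
        return seen

    faults = set(dependencies)
    for deps in dependencies.values():
        faults.update(deps)
    return {fault: 1 + len(dependents(fault)) for fault in faults}


def prioritize(detects, weights):
    """Greedy order: repeatedly pick the test whose not-yet-covered faults
    carry the largest total weight; ties go to the alphabetically first test."""
    remaining = dict(detects)
    covered = set()
    order = []
    while remaining:
        def gain(test):
            return sum(weights.get(f, 1) for f in remaining[test] - covered)

        best = max(sorted(remaining), key=gain)
        order.append(best)
        covered |= remaining.pop(best)
    return order


def apfd(order, detects):
    """APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2 * n), where TF_i is
    the 1-based position of the first test exposing fault i. Assumes every
    fault is exposed by at least one test in the order."""
    faults = set().union(*detects.values())
    n, m = len(order), len(faults)
    first_positions = [
        next(i for i, test in enumerate(order, start=1) if fault in detects[test])
        for fault in faults
    ]
    return 1 - sum(first_positions) / (n * m) + 1 / (2 * n)
```

With this toy data, `prioritize(HISTORY, fault_weights(DEPENDENCIES))` places `t3` first because it exposes the fault that most other faults depend on, and the resulting order has a higher APFD than its reverse.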