Weibo Visualization

Data

Weibo data includes two object types: user and status. The original data set was obtained from the Sina Weibo OpenAPI; we use only a small part of it as a test case. The two object types have the following attributes:

User

Name

Number of Followers

Number of Friends

Location: Province, City

Gender

Status

Text

Time

Number of Reposts

Number of Comments

The relations between types of data objects are:

User-Status Relation (Authorship)

User-User Relation (Follow)

Status-Status Relation (Repost)
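As a rough illustration, the two object types and three relations above could be held as plain JavaScript objects. This is only a sketch: the field names (`id`, `authorId`, `following`, `repostOf` and so on) and all sample values are assumptions made for the example, not the Sina Weibo OpenAPI schema or the project's actual storage format.

```javascript
// Illustrative sample records; all field names and values are assumptions.
const users = [
  { id: 'u1', name: 'alice', followers: 1200, friends: 80,
    location: { province: 'Guangdong', city: 'Guangzhou' }, gender: 'f',
    following: ['u2'] },                       // User-User relation (follow)
  { id: 'u2', name: 'bob', followers: 40, friends: 150,
    location: { province: 'Zhejiang', city: 'Hangzhou' }, gender: 'm',
    following: [] },
];

const statuses = [
  { id: 's1', authorId: 'u2',                  // User-Status relation (authorship)
    text: 'Original post', time: '2012-12-30T08:18:00+08:00',
    reposts: 12, comments: 3, repostOf: null },
  { id: 's2', authorId: 'u1',
    text: 'Repost with a comment', time: '2012-12-30T09:00:00+08:00',
    reposts: 4, comments: 0, repostOf: 's1' }, // Status-Status relation (repost)
];

// Resolve the author of a status by following the authorship relation.
function authorOf(status, allUsers) {
  return allUsers.find((u) => u.id === status.authorId) || null;
}

// Collect all statuses that repost the given status.
function repostsOf(status, allStatuses) {
  return allStatuses.filter((s) => s.repostOf === status.id);
}
```

Storing relations as ids rather than nested objects keeps each record flat, which is convenient when the same data is serialized to JSON for the browser.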

Data Selection and Acquisition

For our test case, we chose several popular topics and keywords:

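火车票 (train tickets): 火车票 (train ticket), 12306, 抢票 (ticket grabbing)

天气 (weather): 天气 (weather), PM2.5, 空气 (air)

世界末日 (end of the world): 玛雅 (Maya), 世界末日 (end of the world), 2012年12月21日 (December 21, 2012)

食品安全 (food safety): 食品安全 (food safety), 速生鸡 (fast-growing chicken), 肯德基 (KFC)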

We searched the whole collection of statuses for these keywords. A large portion of the statuses in the database have no reposts or comments; we considered these unimportant and filtered them out of the test case.

We used these criteria to search the status database:

Keywords

Posted prior to Jan 1, 2013

Repost count must be greater than 3

Return at most 2,000 statuses

We then fetched the authors of these statuses.
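The selection criteria above can be sketched as an in-memory filter. This is not the project's actual MongoDB query: the field names (`text`, `time`, `reposts`, `authorId`), the time zone of the cutoff, and the choice to assign each status to the first topic whose keywords match are all assumptions made for illustration.

```javascript
// Keyword lists per topic, taken from the topic list above.
const TOPIC_KEYWORDS = {
  '火车票': ['火车票', '12306', '抢票'],
  '天气': ['天气', 'PM2.5', '空气'],
  '世界末日': ['玛雅', '世界末日', '2012年12月21日'],
  '食品安全': ['食品安全', '速生鸡', '肯德基'],
};

const CUTOFF = new Date('2013-01-01T00:00:00+08:00'); // time zone is an assumption
const MIN_REPOSTS = 3;    // repost count must be greater than this
const MAX_RESULTS = 2000; // upper bound on returned statuses

// Return the first topic whose keywords occur in the text, or null.
function matchTopic(text) {
  for (const [topic, keywords] of Object.entries(TOPIC_KEYWORDS)) {
    if (keywords.some((k) => text.includes(k))) return topic;
  }
  return null;
}

// Apply the four criteria and tag each kept status with its topic.
function selectStatuses(allStatuses, limit = MAX_RESULTS) {
  const selected = [];
  for (const s of allStatuses) {
    if (selected.length >= limit) break;
    const topic = matchTopic(s.text);
    if (topic === null) continue;                 // no keyword matched
    if (new Date(s.time) >= CUTOFF) continue;     // not posted before the cutoff
    if (s.reposts <= MIN_REPOSTS) continue;       // too few reposts
    selected.push({ ...s, topic });
  }
  return selected;
}

// Fetch the distinct authors of the selected statuses.
function authorsOf(selected, allUsers) {
  const ids = new Set(selected.map((s) => s.authorId));
  return allUsers.filter((u) => ids.has(u.id));
}
```

Because several statuses can share an author, `authorsOf` deduplicates by id, which is consistent with a dataset that has fewer users than statuses.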

The final dataset contains 1,425 users and 1,930 statuses posted between Dec 30 8:18 and Dec 23:59, 2012. It is 3.7 MB in size.

Design

What We Want to Reveal

Our main goal is to reveal how popular topics evolve over a period of time, as well as the correlations between users and topics. Our design goals are:

To show the popularity and evolution of topics, along with the users related to them.

To show distributions in terms of location, gender, topic and time.

To discover the users and statuses that are critical to these topics.

To show the distribution and trend of sentiment.

Views

Bubble chart of statuses, grouped by topics and time

Y position: time, grouped by intervals

X position: first group by topic, then sort by sentiment

Color: topic

Size: number of reposts and comments

Bubble chart of users, grouped by location (province)

Color: gender

Size: number of followers

Links between users and statuses

Stacked area chart to show overview of topic evolution, both in absolute value and in percentage
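The data preparation behind these views might look roughly like the following. It is a sketch under assumptions, not the project's code: statuses are assumed to carry a numeric `timeMs` timestamp, a `topic` label and `reposts`/`comments` counts, and the helper names are hypothetical. The square-root radius is a common convention for bubble charts and is not confirmed by the source.

```javascript
// Start of the time bucket (in ms since the epoch) that contains timeMs.
function bucketStart(timeMs, intervalMs) {
  return Math.floor(timeMs / intervalMs) * intervalMs;
}

// Bubble radius for a status. Taking the square root makes the bubble's
// area, rather than its radius, proportional to reposts + comments.
function bubbleRadius(status, pixelsPerUnit = 2) {
  return pixelsPerUnit * Math.sqrt(status.reposts + status.comments);
}

// Count statuses per time bucket and topic for the stacked area chart.
// Returns rows sorted by time: { t, counts, shares }, where counts holds
// absolute values and shares holds each topic's fraction of the bucket.
function topicSeries(statuses, topics, intervalMs) {
  const rows = new Map();
  for (const s of statuses) {
    const t = bucketStart(s.timeMs, intervalMs);
    if (!rows.has(t)) {
      rows.set(t, { t, counts: Object.fromEntries(topics.map((k) => [k, 0])) });
    }
    if (topics.includes(s.topic)) rows.get(t).counts[s.topic] += 1;
  }
  return [...rows.values()]
    .sort((a, b) => a.t - b.t)
    .map((row) => {
      const total = Object.values(row.counts).reduce((a, b) => a + b, 0);
      const shares = Object.fromEntries(
        topics.map((k) => [k, total === 0 ? 0 : row.counts[k] / total])
      );
      return { ...row, shares };
    });
}
```

The same `bucketStart` helper can serve both the Y grouping of the status bubble chart and the X axis of the stacked area chart, so the two views stay aligned on identical time intervals.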

Interaction

Foldable time intervals: expand to see detailed time-wise distribution of statuses

Focus on a single user and their posts

Multiple selection of topics

Multiple selection of provinces
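One way to combine these interactions is a single predicate that decides which statuses remain visible. The sketch below is hypothetical: the shape of the `selection` object, the `province` field on users, and the rule that an empty set means "no restriction" are assumptions for illustration only.

```javascript
// Decide which statuses stay visible under the current selection.
// selection.topics and selection.provinces are Sets; an empty Set is
// treated as "no restriction" for that dimension (an assumption).
function visibleStatuses(statuses, usersById, selection) {
  const { topics, provinces, focusedUserId = null } = selection;
  return statuses.filter((s) => {
    const author = usersById.get(s.authorId);
    // Focus on a single user: keep only that user's posts.
    if (focusedUserId !== null && s.authorId !== focusedUserId) return false;
    // Multiple selection of topics.
    if (topics.size > 0 && !topics.has(s.topic)) return false;
    // Multiple selection of provinces, matched through the author.
    if (provinces.size > 0 && !(author && provinces.has(author.province))) return false;
    return true;
  });
}
```

Keeping the three filters in one function means every view can be refreshed from the same result after any change to the selection.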

Screenshots

Expand time groups.

Filter users by province.

Filter statuses by topic.

Select a user and their posts.

Filter by user and topic.

Implementation

Technologies

This visualization is implemented with web technologies: HTML, CSS and JavaScript, using the D3.js library. Data is stored in MongoDB, then processed and served by a Node.js web server.
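To make the server side concrete, here is a minimal sketch of a Node.js endpoint that returns statuses as JSON. It uses only the built-in `http` module, and an in-memory array stands in for the MongoDB collection. The `/statuses` route, the `topic` query parameter and the sample records are hypothetical, not the project's actual API.

```javascript
const http = require('http');

// Stand-in for the MongoDB collection; a real server would query the
// database here instead. Sample values are made up.
const statuses = [
  { id: 's1', topic: 'weather', text: 'PM2.5 is high today', reposts: 12 },
  { id: 's2', topic: 'tickets', text: 'Got my ticket on 12306', reposts: 7 },
];

// Handle GET /statuses?topic=...; route and parameter are hypothetical.
function handleRequest(req, res) {
  const url = new URL(req.url, 'http://localhost');
  if (req.method !== 'GET' || url.pathname !== '/statuses') {
    res.writeHead(404, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'not found' }));
    return;
  }
  const topic = url.searchParams.get('topic');
  const result = topic ? statuses.filter((s) => s.topic === topic) : statuses;
  res.writeHead(200, { 'Content-Type': 'application/json; charset=utf-8' });
  res.end(JSON.stringify(result));
}

// Create the server without starting it; call .listen(port) to serve requests.
function createServer() {
  return http.createServer(handleRequest);
}
```

Calling `createServer().listen(3000)` would start the server, and the browser side could then load the data with `d3.json` before binding it to the charts.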

Findings

Users from Beijing, Guangdong and Shanghai contribute most to the selected topics.

Activity is low between 2 a.m. and 6 a.m., when most users are asleep.

People talked more about the Apocalypse towards the end of the year, usually while summing up 2012.

Ads and promotions get lots of attention and responses.

What We Learned

Understanding of data: clarify what to expect from the visualization before anything else (design, data preparation, etc.).

With a huge number of elements in the visualization, UI performance becomes an issue. Consider unconventional UI programming hacks for performance, though they come at the cost of code extensibility and maintainability (for example, by violating the DRY principle).