Abstract

Microblogging has recently been widely studied by researchers in the domain of online social networks (OSNs). Evaluating the popularity of microblogging users is an important research problem with applications in commercial advertising, user behavior analysis, information dissemination, and so forth. Previous evaluation methods cannot effectively and accurately evaluate the popularity of microbloggers. In this paper, we propose an electromagnetic field theory based model to analyze the popularity of microbloggers. First, the concept of the source in the microblogging field is put forward, based on the concept of the source in the electromagnetic field; then, one’s microblogging flux is calculated according to his/her behaviors (sending or receiving feedback) on the microblogging platform; finally, we use three methods to calculate one’s microblogging flux density, which represents one’s popularity on the microblogging platform. In the experiments, we evaluate our model using real microblogging data and select the best of the three popularity measures. We also compare our model with the classic PageRank algorithm; the results show that our model evaluates the popularity of microbloggers more effectively and accurately.

1. Introduction

Microblogging is a broadcast medium in the form of blogging. A microblog differs from a traditional blog in that its content is typically smaller in both actual and aggregate file size. Microblogging allows users to exchange small elements of content such as short sentences, individual images, or video links. Twitter and Weibo are both well-known microblogging services with hundreds of millions of users. The Twitter and Weibo social networks have emerged as a critical factor in information dissemination, search, marketing, expertise and influence discovery, and potentially as an important tool for mobilizing people [1–5]. Social media have made social networks ubiquitous and have also given researchers access to massive quantities of data for empirical analysis. These data sets offer a rich source of evidence for studying the dynamics of individual and group behavior, the structure of networks, and global patterns of information flow on them [6–8]. A popularity evaluation model for microbloggers is an important research topic for social networks such as Sina Weibo. For example, companies choose popular microblogging users to run their commercials, relying on these users to publish and forward posts that advertise their business. Studies of other issues in online social networks, such as the roles of network users, also need a model or method for analyzing user popularity. Therefore, a popularity evaluation model for microbloggers can be applied to commercial advertising, user behavior analysis, information dissemination, and so forth.

How to evaluate the popularity of microblogging users is an important research question for online social networks. Previous evaluation methods cannot effectively and accurately evaluate the popularity of microbloggers [9, 10]. For example, the popularity of microbloggers is hard to evaluate with traditional network structure models (e.g., the PageRank algorithm [10]). It is commonly assumed that the more fans a user has, the more popular the user is on the Weibo social network [11]. According to actual data statistics, however, we found that there are inactive users on Sina Weibo, which we refer to as “zombies.” Zombies contribute nothing to a user’s popularity, which is why the relationship between a user’s number of fans and his/her popularity is not close enough. Therefore, a method based on the fan list cannot truly reflect one’s connection strength or popularity.

In order to effectively and accurately evaluate the popularity of microbloggers over time, we propose in this paper an electromagnetic field theory based model to analyze the popularity of microbloggers. First, the concept of the source in the microblogging field is put forward, based on the concept of the source in the electromagnetic field; then, one’s microblogging flux is calculated according to his/her behaviors (sending or receiving comments) on the microblogging platform; finally, we use three methods to calculate one’s microblogging flux density, which represents one’s popularity on the microblogging platform. The remainder of this paper is organized as follows. We discuss related work in Section 2. In Section 3, we propose three kinds of user source in the microblogging field, analogous to those in the electromagnetic field: positive, negative, and neutral sources. For every microblogger, the microblogging flux is calculated according to his/her behaviors (sending or receiving comments). Three methods for calculating one’s microblogging flux density are put forward in Section 4. In Section 5, we evaluate our model using real microblogging data and select the best of the three measures for evaluating the popularity of microbloggers. Finally, we conclude the paper in Section 6.

2. Related Work

Recently, online social networks [1] have gained significant popularity and are now among the most popular sites on the Web. Online social network researchers mainly focus on network-structure-model construction [2–5], user-behavior analysis [6], information dissemination [7], content recommendation [8], and so forth. These research fields are closely associated. For example, information dissemination is mainly influenced by the user interest degree, the number of a user’s friends, and the user’s behavior [12–15]. Mislove et al. [13] collected large-scale data from four social network sites and measured and analyzed the structures of online social networks. Zhao et al. [14] found weak connections in online social networks, which have a significant influence on the speed and breadth of network information transmission. Kwak et al. [15] analyzed the topological structure of Twitter and found that the number of Twitter users’ followers is distributed according to a power law followed by an exponential cutoff. Letierce et al. [12] studied the use of labels for transmitting information between users. At present, among network-structure-model researchers, the study of user connections and their weight measures is very popular. Yun et al. [16] analyzed five factors that influence mutual connections between users on Twitter. Chen et al. [17] analyzed the equivalence attributes of online users, which are mostly based on user connection strength. In the field of content recommendation, some researchers focus on the category of the content, which is then recommended to users whose interest matching degrees are high. Content recommendation is widely used in electronic commerce systems, video sharing sites, and other fields [18]. Among these applications, collaborative filtering is the most widely used technology; for a given user, it recommends users with similar interests as potential friends. Saito et al. [19] and Tang et al. [20] conducted a series of experiments and found that, due to different preferences, users with similar interests behave differently in spreading their topics. Yang and Scott [21] found that the mention rate from relevant users is an important factor influencing many aspects of information transmission, such as speed, scale, and scope. Lerman and Ghosh [9] analyzed the influence of network structure on information transmission based on Digg and Twitter data.

3. User Source in Microblog

In the microblogging social network, users release microblogs to share information, and they can also interact with each other by forwarding or commenting on microblogs. The behaviors of microbloggers are described in Figure 1. As shown in Figure 1(a), the static circle represents a microblogger; lines with positive arrows mean that the microblogger is forwarded or commented on by others. As shown in Figure 1(b), lines with negative arrows mean that the microblogger posts microblogs or comments on other blogs. On a microblogging platform, everyone can post microblogs or communicate with each other. However, their popularities differ. We often find that even when two microbloggers post a similar number of blogs, their blogs receive different responses (the total numbers of followers and comments differ greatly). This agrees with the well-known Matthew effect, namely, “the rich get richer and the poor get poorer.” Celebrities are more frequently followed or commented on by others. To study this Matthew effect in the microblogging environment, we first put forward the concept of the source in the microblogging field and then give the criteria for calculating one’s microblogging flux (activity). Our model is based on electromagnetic field theory and is the first of its kind.

Figure 1: (a) The microblogger is followed or commented on by others; (b) the microblogger posts microblogs or comments on other users’ microblogs.

In order to calculate one’s microblogging flux (activity), we first introduce the concept of the source in electromagnetic field theory. An electromagnetic field (also EMF or EM field) is a physical field produced by electrically charged objects. It affects the behavior of charged objects in the vicinity of the field. The electromagnetic field extends indefinitely throughout space and describes the electromagnetic interaction. The field can be viewed as the combination of an electric field and a magnetic field. The electric field is produced by stationary charges and the magnetic field by moving charges (currents); these two are often described as the sources of the field. As the distribution of charge or current in space is uneven, the charge density and current density are used to describe the distribution of the source. There are three kinds of point charges in nature: (a) the positive charge, (b) the negative charge, and (c) the neutral charge, as shown in Figure 2.

Figure 2: Three forms of charge in nature.

The point charge is a source that produces an electric field. Similar to the source of an electromagnetic field, in the microblogging social network the flux (activity) is produced by microbloggers. Therefore, we can take every microblogger as a source in the microblogging field. In electromagnetism, the magnetic flux (often denoted as $\Phi$) through a surface is the component of the magnetic field passing through that surface. As shown in Figure 3, the magnetic flux is defined as $\Phi = BS\cos\theta$, where $B$ is the magnitude of the magnetic field (the magnetic flux density) having the unit of $\mathrm{Wb/m^2}$ (tesla), $S$ is the area of the surface, and $\theta$ is the angle between the magnetic field lines and the normal (perpendicular) to $S$.

Figure 3: The magnetic flux in a magnetic field.
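As a quick numerical illustration of the magnetic flux through a flat surface in a uniform field, consider the following worked example. The values are hypothetical and chosen only for easy arithmetic, and the symbols follow standard physics notation: field magnitude $B = 0.5$ T, surface area $S = 2\ \mathrm{m^2}$, and angle $\theta = 60^\circ$ between the field lines and the surface normal.

```latex
\Phi = B S \cos\theta
     = 0.5\ \mathrm{T} \times 2\ \mathrm{m^2} \times \cos 60^\circ
     = 0.5\ \mathrm{Wb}
```

The flux depends on both the strength of the field and how much of the surface it actually crosses; at $\theta = 90^\circ$ the field lines run parallel to the surface and the flux is zero.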

Following the idea of the magnetic flux in a magnetic field, we define the microblogging flux. In the microblogging social network, the microblogging flux of a given blog is defined as its number of feedbacks (the number of times it is forwarded or commented on) minus one, as shown in the following formula:
$$\Phi_{\mathrm{blog}} = F + C - 1, \tag{1}$$
where $F$ represents the number of times this blog is forwarded and $C$ represents the number of comments it receives. If the blog receives no feedback, its flux is $-1$, which acts as a penalty. For each microblogger, we sum the microblogging flux of all his/her blogs. Therefore, the microblogging flux of a given user is
$$\Phi = \sum_{b_i \in \mathrm{Blogs}} \left(F_i + C_i - 1\right), \tag{2}$$
where Blogs represents the blog set of the given microblogger, $b_i$ represents his/her $i$th blog, $F_i$ represents the number of times the $i$th blog is forwarded, and $C_i$ represents the number of comments on the $i$th blog. According to formula (2), the microblogging flux of a given microblogger can be positive, negative, or zero. As shown in Figure 4(a), if a microblogger receives more feedbacks than the number of blogs she/he posts, we define him/her as a positive microblogging source; if a microblogger receives fewer feedbacks than the number of blogs she/he posts, she/he is defined as a negative microblogging source (see Figure 4(b)); the remaining bloggers are defined as neutral microblogging sources (see Figure 4(c)). In Table 1, we roughly identify several microbloggers from the Sina microblogging platform as positive, negative, or neutral ones.

Table 1: The parameters of microbloggers.

Figure 4: Three kinds of microblogging sources.

Thus, we can classify users on the microblogging platform by their microblogging flux. If one’s microblogging flux is greater than 0, we regard him/her as a positive microblogger; if it is less than 0, we regard him/her as a negative microblogger; otherwise, she/he is a neutral microblogger. Figure 5 shows the distribution of the three kinds of microbloggers randomly selected from the Sina Weibo sample, from which we can see that most microbloggers (59%) are positive, about 41% of the microbloggers in the sample are negative, and only 1% are neutral.

Figure 5: The distribution of three kinds of microbloggers.
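The flux definition and the three-way classification can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: each blog is represented as a hypothetical `(forwards, comments)` pair, and the function names are chosen here for readability.

```python
from typing import Iterable, Tuple


def blog_flux(forwards: int, comments: int) -> int:
    """Flux of one blog: number of feedbacks minus one (-1 if no feedback)."""
    return forwards + comments - 1


def user_flux(blogs: Iterable[Tuple[int, int]]) -> int:
    """Flux of a user: sum of the flux of all of his/her blogs."""
    return sum(blog_flux(f, c) for f, c in blogs)


def classify_source(flux: int) -> str:
    """Label a user as a positive, negative, or neutral microblogging source."""
    if flux > 0:
        return "positive"
    if flux < 0:
        return "negative"
    return "neutral"


# Hypothetical user with three blogs given as (forwards, comments).
example_blogs = [(3, 2), (0, 0), (0, 1)]
example_flux = user_flux(example_blogs)   # (5 - 1) + (0 - 1) + (1 - 1) = 3
example_kind = classify_source(example_flux)
```

A user whose blogs receive exactly one feedback each ends up with zero flux and is classified as neutral, which may help explain why neutral sources are rare in the sample.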

4. Popularity Evaluation Model for Microbloggers

In Section 3, we introduced the concept of the microblogging source and classified microbloggers into three kinds according to their microblogging flux. We first propose Hypothesis 1 as follows.

Hypothesis 1. The popularity of a microblogger is decided by his/her microblogging flux.

In order to verify Hypothesis 1, we sort the microbloggers by their microblogging flux in descending order. Table 2 lists the 16 microbloggers with the highest microblogging flux. We can manually compare their popularities by browsing their homepages on the microblogging platform. Take the user pair ID1671526850 and ID1660209951 as an example. The microblogging flux of the former is much greater than that of the latter, as shown in Table 2. ID1671526850 has 158098 blogs and about 499 thousand fans, whereas ID1660209951 has only 57981 blogs and about 466 thousand fans. In this context, we believe that ID1660209951 is more popular than ID1671526850, because every blog of ID1660209951 attracts more feedbacks even though he has fewer fans than ID1671526850. This example shows that Hypothesis 1 may not be very reasonable and needs to be improved.

Microblogging flux alone may not accurately evaluate one’s popularity; to address this problem, we propose a new evaluation metric called the microblogging flux density. Borrowing from the electromagnetic concept, the microblogging flux is represented by the surface integral of a vector field, as shown in the following formula:
$$\Phi = \iint_{S} \mathbf{B} \cdot d\mathbf{S}, \tag{3}$$
where $\mathbf{B}$ is a vector field and $d\mathbf{S}$ is the vector area of the microblogging surface $S$, directed along the surface normal. Conversely, one can consider the microblogging flux the more fundamental quantity and call the vector field the microblogging flux density. Often a vector microblogging field is drawn by curves (field lines) following the “flow”; the magnitude of the vector microblogging field is then the line density, and the flux through a surface is the number of lines. The microblogging flux density (microblogging field) is denoted by $B$. On the microblogging platform, we believe one’s popularity is associated with the amount of feedback she/he receives, which in turn depends on the number of one’s blogs and fans. Therefore, the microblogging surface $S$ in formula (3) might be affected by these two factors, namely, the number of blogs (Blog_Num) and the number of fans (Fan_Num). To verify this judgment, we propose three hypotheses as follows.

Hypothesis 2. One’s microblogging surface is mainly affected by the number of blogs.

Hypothesis 3. One’s microblogging surface is mainly affected by the number of fans.

Hypothesis 4. One’s microblogging surface is mainly affected by the number of blogs and fans.

Based on Hypothesis 2, the microblogging flux density can be calculated by the following formula, which is labeled as $B_{\mathrm{blog}}$:
$$B_{\mathrm{blog}} = \frac{\Phi}{\mathrm{Blog\_Num}}, \tag{4}$$
where $\Phi$ is one’s microblogging flux and Blog_Num is the number of one’s microblogs. $B_{\mathrm{blog}}$ reflects the average number of feedbacks per microblog, which is similar to the divergence theorem in electromagnetic field theory. On the microblogging platform, some bloggers post a large number of blogs every day to attract the attention of others. Some of their blogs are interesting and acquire a lot of feedback. However, most of their blogs are trite and might not be forwarded or commented on. Microbloggers with this feature are called “twuilt.” Though the microblogging flux of those twuilts is high, the average microblogging flux of each blog is very low. Those twuilts are not really popular. $B_{\mathrm{blog}}$ can reflect one’s popularity while effectively reducing the influence of those twuilt IDs.

According to Hypothesis 3, the microblogging flux density can be calculated by the following formula, which is labeled as $B_{\mathrm{fan}}$:
$$B_{\mathrm{fan}} = \frac{\Phi}{\mathrm{Fan\_Num}}, \tag{5}$$
where $\Phi$ is one’s microblogging flux and Fan_Num is the number of one’s fans. As the feedbacks on one’s blogs mainly come from his/her fans, $B_{\mathrm{fan}}$ considers the average number of feedbacks per fan. On the microblogging platform, some bloggers hire multiple IDs (usually called “zombie IDs”) to increase their popularity. Those zombie IDs are fake identities through which members of the Internet community create the illusion of support for someone while pretending to be a different person. As those zombie IDs seldom give feedback to other bloggers, they cannot be seen as one’s true fans. $B_{\mathrm{fan}}$ can reflect one’s popularity while effectively reducing the influence of those zombie IDs.

Hypotheses 2 and 3 consider the effects of one’s number of blogs and number of fans, respectively, on one’s popularity. The former can reduce the influence of twuilt IDs; the latter can reduce the influence of zombie IDs. Hypothesis 4 combines these two effects and leads to a new microblogging flux density ($B_{\mathrm{blog,fan}}$), given by the following formula:
$$B_{\mathrm{blog,fan}} = \frac{\Phi}{\mathrm{Blog\_Num} \times \mathrm{Fan\_Num}}. \tag{6}$$

$B_{\mathrm{blog,fan}}$ considers all the factors that could affect one’s popularity. It reduces the influences of both twuilt IDs and zombie IDs. The detailed analysis of the three metrics is given in the next section.
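The three flux densities can be compared side by side with a short Python sketch. It assumes the blog-based density divides the flux by Blog_Num, the fan-based density divides it by Fan_Num, and the combined density divides it by their product; the function name, dictionary keys, and the two example users are hypothetical and chosen only to illustrate the effect of heavy posting.

```python
from typing import Dict


def flux_densities(flux: float, blog_num: int, fan_num: int) -> Dict[str, float]:
    """Return the blog-based, fan-based, and blog-fan-based flux densities."""
    if blog_num <= 0 or fan_num <= 0:
        raise ValueError("blog_num and fan_num must be positive")
    return {
        "blog": flux / blog_num,                   # average flux per blog
        "fan": flux / fan_num,                     # average flux per fan
        "blog_fan": flux / (blog_num * fan_num),   # normalized by both factors
    }


# Hypothetical users: a heavy poster with mostly ignored blogs, and a
# moderate poster whose blogs regularly attract feedback.
heavy_poster = flux_densities(flux=2000, blog_num=100000, fan_num=500)
steady_poster = flux_densities(flux=1500, blog_num=300, fan_num=500)
```

In this toy case the heavy poster has the larger raw flux and the larger fan-based density, yet the blog-based and combined densities rank the steady poster higher, which is the behavior the text attributes to normalizing by the number of blogs.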

5. Experiments and Analysis

In order to validate the effectiveness of the above three popularity evaluation metrics, we use real microblogging data and conduct a series of experiments. The data used in this paper were crawled from Sina Weibo, a Chinese microblogging website. Akin to a hybrid of Twitter and Facebook, it is one of the most popular sites in China, in use by well over 30% of Internet users, with a market penetration similar to what Twitter has established in the USA. It was launched by SINA Corporation on August 14, 2009, and had 503 million registered users as of December 2012. About 100 million messages are posted each day on Sina Weibo. In this paper, the collected data include 8,945 users (including 901 VIP users), 20,147,746 blogs, and 925,669,059 comments.

Currently, the performance of popularity evaluation methods is evaluated by manual inspection. For each microblogger, an effort is made to judge whether she/he is a “real” popular star by browsing his/her homepage on the microblogging platform. For example, we check how many original blogs she/he has posted, how often his/her fans give feedback, and how much his/her fans like the posted blogs. However, such anecdotal evaluation procedures require extensive manual effort, are noncomprehensive, and are limited to small networks. On Sina Weibo, some microbloggers pursue the VIP privilege to improve their popularity. VIP users are more easily accessed by other bloggers. We found that most popular microbloggers on Sina Weibo are VIP users. Although this does not prove that all VIP bloggers are very popular on Sina Weibo, we can use the number of VIP users as an indirect indicator to verify the effectiveness of our proposed measures.

5.1. Comparison of Four Hypotheses

To verify the effectiveness of the blog based flux density $B_{\mathrm{blog}}$, we select the top 1500 microbloggers ranked by $B_{\mathrm{blog}}$ (about 15% of the users in the data set). Figure 6 shows those microbloggers with their fan number and $B_{\mathrm{blog}}$, in which the red nodes represent VIP users. Of those 1500 microbloggers, 348 are VIP users, which accounts for about 39% of all VIP users. Though the $B_{\mathrm{blog}}$ values of some bloggers are very high, they are not VIP users, because the number of their fans is huge (see the blue crosses in Figure 6).

Figure 6: The top 1500 microbloggers ranked by $B_{\mathrm{blog}}$; the red nodes represent the VIP users.

We conduct a similar experiment to verify the effectiveness of the fan based flux density $B_{\mathrm{fan}}$. Figure 7 shows the top 1500 microbloggers ranked by $B_{\mathrm{fan}}$ with their blog number, in which the red nodes represent VIP users. The number of VIP users in this blogger set is 656, which accounts for about 73% of all VIP users. Some microbloggers are not VIP users even though their $B_{\mathrm{fan}}$ values are very high (see the blue crosses in Figure 7).

Figure 7: The top 1500 microbloggers ranked by $B_{\mathrm{fan}}$; the red nodes represent the VIP users.

Then, we select the top 1500 microbloggers ranked by $B_{\mathrm{blog,fan}}$. Figure 8 shows them with their blog number, fan number, and $B_{\mathrm{blog,fan}}$. The number of VIP users (see the red nodes) is 862, which accounts for about 95% of all VIP users. That means most VIP users are included in those 1500 microbloggers.

Figure 8: The top 1500 microbloggers ranked by $B_{\mathrm{blog,fan}}$; the red nodes represent the VIP users.

To further validate Hypothesis 1, we select the top 1500 microbloggers ranked by microblogging flux. Those users are shown in Figure 9 with their blog number, fan number, and microblogging flux. The number of VIP users among those 1500 microbloggers is only 246, which accounts for about 18% of VIP users. That means most VIP users are not included in those 1500 microbloggers.

Figure 9: The microbloggers whose microblogging flux is arranged in top 1500; the red nodes represent the VIP users.

According to the above experiments, we can rank the hypotheses by effectiveness as Hypothesis 1 < Hypothesis 2 < Hypothesis 3 < Hypothesis 4. $B_{\mathrm{blog,fan}}$ considers all the factors that could affect one’s popularity, and it is the best of these metrics for evaluating one’s popularity on the microblogging platform.
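The indirect validation used above, counting how many VIP users fall in the top-ranked set, can be sketched as follows. This is an illustrative sketch only: it assumes the reported percentages are the share of all VIP users that appear in the top $k$, and the function name, user IDs, and scores are hypothetical.

```python
from typing import Dict, Set


def vip_coverage(scores: Dict[str, float], vip_ids: Set[str], k: int) -> float:
    """Share of all VIP users that appear among the k highest-scoring users."""
    if not vip_ids:
        raise ValueError("vip_ids must not be empty")
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    top_k = ranked[:k]
    return sum(1 for user in top_k if user in vip_ids) / len(vip_ids)


# Hypothetical scores (any of the popularity metrics) and VIP labels.
toy_scores = {"u1": 0.9, "u2": 0.7, "u3": 0.4, "u4": 0.2, "u5": 0.1}
toy_vips = {"u1", "u3", "u5"}
toy_coverage = vip_coverage(toy_scores, toy_vips, k=3)  # u1 and u3 are in the top 3
```

Under this reading, 348 VIP users in the top 1500 out of 901 VIP users in total corresponds to the reported figure of about 39%.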

5.2. Comparison with PageRank Algorithm

We also compare our evaluation model with the classical PageRank algorithm. PageRank is a link analysis algorithm, first used by the Google web search engine, which assigns a numerical weighting to each element of a hyperlinked set of documents with the purpose of “measuring” its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element $E$ is referred to as the PageRank of $E$ and denoted by $PR(E)$, as shown in the following formula:
$$PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}, \tag{7}$$
where $p_1, p_2, \ldots, p_N$ are the pages under consideration, $M(p_i)$ is the set of pages that link to $p_i$, $L(p_j)$ is the number of outbound links on page $p_j$, $N$ is the total number of pages, and $d$ is the damping factor. When calculating PageRank, pages with no outbound links are assumed to link out to all other pages in the collection. Their PageRank scores are, therefore, divided evenly among all other pages. In other words, to be fair to pages that are not sinks, these random transitions are added to all nodes in the Web, with a residual probability usually set to $d = 0.85$, estimated from the frequency with which an average surfer uses his or her browser’s bookmark feature. The steps of the PageRank algorithm are visualized in Figure 10. We initialize the PR value of every node as 1 and then update the PR value of each node according to formula (7), as shown in Figures 10(b)–10(e).

Figure 10: Visualization of the steps of PageRank algorithm.
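The iterative update can be sketched in plain Python. This is a minimal sketch under stated assumptions rather than the exact procedure used in the experiments: it uses the normalized form with a damping factor of 0.85, starts every node at $1/N$ so that the scores sum to 1 (the text initializes every node at 1), and spreads the score of nodes without outbound links evenly over all nodes. The graph and the function name are hypothetical.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Power-iteration PageRank on a directed graph given as adjacency lists."""
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    pr = {u: 1.0 / n for u in nodes}  # assumption: uniform start summing to 1
    for _ in range(max_iter):
        # Score held by nodes with no outbound links is shared by all nodes.
        dangling = sum(pr[u] for u in nodes if not links.get(u))
        new = {u: (1.0 - d) / n + d * dangling / n for u in nodes}
        for u in nodes:
            targets = links.get(u, [])
            for v in targets:
                new[v] += d * pr[u] / len(targets)
        delta = sum(abs(new[u] - pr[u]) for u in nodes)
        pr = new
        if delta < tol:
            break
    return pr


# Hypothetical follow graph: an edge u -> v means u links to (follows) v.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
toy_pr = pagerank(toy_links)
```

In this toy graph node `c` receives links from three nodes and ends up with the highest score, while `d`, which nobody links to, keeps only the baseline share $(1 - d)/N$.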

The PageRank algorithm has been applied to traditional microblogger popularity evaluation. To compare our evaluation model with the classical PageRank algorithm, we first construct an interaction network according to the fan lists and watch lists of microbloggers. Then, the PageRank algorithm is applied to calculate the PR value of each blogger. Blogger IDs with high PR values are regarded as popular stars on the microblogging platform. We sort the microbloggers by their PR values in descending order; the top $k$ bloggers are selected as $\mathrm{Blogger}_{PR}(k)$. Similarly, we calculate $B_{\mathrm{blog,fan}}$ for every microblogger and sort the microbloggers by it in descending order. The top $k$ bloggers are selected as $\mathrm{Blogger}_{B}(k)$. The intersection of the two sets is denoted as $I(k)$. In this context, the similarity of the results from the two algorithms can be calculated as follows:
$$\mathrm{Sim}(k) = \frac{\mathrm{Count}(I(k))}{k}, \tag{8}$$
where $\mathrm{Count}(\cdot)$ returns the number of elements in the set. To compare the results of the two algorithms, we calculate $\mathrm{Sim}(k)$ for various values of $k$, as shown in Figure 11. In our data set, the total number of VIP users is only 901, which accounts for nearly 10% of all the microbloggers. The actual number of star bloggers might be lower than this value. As shown in Figure 11, as $k$ grows from 1 to 1000 (roughly 10% of the bloggers in the data set), $\mathrm{Sim}(k)$ is very low. $\mathrm{Sim}(k)$ grows rapidly as $k$ grows from 1000 to 3518 and reaches a local maximum (0.6) at $k = 3518$. However, further increasing $k$ from 3518 to 5987 is counterproductive, as indicated by the significant drop in $\mathrm{Sim}(k)$. After that, $\mathrm{Sim}(k)$ grows with $k$ and reaches its global maximum (1) at $k = 8945$, where both sets contain every user. From this experiment, we can see that the results of the two algorithms differ to a great degree.

Figure 11: The similarity of the results of the two algorithms.
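The overlap between two top-$k$ rankings can be computed with a short Python sketch. It assumes the similarity is the size of the intersection of the two top-$k$ sets divided by $k$; the function names, user IDs, and scores are hypothetical.

```python
from typing import Dict, Set


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Set of the k users with the highest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def topk_similarity(scores_a: Dict[str, float],
                    scores_b: Dict[str, float], k: int) -> float:
    """Fraction of the top-k users shared by two rankings."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(top_k(scores_a, k) & top_k(scores_b, k)) / k


# Hypothetical PageRank values and flux densities for four users.
toy_pr_scores = {"u1": 0.4, "u2": 0.3, "u3": 0.2, "u4": 0.1}
toy_density = {"u1": 0.1, "u2": 0.5, "u3": 0.05, "u4": 0.9}
toy_sim = topk_similarity(toy_pr_scores, toy_density, k=2)  # only u2 is shared
```

When $k$ equals the number of users, both sets contain everyone and the similarity is 1 by construction, so the informative part of such a curve is the range of small $k$.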

The VIP ratios in the top $k$ bloggers sorted by the two algorithms are denoted as $R_{PR}(k)$ and $R_{B}(k)$; each is computed from the number of VIP users in the corresponding top-$k$ blogger set. As shown in Figure 12, as $k$ grows from 1 to 1000 (roughly 10% of the bloggers in the data set), $R_{PR}(k)$ grows slowly, while $R_{B}(k)$ grows slowly as $k$ grows from 1 to 100 and then rises rapidly as $k$ grows from 100 to 1000. The values of the two ratios at $k = 1000$ can be read from Figure 12. We observe that our blog-fan based flux density, $B_{\mathrm{blog,fan}}$, achieves the highest overall accuracy; it is clearly better than the traditional PageRank algorithm at measuring one’s popularity on the microblogging platform.

Figure 12: VIP ratios in the top $k$ bloggers sorted by the two algorithms.

5.3. The Current Popularities of Microbloggers

So far, the popularity of microbloggers has been measured from an overall point of view. In practice, the activities of microbloggers change over time; therefore, their popularities also change dynamically. In this subsection, we analyze the current popularity of microbloggers. For every blog of a given microblogger, we keep track of its feedbacks. The feedback information of the microblogs posted by ID1087770692 from August 4, 2011, to May 15, 2013, is shown in Table 3. The current popularity of a given microblogger on a given day is calculated as follows:
$$B_{\mathrm{cur}} = \frac{\sum_{b_i \in \mathrm{Blogs}_{\mathrm{day}}} \left(F_i + C_i - 1\right)}{n_{\mathrm{day}} \times \mathrm{Fan\_Num}}, \tag{10}$$
where $\mathrm{Blogs}_{\mathrm{day}}$ represents the set of blogs posted by the given microblogger on that day, $n_{\mathrm{day}}$ represents the number of blogs posted by him/her on that day, and $F_i$ and $C_i$ are the numbers of times the $i$th blog is forwarded and commented on. As the number of one’s fans remains stable over time, we use the total number of one’s fans, Fan_Num, to represent the number of fans on the given day. For example, ID1087770692 posted three microblogs on August 4, 2011, and the number of his fans is 33,616,621. Therefore, the current popularity on August 4, 2011, can be calculated according to formula (10) from the feedback counts in Table 3 with $n_{\mathrm{day}} = 3$ and Fan_Num = 33,616,621. The same method is used to calculate the current popularities on the other days.

Table 3: The feedback information of microblogs posted by user “1087770692” from August 4, 2011, to May 15, 2013.
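The per-day calculation can be sketched in Python as follows. The sketch assumes the current popularity on a day is that day’s summed blog flux divided by the product of the number of blogs posted that day and the total fan count; the record layout `(day, forwards, comments)`, the function name, and all numbers are hypothetical and are not taken from Table 3.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def daily_popularity(blogs: Iterable[Tuple[str, int, int]],
                     fan_num: int) -> Dict[str, float]:
    """Current popularity per day from (day, forwards, comments) records."""
    if fan_num <= 0:
        raise ValueError("fan_num must be positive")
    flux = defaultdict(int)    # summed flux of the blogs posted on each day
    count = defaultdict(int)   # number of blogs posted on each day
    for day, forwards, comments in blogs:
        flux[day] += forwards + comments - 1
        count[day] += 1
    return {day: flux[day] / (count[day] * fan_num) for day in flux}


# Hypothetical records for one user with 1,000 fans.
toy_records = [
    ("2011-08-04", 10, 5),   # flux 14
    ("2011-08-04", 0, 0),    # flux -1 (no feedback)
    ("2011-08-05", 3, 0),    # flux 2
]
toy_popularity = daily_popularity(toy_records, fan_num=1000)
```

Because the fan count sits in the denominator, users with tens of millions of fans naturally obtain very small values, which is consistent with the magnitudes reported for the four example users.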

We chose four microbloggers to compare their current popularities and analyze how they change over time. As shown in Figure 13, the current popularities of the four microbloggers may vary within a period, but the overall trends are relatively stable. The current popularities of ID1087770692 are mainly concentrated in the range between $10^{-5}$ and $10^{-4}$; the current popularities of ID1038330705 grow from $10^{-5}$ to $10^{-3}$; the current popularities of ID1025582437 drop from $10^{-1}$ to $10^{-3}$; the current popularities of ID1041508671 are mainly concentrated around $10^{-4}$. A microblogger may sometimes post an interesting original blog that attracts many feedbacks from others; his/her current popularity on that day may then be very high. In general, the feedbacks on one’s blogs mainly come from his/her close friends; therefore, the number of feedbacks on one’s blogs remains stable over time.

Figure 13: Current popularities of microbloggers from 2011 to 2013.

6. Conclusion

The popularity of microbloggers is hard to evaluate with traditional network structure models (e.g., the PageRank algorithm [14]). Because some microbloggers build very long watch lists, a network built from fan lists and watch lists cannot truly reflect one’s connection strength or popularity. In this paper, we proposed an electromagnetic field theory based model to analyze the popularity of microbloggers. First, the concept of the source in the microblogging field was put forward, based on the concept of the source in the electromagnetic field; then, one’s microblogging flux was calculated according to his/her behaviors (sending or receiving feedback) on the microblogging platform; finally, we used three methods ($B_{\mathrm{blog}}$, $B_{\mathrm{fan}}$, and $B_{\mathrm{blog,fan}}$) to calculate one’s microblogging flux density, which represents one’s popularity on the microblogging platform. In the experiments, we evaluated our model using real microblogging data and found that $B_{\mathrm{blog,fan}}$ reflects one’s popularity better than the other two metrics. We also compared our model with the classic PageRank algorithm; the results show that our model evaluates the popularity of microbloggers more effectively and accurately. The contributions of this paper can be summarized as follows: (1) the proposed popularity evaluation metric $B_{\mathrm{blog,fan}}$ is effective and reliable for evaluating the real influence of bloggers on the Sina microblogging platform; (2) the popularities of microbloggers change over time, but their overall trends are relatively stable. Some large oscillations may occur due to the contents of the released blogs.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by the NUAA Fundamental Research Funds (NS2013090).

K. Lerman and R. Ghosh, “Information contagion: an empirical study of the spread of news on digg and twitter social networks,” in Proceedings of the AAAI Conference on Weblogs and Social Media, pp. 90–97, 2010.

I. Guy, N. Zwerdling, I. Ronen, D. Carmel, and E. Uziel, “Social media recommendation based on people and tags,” in Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '10), pp. 194–201, July 2010.