Archive for the 'Blogging' Category

The Financial Times has a long article describing the rise and fall of MySpace. It’s a story full of bad timing, missed opportunities, suits vs. geeks, personalities, and, I suppose, random chance events. I hope at least a few fossils from our age will be preserved for future generations to study.

It’s a little busy, but it overlays a lot of information on the world map.

“The map visualises the number of active bloggers, social networkers, video sharers, photo uploaders and microbloggers. The length of the curve represents the penetration and the size represents the universe size. We have also included the actual numbers so you can use and apply the universe estimates.”

I was surprised to see the variation in popularity of the different modalities.

Twitter is adding support for geotagging tweets to their API which will make Twitter a richer source of real-time news. The Twitter blog reports:

“Twitter platform developers have been doing innovative work with location for some time despite having access to only a rudimentary level of API support. Most of the location-based projects we see are built using the simple, account-level location field folks can fill out as part of their profile. Since anything can be written in this field, it’s interesting but not very dependable.

We’re gearing up to launch a new feature which makes Twitter truly location-aware. A new API will allow developers to add latitude and longitude to any tweet. Folks will need to activate this new feature by choice because it will be off by default and the exact location data won’t be stored for an extended period of time. However, if people do opt-in to sharing location on a tweet-by-tweet basis, compelling context will be added to each burst of information.”
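As a sketch of what the opt-in geotagging described above might look like from a developer's side, here is a small helper that builds the form parameters for a location-aware status update. The parameter names (`lat`, `long`) and the endpoint mentioned in the comment are assumptions based on the announcement, not a confirmed spec:

```python
# Sketch only: the parameter names and endpoint are assumptions based
# on Twitter's announcement, not a documented API.

def build_geotagged_update(status, latitude, longitude):
    """Return form parameters for a hypothetical location-aware update."""
    if not (-90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0):
        raise ValueError("coordinates out of range")
    return {
        "status": status,
        "lat": f"{latitude:.6f}",
        "long": f"{longitude:.6f}",
    }

params = build_geotagged_update("Testing location-aware tweets", 39.2904, -76.6122)
# These parameters would then be POSTed (with OAuth credentials) to
# something like statuses/update.json -- and only if the user opted in.
```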

This opens up lots of interesting opportunities, but there is still room for geotagging from content. There is more than one relationship between a tweet (or any utterance) and a location: where the tweeter was when it was posted, but also the location of the event or object that is the tweet’s subject.

For example, the Baltimore police use Twitter to inform the press and public about significant crimes, major traffic problems and other events. There are 10-15 tweets a day in this stream, all sent by an officer in the BPD Public Affairs department. The majority of the tweets mention a location (e.g., “Shooting on Lafayette Ave, Suspect in Police custody, handgun recovered.”) but are, I assume, sent from the Public Affairs office. Baltimore city covers a large area, more than 80 square miles. Many residents or reporters will be interested only in events in or affecting the neighborhoods where they live, work or pass through when commuting.
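A crude sketch of content-based geotagging for a stream like this: pull likely street-name mentions out of the tweet text. The regex is purely illustrative; a real geoparser would match candidates against a gazetteer of Baltimore streets and then geocode them:

```python
import re

# Illustrative only: find likely street-name mentions in police tweets.
# A real system would verify candidates against a gazetteer and geocode.
STREET_PATTERN = re.compile(
    r"\b(?:on|at|near)\s+((?:[A-Z][a-z]+\s+)+(?:Ave|St|Rd|Blvd)\b\.?)"
)

def extract_street_mentions(tweet):
    return [m.strip() for m in STREET_PATTERN.findall(tweet)]

tweet = "Shooting on Lafayette Ave, Suspect in Police custody, handgun recovered."
print(extract_street_mentions(tweet))  # ['Lafayette Ave']
```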

I also wonder if there are more opportunities for Twitter to add semantic metadata to Tweets via their API.

While it’s tempting to poke fun at the apparent contradictions involved, it’s easy to see a difference. It’s well known that there are many vulnerabilities on the Web that can result in compromising a computer, and that they are more likely to be encountered in open, popular environments like social media systems. So it’s prudent to limit access to some of these from networks like NIPRNET that are used for sensitive information. On the other hand, we assume that the computers used by Admiral Mullen and his staff for public announcements and PR are on conventional networks, so the risks associated with security problems are greatly reduced.

Elinor Mills of CNET reports that the DoS attacks against Twitter, Facebook, LiveJournal and Blogger were focused on a single Georgian blogger using the name Cyxymu.

A pro-Georgian blogger with accounts on Twitter, Facebook, LiveJournal and Google’s Blogger and YouTube was targeted in a denial of service attack that led to the site-wide outage at Twitter and problems at the other sites on Thursday, according to a Facebook executive.

The blogger, who uses the account name “Cyxymu,” (the name of a town in the former Soviet Republic) had accounts on all of the different sites that were attacked at the same time, Max Kelly, chief security officer at Facebook, told CNET News.

“It was a simultaneous attack across a number of properties targeting him to keep his voice from being heard,” Kelly said. “We’re actively investigating the source of the attacks and we hope to be able to find out the individuals involved in the back end and to take action against them if we can.”

The special issue invites contributions that show how synergies between Semantic Web and Web 2.0 techniques can be successfully used. Since both communities work on network-like data structures, analysis methods from different fields of research could form a link between those communities. Techniques include, but are not limited to, social network analysis, graph analysis, machine learning and data mining methods.

Yesterday we discovered that our ebiquity blog had been hacked. It looks like a vulnerability in our old WordPress installation was exploited to add the following code to the top of our blog’s main page.

This code caused URLs like https://ebiquity.umbc.edu/?qq=1671 to redirect to a spam page. We’ve upgraded the blog to the latest WordPress release, which hopefully will prevent this exploit from being used again. (Notice the reversed URL — LOL!)
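For readers wondering about the reversed-URL remark: injected spam code often obfuscates its redirect target, for example by storing the URL backwards so it evades naive scans for spam domains. The snippet below is a reconstruction of the trick with a made-up domain, not the actual injected code:

```python
# Illustrative reconstruction of reversed-URL obfuscation -- not the
# actual injected code, and the domain here is hypothetical.
hidden = "moc.elpmaxe-maps//:ptth"   # the URL as stored, reversed
target = hidden[::-1]                # reverse it back before redirecting
print(target)  # http://spam-example.com
```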

We discovered the problem through a clever trick I read about last year on a site I’ve forgotten (maybe here). We created several Google Alerts triggered by the appearance of spam-related words on pages apparently hosted by ebiquity.umbc.edu. For example:

adult OR girls OR sex OR sexx OR XXX OR porn OR pornography site:ebiquity.umbc.edu

viagra OR cialis OR levitra OR Phentermine OR Xanax site:ebiquity.umbc.edu
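Queries like these are easy to generate for your own domain. A minimal sketch, using the same term lists as the alerts above:

```python
# Sketch: build Google Alert queries that fire when spam terms appear
# on pages under your own domain.
def alert_query(terms, domain):
    return " OR ".join(terms) + f" site:{domain}"

pharma_terms = ["viagra", "cialis", "levitra", "Phentermine", "Xanax"]
print(alert_query(pharma_terms, "ebiquity.umbc.edu"))
# viagra OR cialis OR levitra OR Phentermine OR Xanax site:ebiquity.umbc.edu
```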

I would get several false positives a month from these alerts triggered by non-spam entries on our site. In fact, *this* post will generate a false positive. But yesterday I got a true positive. Looking at the log files, I think I got the alert within a few hours of when our blog was hacked. So I am happy to say that this worked and worked well. Without this alert, it might have taken weeks to notice the problem.

The results of this Google search reveal many compromised blogs from the .edu domain.

“Which languages make programmers the happiest? … I decided to do a little market research. I scraped the top 150 most recent tweets on Twitter for the query “X language” where X was one of {COBOL, Ruby, Fortran, Python, Visual Basic, Perl, Java, Haskell, Lisp, C}. Then I asked three people on Amazon Mechanical Turk to verify that the tweet was on the topic. If so, I asked if the tweet seemed positive, negative or neutral. …”
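The aggregation step implied by that setup — three Turk workers label each tweet, majority wins — might look like the sketch below. The label vocabulary and tie-handling are my assumptions, not details from the study:

```python
from collections import Counter

# Sketch of majority voting over three Mechanical Turk labels per tweet.
# Treating a three-way tie as "neutral" is an assumption, not the
# study's documented procedure.
def majority_label(labels):
    counts = Counter(labels)
    top, n = counts.most_common(1)[0]
    if sum(1 for c in counts.values() if c == n) > 1:
        return "neutral"   # no majority among the workers
    return top

print(majority_label(["positive", "positive", "neutral"]))  # positive
print(majority_label(["positive", "negative", "neutral"]))  # neutral
```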

We maintain Planet Social Media Research (SMR) as a feed aggregator for a set of blogs relevant to research in social media systems. A few days ago I noticed that it wasn’t including new posts from some of the blogs. After updating the Planet Venus software we use and poking around I discovered that our server is unable to access any feeds that resolve to Feedburner.

Apparently Feedburner has a blacklist of IP addresses that it blocks, and our server must now be on it. We have a request in to straighten this out and hope that everything will be back to normal very soon. (I was able to get our own blog back onto Planet SMR by reconfiguring the system to revert to the old, non-Feedburner feed.)

We’ve not yet heard from Feedburner/Google and don’t know why we are on their blacklist. It’s unlikely to be a result of our accessing feeds too frequently: we rebuild the site and aggregated feed once an hour and only about ten of our feeds resolve to feedburner.

My speculation is that this is collateral damage in the global war on spam. The easiest way for splogs (spam blogs) to get content is to hijack feeds from other blogs. Web spammers can do even better at disguising their splogs as legitimate sites if they aggregate several feeds that are topically related.

One way to fight such splogs is to deny them access to the feeds. So Google could be trying to protect Feedburner users, and also be a good steward of the Web environment, by blocking suspected web spammers from the feeds hosted by Feedburner.

So, my guess is that Google thinks that the Planet SMR site is a splog. We are not, of course. We only include the feeds of blogs that want to be on SMR. We also do not host any ads, which is a motivation for most splogs.

If our speculation is right, and Google is blocking our access because it thinks we are a splog site, then there will be many other legitimate feed aggregator sites that have or soon will have this problem.

By the way — we are always interested in suggestions for new blogs to add to Planet SMR. If you have or know of one, contact us as planet-smr at cs.umbc.edu.

update 5/8: We’ve identified and solved the problem, thanks to Google Freebase ‘community expert’ Franklin Tse. The problem was due to our having an old entry for the Freebase IP address in the server’s /etc/hosts table. I think we added it when we were having some technical difficulties some years ago and wanted to keep our key services running smoothly. I guess the trouble with quick temporary hacks is that they’re easy to forget and come back to bite you.
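Stale /etc/hosts overrides like this silently shadow DNS, so it's worth auditing for them. A minimal sketch that lists every hostname pinned to an address in a hosts file (parsing follows the standard hosts(5) format; the address and hostname in the sample are made up):

```python
# Sketch: flag /etc/hosts entries that pin a hostname to an address.
# Such overrides bypass DNS and go stale silently. The sample entry
# below is hypothetical.
def pinned_hosts(hosts_text, ignore=("localhost",)):
    pinned = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            if name not in ignore:
                pinned[name] = addr
    return pinned

sample = """\
127.0.0.1   localhost
# temporary workaround added years ago -- forgotten since
66.92.1.10  feeds.feedburner.com
"""
print(pinned_hosts(sample))  # {'feeds.feedburner.com': '66.92.1.10'}
```

Each pinned name could then be compared against a fresh DNS lookup to spot entries that have drifted.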

“Frankly, I think a lot of twittering is somewhat faddish, whereas I never thought Facebook was. … People I interviewed and surveyed would talk of serious feeling of deprivation without Facebook and I’ve hardly heard anyone say that about twitter,” Zeynep Tufekci, an assistant professor who teaches the sociology of technology at the University of Maryland, Baltimore County, wrote in an e-mail. “Will people Twitter five years from now? Perhaps, but I would not be surprised if they did not, or at least as much.”

Traditional newspapers are in a crisis. Last week the 150-year-old Rocky Mountain News published its last issue and the Philadelphia Inquirer filed for bankruptcy. Experts have been saying for some time that newspapers need to focus on one aspect that can not be commoditized — local news. It’s also clear that news content delivered via ink on dead trees is not a working model for the future.

The New York Times is about to announce that it is starting a hyperlocal product called The Local working with our students at CUNY’s Graduate School of Journalism. PaidContent has the story early. So I’ll tell you about the school’s and my involvement and plans.

At CUNY, we were working on a hyperlocal plan of our own, aimed at taking one New York neighborhood and turning it into the ultimate hyperlocal community as a showcase to both demonstrate how a community could be empowered to report on itself and to create a laboratory where our students could learn to interact with the public in new and collaborative ways. The problem with teaching interactive journalism, which is what we call my department, is that students don’t have a public with whom to interact.

Late last night Facebook CEO Mark Zuckerberg announced in a blog post, Update on Terms, that they have rolled back the recent changes to their Terms of Service agreement and restored the previous one.

“Many of us at Facebook spent most of today discussing how best to move forward. One approach would have been to quickly amend the new terms with new language to clarify our positions further. Another approach was simply to revert to our old terms while we begin working on our next version. As we thought through this, we reached out to respected organizations to get their input.

Going forward, we’ve decided to take a new approach towards developing our terms. We concluded that returning to our previous terms was the right thing for now. As I said yesterday, we think that a lot of the language in our terms is overly formal and protective so we don’t plan to leave it there for long.”

In his post, Zuckerberg continued by observing that with 175 million members, if Facebook were a country, it would be the sixth most populated one in the world. Of course, sometimes a population revolts and lays claim to certain unalienable rights, among them being life, liberty, the pursuit of happiness and ownership of one’s online content.

“You may remove your User Content from the Site at any time. If you choose to remove your User Content, the license granted above will automatically expire, however you acknowledge that the Company may retain archived copies of your User Content.”

This revision is dated 23 September 2008. Curiously, I checked the Internet Archive to review the history of FB’s TOS but found that there are no archived copies after 12 October 2007. I can only imagine that FB asked the Internet Archive to stop saving copies of this public page. I note that the last archived copies of many of their public pages (e.g., privacy policy, developers page, etc.) are also from 2007. These pages are not blocked by the FB robots.txt and are normally accessible to anyone, so it must be by a specific request that they not be archived.
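The distinction matters because robots.txt exclusion is something anyone can check. Python's standard library can evaluate a site's rules directly; the rules and URLs below are a made-up example, not Facebook's actual file:

```python
from urllib.robotparser import RobotFileParser

# Sketch: check whether a crawler (e.g. the Internet Archive's) is
# blocked from a page by robots.txt. These rules are hypothetical.
rules = """\
User-agent: *
Disallow: /private/
"""
rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("ia_archiver", "http://example.com/terms.php"))  # True
print(rp.can_fetch("ia_archiver", "http://example.com/private/x"))  # False
```

If a page is fetchable under these rules yet absent from the archive, the exclusion must have been arranged some other way, such as a direct request to the Internet Archive.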

That’s too bad. Having an easy way to see how the policies of important social sites like FB evolve would be a great resource to those who study online social media as well as to many curious users.