Here is the raw MySQL data [35 MB] for all tweeps and their internal relations. The file is a MySQL database dump; the easiest approach is to import it into a local MySQL database and then export it to a format of your choice (such as GEXF, GML or similar). The file is, of course, quite big. Have fun. If you use the data, please let me know!
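As a rough illustration of the export step, here is a minimal Python sketch that turns a list of follower relations into GML text. The layout of the dump is not described here, so the input is assumed to be a plain list of (follower, followed) screen-name pairs that you have already selected from your local database; the function name `to_gml` is made up for this example.

```python
def to_gml(edges):
    """Render directed (source, target) name pairs as a GML document string.

    `edges` must be a list (it is iterated twice). Names are assumed to be
    Twitter screen names, i.e. without quotes that would need escaping.
    """
    # Give every account a numeric id in order of first appearance.
    ids = {}
    for source, target in edges:
        for name in (source, target):
            ids.setdefault(name, len(ids))

    lines = ["graph [", "  directed 1"]
    for name, node_id in ids.items():
        lines.append(f'  node [ id {node_id} label "{name}" ]')
    for source, target in edges:
        lines.append(f"  edge [ source {ids[source]} target {ids[target]} ]")
    lines.append("]")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from the real data set.
    print(to_gml([("alice", "bob"), ("bob", "carol")]))
```

The resulting text can be saved with a .gml extension and opened in a graph tool that reads GML.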

Data is collected from Twitter’s API (api.twitter.com). For every account tested, we download the 75 most recent tweets. The language of these tweets is analyzed as one text, after removing hashtags, links and user names. The language is identified with the open source library Pear LanguageDetect, which recognizes three-letter combinations (trigrams) that are characteristic of each language (Cavnar & Trenkle, 1994).
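To give a feel for how this kind of trigram matching works, here is a small Python sketch in the spirit of Cavnar & Trenkle's "out-of-place" ranking. It is not the code used for the census (that relied on the Pear library); the function names, the regular expression and the profile size of 300 are assumptions made for illustration, and real language profiles would be built from far larger text samples.

```python
import re
from collections import Counter

# Links, @usernames and #hashtags carry no language signal.
_NOISE = re.compile(r"https?://\S+|[@#]\w+")


def clean_tweets(tweets):
    """Join tweets into one lowercase text without links, names and hashtags."""
    text = _NOISE.sub(" ", " ".join(tweets))
    return " ".join(text.lower().split())


def trigram_profile(text, size=300):
    """Return the most frequent character trigrams, most frequent first."""
    padded = f" {text} "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    return [gram for gram, _ in counts.most_common(size)]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; trigrams missing from the language profile
    get the maximum penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def detect(text, profiles):
    """Return the language whose profile is closest to the text."""
    doc = trigram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))
```

In use, `profiles` would map a language code to a profile built with `trigram_profile` from a large sample of that language, and `detect(clean_tweets(tweets), profiles)` would return the best match for an account.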

After an account has been identified as Finnish, all friends and followers of that account are added to a queue to be analyzed. This process is repeated until all accounts (and their followers and friends, and their followers and friends, etc.) have been scanned and no more Finnish-speaking accounts are found.

Users not recognized as Finnish, accounts without tweets and protected accounts are excluded.
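The queue-based scan described above is essentially a breadth-first search that only expands accounts identified as Finnish. A minimal Python sketch follows; `is_finnish` and `neighbours` are hypothetical callbacks standing in for the language check and the API lookups of friends and followers, and are not part of the original tooling.

```python
from collections import deque


def crawl(seeds, is_finnish, neighbours):
    """Breadth-first scan starting from `seeds`.

    Returns (finnish, seen): the accounts identified as Finnish and every
    account that was put in the queue at some point.
    """
    queue = deque(seeds)
    seen = set(seeds)
    finnish = set()
    while queue:
        account = queue.popleft()
        if not is_finnish(account):
            # Non-Finnish, empty or protected accounts are excluded and
            # their friends and followers are not queued.
            continue
        finnish.add(account)
        for other in neighbours(account):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return finnish, seen
```

The `seen` set ensures that each account is analyzed only once, however many Finnish accounts it is connected to.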

Schematic of Twitter Census methodology

Altogether 1,333,448 accounts have been scanned (plus the ones identified as Finnish). This is the total number of unique accounts following or being followed by a Finnish-speaking account. Of these, 222,080 have written zero tweets and can therefore not be analyzed (many of them are “spam” accounts), 62,834 are protected (and cannot be analyzed), 1,016 are suspended (normally due to spamming activities), 474 cannot be found and the rest (1,047,044) have been identified as writing in a language other than Finnish.
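As a quick sanity check, the figures above can be added up and turned into shares with a few lines of Python. The numbers are the ones quoted in the text; the category names are just labels chosen for this example.

```python
scanned_total = 1_333_448

# Non-Finnish accounts by reason for exclusion, as quoted in the text.
excluded = {
    "no_tweets": 222_080,
    "protected": 62_834,
    "suspended": 1_016,
    "not_found": 474,
    "other_language": 1_047_044,
}

# The five categories should account for every scanned account.
assert sum(excluded.values()) == scanned_total

# Share of each category in percent, rounded to one decimal.
shares = {name: round(100 * count / scanned_total, 1)
          for name, count in excluded.items()}

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {share} %")
```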

The data extracted about the Finnish-speaking Twitter accounts is the source for all the statistics and conclusions in Twitter Census. The relationships between the accounts are later used to create the network graph of the Twitter population.

Twitter in Finland

Twittercensus for Finland will be released on 19 February at 11.00 (GMT+2) on this website.

The site will contain statistics about Twitter in Finland, as well as a graph of all active Finnish-speaking Twitter users in Finland. We will present the total number of users, active users, number of tweets sent and lots more statistics! Stay tuned for more information.

A live presentation with the results of the Finnish Twittercensus will be published on this site on February 19th.

And oh, a teaser! In the next post you can find a sample of the Finnish twitter graph!