#StrataConf Dreamcatcher

I’ve been listening to the #strataconf live feed this morning, tales of goodness relating to some of the things we can start to do with “Big Data”. I also had a peek at the Twitter feed, which looked to be swamped by spambots and pr0nbots…

This amused me no end – I imagine that IRC is still the favoured communication channel for big data developers, but Twitter is the home of big data’s social media loving evangelists. To be in any way useful to observers, the backchannel needs filtering, something that I suspect many of the attendees might have solutions for…

One of the reasons I was looking at the hashtag feed was to sketch a quick map of the folk commonly followed by recent tag users. Whilst the bots tend to fall out of making any contribution to the map (the way they follow and/or are followed by other accounts using the hashtag is atypical compared to “legitimate” members of the hashtag community, so they tend to get filtered out), the large number and quick-fire tweeting behaviour of the bots trashes my sampling method: I tend to grab tweets using the Twitter search API, which gives me access to the 1500 most recent tweets containing whatever search term I use. If 50% of those tweets are spam, it halves my useful sample size… (I guess I need to hook up a streaming API collection mechanism… If you have a tweepy recipe to share for collecting samples of N tweets from a stream on a particular search phrase, please post a link (or code) in the comments;-)

What we really need is a filter. A dreamcatcher, maybe?

• “Long ago when the world was young, an old Lakota spiritual leader was on a high mountain and had a vision. In his vision, Iktomi, the great trickster and searcher of wisdom, appeared in the form of a spider. Iktomi spoke to him in a sacred language. As he spoke, Iktomi the spider picked up the elder’s willow hoop which had feathers, horsehair, beads and offerings on it, and began to spin a web. He spoke to the elder about the cycles of life, how we begin our lives as infants, move on through childhood and on to adulthood. Finally we go to old age where we must be taken care of as infants, completing the cycle. But, Iktomi said as he continued to spin his web, in each time of life there are many forces, some good and some bad. If you listen to the good forces, they will steer you in the right direction. But, if you listen to the bad forces, they’ll steer you in the wrong direction and may hurt you. So these forces can help, or can interfere with the harmony of Nature. While the spider spoke, he continued to weave his web. When Iktomi finished speaking, he gave the elder the web and said, The web is a perfect circle with a hole in the center. Use the web to help your people reach their goals, making good use of their ideas, dreams and visions. If you believe in the great spirit, the web will filter your good ideas and the bad ones will be trapped and will not pass.”
• Native Americans believed that dreams were floating in the air.
• The Bad dreams would get caught in the web and expire when the sun rose while the good dreams would go through the center and then flow down through the feathers to the sleeping individual to come to them.
[Native American Dream Catchers]

So how do we catch the bad dreams, the unwanted tweets from the spam bots, and let the useful tweets through? Or how do we let the spam tweets pass through and capture the good ones? Here’s a quick sketch I made of the #strataconf hashtag community, showing how the people who sent a sample of 1500 recently so-tagged tweets follow each other, laid out in Gephi using the ARF layout function with nodes sized according to eigenvector centrality:

The unconnected grey nodes in the upper right sector are, typically, the bots. The connected nodes are folk who are interested in the topic and who follow each other. Even if the spam bots followed some of these parties, we could still identify them, for example by sizing nodes according to in-degree using a non-linear mapping:
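To make the in-degree idea concrete, here is a minimal Python sketch. It is only an illustration: the `follows` dictionary, the account names and the logarithmic `node_size` mapping are all assumptions of mine (the actual sizing was done in Gephi, and any non-linear mapping would do). The point it shows is that an account nobody in the community follows ends up at the minimum size, however many community members it follows itself.

```python
import math

def in_degree(follows, members):
    """Count, for each member, how many other members follow them."""
    counts = {m: 0 for m in members}
    for follower, followed in follows.items():
        if follower not in counts:
            continue  # ignore followers outside the hashtag community
        for target in followed:
            if target in counts and target != follower:
                counts[target] += 1
    return counts

def node_size(degree, min_size=1.0, scale=10.0):
    """Hypothetical non-linear mapping: logarithmic growth from a floor size."""
    return min_size + scale * math.log1p(degree)

# Toy data (made up): each key follows the accounts in its set.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice", "bob"},
    "spambot": {"alice", "bob"},  # follows the community, followed by no one
}
members = set(follows)

degrees = in_degree(follows, members)
sizes = {name: node_size(d) for name, d in degrees.items()}

for name in sorted(sizes, key=sizes.get, reverse=True):
    print(f"{name:8s} in-degree={degrees[name]} size={sizes[name]:.2f}")
```

With this toy data the spambot has an in-degree of zero, so it sits at the floor size no matter how many legitimate accounts it follows.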

So in my mind’s eye I have a tweetcatcher that catches tweets from folk who are part of the greater connected component by virtue of having one or more folk in the community follow them, and discards the rest…
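Here is a rough sketch of how such a tweetcatcher might work, under some loudly stated assumptions: tweets arrive as `(user, text)` pairs, the friend lists of the tweeting accounts are already available as a `follows` dictionary, and "the community" is taken to be the largest connected component of the follow graph among the accounts using the hashtag. The function names and the data are made up for illustration; nothing here calls the Twitter API.

```python
from collections import deque

def largest_component(follows, members):
    """Largest group of members linked by follows in either direction."""
    neighbours = {m: set() for m in members}
    for follower, followed in follows.items():
        for target in followed:
            if follower in neighbours and target in neighbours and follower != target:
                neighbours[follower].add(target)
                neighbours[target].add(follower)
    seen, best = set(), set()
    for start in members:
        if start in seen:
            continue
        # Breadth-first search to collect everything reachable from start.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node] - component:
                component.add(nxt)
                queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

def tweetcatcher(tweets, follows):
    """Keep tweets whose author is followed by at least one community member."""
    members = {user for user, _ in tweets}
    core = largest_component(follows, members)
    followed_by_core = set()
    for follower in core:
        for target in follows.get(follower, ()):
            if target != follower:
                followed_by_core.add(target)
    keep = core & followed_by_core
    return [(user, text) for user, text in tweets if user in keep]

# Toy data (made up for illustration).
tweets = [
    ("alice", "Great keynote #strataconf"),
    ("spambot1", "Cheap pills!!! #strataconf"),
    ("bob", "Slides from my talk #strataconf"),
    ("spambot2", "Hot pics #strataconf"),
    ("carol", "Anyone at the Hadoop session? #strataconf"),
]
follows = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"alice"},
    "spambot1": {"alice"},  # follows a real member, but nobody follows it
    "spambot2": set(),
}

caught = tweetcatcher(tweets, follows)
for user, text in caught:
    print(user, ":", text)
```

One caveat with this rule: a ring of bots that follow each other as well as a legitimate account would slip through, so a real filter would probably need a stricter test, such as requiring a follow from an account that is itself well followed.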