[Eureka] – “Will you walk down the high street while one of our other dudes vacuums the street? We’ll give you 10% of the sales”

[Coco] – “Deal! Can I wear what I want? If I’m gonna look mad I might as well do it in style….”

[Eureka] – “Deal!”

(Disclaimer: The above is ALL MADE UP)

Back of the Beermat later….

Facebook views as I took the screenshot: 15,777,263….. nice.

One percent convert to sales? A long shot but hey, it’s madness this morning.

So 157,772 sales at $219, as let’s be honest you want the one that Coco gets someone to clean the street with…. $34,552,205.97. Nice.

Coco walks about with $3.4m in her back pocket (assuming the getup has pockets).
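For anyone who wants to poke at the beermat sums, here is a minimal sketch in Clojure. Every figure is a guess from above, and the names (views, conversion-rate, coco-cut and friends) are mine, made up for illustration. It rounds down to whole sales, so the total comes out a shade under a figure worked from the unrounded 157,772.63.

```clojure
;; Back-of-the-beermat sums. Every number here is a guess from the post.
(def views 15777263)        ; Facebook views when the screenshot was taken
(def conversion-rate 0.01)  ; the long-shot one percent
(def price 219)             ; dollars per vacuum
(def coco-cut 0.10)         ; the made-up 10% deal

(defn sales-count
  "Whole sales only, so round down."
  [views rate]
  (long (Math/floor (* views rate))))

(defn revenue
  "Total takings in dollars."
  [views rate price]
  (* (sales-count views rate) price))

(defn cut
  "Coco's share of the takings."
  [views rate price share]
  (* share (revenue views rate price)))

(sales-count views conversion-rate)
(revenue views conversion-rate price)
(cut views conversion-rate price coco-cut)
```

Change the conversion rate to something less mad and the back pocket gets a lot lighter.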

Not bad for an hour’s work, a bit of mockery on Facebook and Youtube and some odd headlines about you but hey, the exposure is priceless. Eureka have saved a fortune on Youtube CPM fees and a full marketing campaign.

That doesn’t even take into account the outfit and what the baby is wearing. Now if you could scan the image into an app and find out about it…… Oh Kim’s working on that already….

The Slightly Longer More Involved Answer

While the “yes I could” answer still stands, the quality of the answer is a different matter altogether. Mainly because the variables are completely wayward at this point. Take the population of London, 8.539m and the two friends Tim’s bumped into.

(1 / 8,539,000) * 100 = 0.00001171%

We really need a better set of variables to work this out in a more refined manner.

How many people does Tim know?

What time of day was the Tube journey?

What’s the average capacity of a full London Underground train?

How Many People Does Tim Know?

Psychologist Richard Wiseman says we know about 300 people by first name. Now Tim’s Twitter profile maintains he has 1,400+ followers but let’s be fair he could have bought some of those 😉 – so I’m sticking with 300.

What’s The Average Capacity of a Full London Underground Train?

Different trains, well they have different carriages and capacities. So we’ll go with an average. There’s a nice list on Wikipedia. So I’ve got a Steve Reich (let’s see who gets that joke!) number of 816.

What Time of Day Was Tim Travelling?

Passenger volume on the Underground is not a constant. You can have a percentage of capacity (or over capacity in the morning/evening rush hour), as the Economist has previously reported.

So which line, the time of day and how many stops sit between departure and destination all have a very large bearing on how many travellers Tim will encounter. Now, as I don’t have that information to hand, our result is going to have a fairly wide tolerance and won’t be accurate.

Let’s Try and Work Something Out

So let’s assume all of Tim’s 300 friends are on the Underground at the same time, in a busy station at a busy time of day. There’s no real nice way to work this out; looking around there are different theories, but I’ll plump for this one.

(Timmy’s Friends + People On The Train) / London Population

(300 + 816) / 8,539,000 = 0.01%

Now that assumes a lot: the train is full and no one gets on or off between the stations. Say that at each of 10 stops 200 people get off and 200 new people get on; that’s 816 + 2,000 = 2,816 people.

(300 + 2816) / 8,539,000 = 0.03649%

Still a small amount.

But that’s not the end, because Tim bumped into two people he knows.

(299 + 2816) / 8,538,999 = 0.03648%

Multiplying the two as fractions rather than as percentage figures: 0.0003649 * 0.0003648 = 0.0000001331, or 0.00001331%

I’m still not 100% convinced that’s correct, for a number of reasons. For a start the variables are inaccurate: we don’t really know them, we’re making guesses. You could safely add a 20% tolerance either side and still be way off the mark.
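The sums above can be sketched as a couple of small Clojure functions, which at least makes it easy to swap the guesses around. The names (encounter-chance, as-percent and so on) are mine, and I’m assuming the two chances get multiplied as fractions, not as percentage figures, with the conversion to a percentage done once at the end.

```clojure
;; All inputs are the guesses used in the post, not measured values.
(def london-population 8539000)

(defn encounter-chance
  "Friends plus fellow passengers as a fraction of the population."
  [friends passengers population]
  (double (/ (+ friends passengers) population)))

(defn as-percent
  "Convert a fraction (0 to 1) into a percentage."
  [p]
  (* 100 p))

;; First friend: 300 friends, 2816 passengers over the journey.
(def first-friend (encounter-chance 300 2816 london-population))

;; Second friend: one fewer friend left to meet, one fewer Londoner.
(def second-friend (encounter-chance 299 2816 (dec london-population)))

;; Multiply the fractions, then convert to a percentage once.
(def both-friends (* first-friend second-friend))

(as-percent first-friend)
(as-percent second-friend)
(as-percent both-friends)
```

It’s still the same ridiculous set of assumptions, just easier to fiddle with.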

“First, it’s amazing that we can put a number to something that you might think of as random, like running into a friend. Second, the number delivered by our spectacular calculation was meaningless. No way everyone in New York is going to be outside at the same time and distributed randomly so I could run into them in a controlled way. Fuggaboutit! As the statistics professor put it, “These assumptions are ridiculous, of course!””

Cameron Will Be Gone Tomorrow(?)

There is a Better Gauge

Twitter with its automated tweets and “let’s see if we can get this trending” is something I certainly don’t trust. It’s not a gauge of X-Factor or The Voice winners so I don’t really give it much hope for anything else these days.

Edward Snowden on the other hand did hit the sentiment right.

An even better gauge on predicting the future is to see who’s putting real money on it.

Show Me The Odds….

And lo and behold, there’s a couple of open books on when David Cameron will no longer be Prime Minister. For him to leave in 2016 there are two sets of odds, 2/1 and 7/4.

Betting odds are just another way of showing probability. Easily worked out too. So for 2/1 we can call A = 2 and B = 1, giving A/B. The probability implied by those odds is B / (A + B); multiply by 100 for the percentage.

I can even wrap that up in a Clojure function:

(defn calc-prob [a b] (double (/ b (+ a b))))

So Ladbrokes are giving us 2/1, let’s have a look with my new function.

user> (calc-prob 2 1)
;; => 0.3333333333333333
user>

So 33% chance…. ok, let’s look at Betfair at 7/4.

user> (calc-prob 7 4)
;; => 0.3636363636363636

Let’s Consult Sherman Kent

Sherman Kent retired from the CIA in 1967. One of his legacies though was a chart, a real simple one, on the potential outcome based on a probability. An advisor had defined a “fair chance” of success as 3 to 1 against, so Kent needed a way of interpreting what “fair chance” and words like “probable” meant when they came from advisors. He came up with this table.

Certainty (%)      General Area of Possibility
100%               Certain
93% (+/- 6%)       Almost certain
75% (+/- 12%)      Probable
50% (+/- 10%)      Chances about even
30% (+/- 10%)      Probably not
7% (+/- 5%)        Almost certainly not
0%                 Impossible

So looking at our 33-36% betting probability of Mr Cameron packing up in 2016, it falls inside the 30% (+/- 10%) band, which makes it a “probably not”.
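The lookup against Kent’s table can be done in code too. This is only a sketch: the band edges are my reading of the table above (each centre plus or minus its tolerance), kent-bands and kent-label are names I’ve made up, and anything landing in the gaps between bands just gets reported as such.

```clojure
;; Same odds-to-probability function as earlier in the post.
(defn calc-prob [a b] (double (/ b (+ a b))))

;; Kent's bands as [label low high] in percent, read off the table:
;; each band is its centre value plus or minus its tolerance.
(def kent-bands
  [["Almost certain"       87 99]
   ["Probable"             63 87]
   ["Chances about even"   40 60]
   ["Probably not"         20 40]
   ["Almost certainly not"  2 12]])

(defn kent-label
  "Return Kent's wording for a percentage, or say it sits between bands."
  [pct]
  (cond
    (>= pct 100) "Certain"
    (<= pct 0)   "Impossible"
    :else (or (some (fn [[label low high]]
                      (when (<= low pct high) label))
                    kent-bands)
              "Between bands")))

(kent-label (* 100 (calc-prob 2 1)))  ; Ladbrokes at 2/1
(kent-label (* 100 (calc-prob 7 4)))  ; Betfair at 7/4
```

Where two bands share an edge (40% or 87%) the first match in the vector wins, which is an arbitrary choice on my part.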

To Summarise

When people put money on things they have a certain confidence that the event is going to happen. That sorts out the serious folk from the armchair opinions straight away. So it’s a good idea to consult these odds to see if they agree or disagree with the hypothesis. It’s just another piece of information.

Thrown in with a quick Clojure lesson, betting probability and a history lesson on Sherman Kent, well that’s not a bad evening’s work.

What does your to do list look like today? Don’t worry, it’ll all be over soon according to a group reported in the Guardian yesterday. Now I, for one, am not amused by this news at all, not today of all days, there’s too much cool stuff coming up over the next period so, well the end of the world can just stop right there.

All The Coin Flips, Dead or Alive?

I’ll keep it simple, there are only two outcomes. We’re either still breathing or we’re all done for. Now there’s the best part of 40 predictions that I’ve seen predicting the end of the world. So 0.5 to the power of 40…..

user> (Math/pow 0.5 40)
9.094947017729282E-13

That’s a lot of zeros. As a percentage that’s a 0.0000000000909% chance. I think I’ll get a bottle of milk in the morning after all.

The Story So Far

In previous posts I’ve covered basically loading data in Spark (with Sparkling in Clojure) and doing some half funky stuff with it. That’s all very well and a good point for starting with, but it’s a touch limiting. Ultimately it’s very easy to get some numbers out, crack some percentages and plot a 2d graph, Google Map or infographic.

What I want to do is something far more interesting than that (in my eyes), use some machine learning to create new things based on what we have.

Markov Chains

With a sufficient amount of text we can do some interesting things. The nicer thing about Markov Chains is they are simple in terms of how they work.

With a corpus of text loaded we can create some fresh output text. More text, better results. A Markov Chain will randomly walk an existing lookup, built from the corpus text, and randomly select the next word to use. By looking at which words followed which in the original corpus, the chain can weight what the next random word should be.

Examples I’ve seen have created Paul Graham startup stories and Garfield cartoons. I could create my own St Vincent song, in fact that’s what I’ll do.

How To Create New St Vincent Songs

“Jase, I think you might like this….”, said my dear friend, sound engineer and my soundscape recordist, Dez Rae. He was right. That was in 2010/2011 before rock royalty beckoned for Annie Clark (and rightly so)… I bought what I could on the spot, it was so unique.

The great thing is the variety of songs, no two come near each other and no two albums are the same.

The Corpus of Annie Clark

In a text editor I’ve copied/pasted the lyrics from the Strange Mercy album.

I spent the summer on my back
Another attack
Stay in just to get along
Turn off the TV, wade in bed
A blue and a red
A little something to get along
Best find a surgeon
Come cut me open
Dressing, undressing for the wall
If mother calls
She knows well we don't get along

An album full of lyrics (all copyright to Annie Clark I hasten to add), all the blank lines taken out, that’s our corpus.

Markov Chain Code In Clojure

Now I need some code to do the Markov Chain. I’m not writing it this time; someone else has done the work in Clojure far better than I could have, so I’m using his.

Like I said, with a corpus of text loaded in, the program will look at next words and create a lookup of words and scores. When I generate new sentences the next word will be governed by the lookup table and word scores. Simple.
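To give a feel for what’s going on inside, here’s a minimal sketch of a first-order, word-level Markov Chain. To be clear, this is not the code I’m actually running: build-lookup and generate-line are names I’ve made up for illustration, and the “score” here is simply how many times a word appears in the list of followers.

```clojure
(require '[clojure.string :as str])

(defn build-lookup
  "Map each word to a vector of the words seen directly after it.
  Repeats are kept, so common followers get picked more often."
  [lines]
  (reduce (fn [lookup line]
            (let [words (str/split (str/trim line) #"\s+")]
              (reduce (fn [m [word nxt]]
                        (update m word (fnil conj []) nxt))
                      lookup
                      (partition 2 1 words))))
          {}
          lines))

(defn generate-line
  "Random walk from start-word until a dead end or max-words is reached."
  [lookup start-word max-words]
  (loop [word start-word
         out  [start-word]]
    (let [followers (get lookup word)]
      (if (or (empty? followers) (>= (count out) max-words))
        (str/join " " out)
        (let [nxt (rand-nth followers)]
          (recur nxt (conj out nxt)))))))

;; Two lines from the corpus above, just to show the shape of the lookup.
(def lookup
  (build-lookup ["Stay in just to get along"
                 "A little something to get along"]))

(get lookup "to")
(generate-line lookup "Stay" 10)
```

With only two lines there’s nothing random about the walk. Feed it a whole album and the follower lists fill up, which is where the odd new lines come from.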

markov.core> (-main "/Users/jasonbell/Documents/stvincentlyrics.txt")
("Oh little one I guess it makes my mulling days, through my lesson" "Chloe in just to get along" "Your hometown is" "I've told whole lies" "Let's not a party I owe you ever really care for me?" "But when you ever really stare at you could take us?" "Chloe in the tiger" "My own heels" "Did you say it was the piles\"" "While you" "Heal my clothes on" "But when you went off the tiger" "I've told whole lies" "Bodies, can't you can limp beside you ever really stare?" "Tried so they left more")

Which looks pretty neat….

Oh little one I guess it makes my mulling days, through my lesson Chloe in just to get along Your hometown is I’ve told whole lies Let’s not a party I owe you ever really care for me? But when you ever really stare at you could take us? Chloe in the tiger My own heels Did you say it was the piles While you Heal my clothes on But when you went off the tiger I’ve told whole lies Bodies, can’t you can limp beside you ever really stare? Tried so they left more

It’s still copyright to Annie Clark, they’re still her words just a little more random. If I was going for a title, “My Mulling Days” would be a front runner.

I could have put all the lyrics from all the albums in and come up with a more refined lyric set, but as a test and a wee tribute to one of my favourite artists, it’s a good start.

Do We Need An Executive?

So it looks like Stormont is getting a longer break than was originally planned. Which means that NI open data is going to be thin on the ground for new MLA questions. So in the meantime let’s turn the building into a Data Centre (we could ask Arlene if INI will fund it, she’s still there, she’s managed to hold on to things….)

So I’ve got my new data centre.

With no MLAs asking questions though we want to generate some to give the impression that something is happening up there. All those potential FDI clients will want to see the powerhouse working…. If we do a good enough job we could let the Markov Chains just do the work altogether, but let’s not get ahead of ourselves just yet.

Repurposing NIAssembly Spark Code

I’m going to extract the question text from the MLA questions. I’m going to use the NI Assembly Spark code (you can read part 1 and part 2 if you want to know the inner workings) and extract just the text.

(:questiontext question)) qs))) mqrdd))
#'mlas.core/qtext
mlas.core> (spark/first qtext)
("To ask the First Minister and deputy First Minister for an update on the delivery of their Programme for Government 11/15 commitments." "To ask the First Minister and deputy First Minister for an update on the delivery of their Programme for Government 11/15 commitments." "To ask the Minister of Enterprise, Trade and Investment whether any of his departmental responsibilities have been affected by the actions of any proscribed organisations since 2011.")
mlas.core>

That’s the first element of the RDD and it has three questions. There’s a lot more…. a whole lot more.

I want to save this out as a text file which requires a bit more mapping.
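As a rough sketch of that extra mapping, in plain Clojure, not Sparkling: I’m assuming the groups of questions have already been pulled back from the RDD as an ordinary sequence of sequences of strings. The function names and the file path are hypothetical.

```clojure
(require '[clojure.string :as str])

(defn questions->text
  "Flatten groups of questions into one string, one question per line."
  [groups]
  (str/join "\n" (mapcat identity groups)))

(defn save-questions!
  "Write every question to the file at path, one per line."
  [path groups]
  (spit path (str (questions->text groups) "\n")))

;; Placeholder strings standing in for the real question text.
(questions->text [["question one" "question two"]
                  ["question three"]])
```

One question per line is the same shape as the lyrics file, so the Markov code can read it in the same way.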

Random MLA Question Generation

With 94,000+ questions to train my Markov Chain I’m expecting some interesting results. I only want to generate one question at a time so I can remove the loop (where I was generating 15 lines for the St Vincent lyrics).

I’m going to run this from the REPL so I’m not reloading and reindexing all the text. Let’s create some MLA questions for next week.

markov.core> (def markov (transform (lazy-lines "/Users/jasonbell/Documents/mlaquestions.txt")))
#'markov.core/markov
markov.core> (generate-sentence markov)
"To ask the First Minister of Finance and deputy First Minister what steps are entitled to ensure greater weight is the reasons that no reports into the Housing Executive Gateway Reviews his Department has been allocated to outline the Minister for Social Services and to a CCEA test; and (vi) South Armagh city area."
markov.core> (generate-sentence markov)
"To ask the Ethnic Development what recruitment process used to detail, broken down by (i) who are assessed as possible help graduates in the Minister and Personnel for each spouse or not personally signed off a whole."
markov.core> (generate-sentence markov)
"To ask the cost, of Ulster in the Minister of order an organisation, broken down by Health and Learning for exemption."
markov.core> (generate-sentence markov)
"To ask the Minister of the last three years."
markov.core> (generate-sentence markov)
"To ask the First Minister what sentences would bring forward to July bonfires on the progression on planning application for rural area of Health, Social Services Directive; and location and what they are assisting these guidelines; and Leisure for Social Services and Rural Development what additional counselling, including those in 2008/09."
markov.core> (generate-sentence markov)
"To ask the First Minister and (ii) if so (ii) whether students with identities outside the number of the Employment and whether the Office of the Environment Minister."
markov.core>

To be honest that was far too much fun!

Taking It Further

If you have access to plenty of text then you can run Markov Chains to produce new content with little difficulty. For a more refined method it’s worth looking at Artificial Neural Networks, which are being used by some publishers for content creation.

All in all, to save Northern Ireland from having no news whatsoever…. well I’ve done my bit 🙂

Artists can command power, it’s a universal law. Madonna did it, Lady Gaga did it and now Taylor’s doing it too. Fine, but this time it didn’t go far enough.

She was right to argue that all artists should be paid for their creativity and, so it seems, she got Apple to reverse its decision not to pay artists during the streaming trial period. Smaller artists still lose out in the long run, though.

The power law in action once again, only the top artists will make the income, the rest will scramble around the long tail.

What should have really been discussed is the value for each stream across the entire lifetime. It falls way below anything that an artist got in traditional CD sales. And while the internet has created the vast distribution network the long term payouts aren’t that great.

Taylor should have added another paragraph about the amount of money paid to artists.

Yesterday evening I posted a bunch of predictions without resorting to data mining, Twitter analysis or reading anything by Nate Silver. Just good old guessing. In terms of a result then guessing didn’t do me too bad. Result 14/24 (58.3%)

[Update: Turns out that FiveThirtyEight’s predictions in the “top six” were the same, we got 5/6 (83%) and missed on the best director. I would have liked to have seen how Nate and Co. managed on the other categories which are much harder to predict.]