While working on the Sporting Life* website for the Press Association I was writing a Perl script, quite a beefy one, to populate pools coupons so people could play online.

All morning I was fixated on a bug and I couldn’t see the wood for the trees. My boss sat opposite but didn’t say a word, nor did I realise he was teaching me. After a while he decided the time was right: “Jase, go for a walk.” I was blunt: “No, not until I’ve fixed this bug….” “Jase, go for a WALK!” I got the hint…..

The Press Association car park is a fair size so I did a lap, just the one. All the while during that lap I was talking under my breath about such an absurd command from my boss. My first proper programming job and I was less than impressed…..

That all changed in an instant. I opened the door to the office, walked to my desk and before I even sat down pointed at the screen and said, “Oh look, there’s a comma missing….”, made the correction and it worked first time.

Stuck with a problem? Go for a walk.

* Two milestones of my programming career: first, being one of the first people involved in the very first online betting platform and second, the first online pools coupon….. this coming from the man who has no interest in sport at all.

Who Let Him Up There Again??

Last year at ClojureX I did an introduction to Onyx, this year it’s about what I really learned at the coal face. I’ll be talking about how I bled all over Onyx with a really big project.

This time though, no naff jokes, no Strictly Come Dancing and Linear Regression*, no temptation to use that Japanese War Tuba picture. It will be about designing streaming applications, task life cycles, heartbeats, docker deployment considerations and calculating log volume sizes for when you’re on holiday.

I’m looking forward to it. If you are interested in the current schedule you can read that here, if you want more information on the conference then that’s on the SkillsMatter website.

* If you’re interested the Darcey Coefficient is (as a Clojure function):

The Prologue.

Recently I’ve been very curious, I know that alone makes people in tech really nervous. I was curious to find the first mentions of BigData and Hadoop in this blog: April 2012. The previous year I’d been doing a lot of reading on cloud technologies and, moreover, data. My thirty year focus is data and right now in 2017 I’m halfway through.

The edge as I saw it would be to go macro on data and insight, that had been my thought ten years earlier. The whole play with customer data was clear in my mind then. In 2002 though we didn’t have the tooling, we made it ourselves. Crude, yes. Worked, it did.

When I moved to Northern Ireland I kept talking about the data plays to mainly deaf ears, some got it. Most didn’t. “Hadoop, never heard of it”. Five years later everyone has heard of Hadoop… too late.

It’s usually about now we have a word cloud with lots of big data related words on it.

Small Data, Big Data Tools

Most of the stories I hear about Big Data adoption are just this, using Big Data tools to solve small data problems. On the face of it the amount of data an organisation has rarely amounts to the need for huge tooling like Hadoop or Spark. My guess is (and I’ve seen it partially confirmed) that the larger platforms like Cloudera, MapR and Hortonworks compete on a very narrow field of really big customers.

Let’s be honest with ourselves, Netflix and Amazon sized data are more deviations from the mean than the mean itself and the probability of it being given to you is very small unless it’s made public.

I personally found out in 2012 when I put together Cloudatics, using big data tools is a very hard sell. Many companies just don’t care, not all understand the benefits and those who cared still didn’t see how it would apply to them. Your pipeline is slim, at a guess 100:1 ratio would apply, that was optimistic then let alone five years on.

Most of us aren’t near “Averaged Sized Data” let alone Big Data.

When I first met Bruce Durling back in late 2013 (he probably regretted that coffee) we talked about all the tools, how there’s no need to write all this Java stuff when a few lines of Pig will do and how solving a specific problem with existing big data tools was far better than trying to launch a platform (yup, know that, already tried).

What Bruce and I also know is that we work with average sized data…. it’s not big data but it’s not small data. Do we need Hadoop or Spark? Probably not, can we code and scale it on our own, yes we can. Do we have the skills to do huge data processing, you betcha.

I sat in a room a few weeks ago where mining 40,000 tweets was classed as a monumental achievement, I don’t want to burst anyone’s bubble, it’s not. Even 80 million tweets is not a big data problem, nor an average sized data one. On my laptop doing sentiment analysis took under a minute.

Now enter all life saving AI!

And guess what, it looks like the same mistake is going to be repeated. This time with artificial intelligence. It’ll save lives! It’ll replace jobs! It’ll replace humans! It can’t tell the difference between a turtle and a gun! All that stuff is coming back.

If you firmly believe that a black box is going to revolutionise your business then please be my guest. Just be ready with the legals and customer service department, AI is rarely 100% accurate.

Like big data you’ll need tons of data to train your “I have no idea how it works it’s all voodoo” black box algorithm. The less you train the more error prone your predictions will be. Ultimately the only thing it will harm is the organisation who ran the AI in the first place. Take it as fact that customers will point the finger straight back at you, very publicly, if you get a prediction wildly wrong.

I’ve seen Google video and Amazon Alexa voice classification neural networks do amazing things, the usual startup on the street may have access to the tools but rarely the data to train. And my key takeaway of learning since doing all that Nectar card stuff, without quality data and lots of it, your fight will be a hard one.

I think there is still a good few years at the R&D coalface trying to figure out where AI could fit properly. Yes jobs will be replaced by AI, new jobs will be created. Humans will sit alongside robotic machines that take the heavy lifting away (that was going on for a long time before the marketers got hold of AI and started scaring the s**t out of people with it).

It’s not impossible to start something in the AI space and put it on the cloud, though, the costs can add up if you take your eye off the ball. The real question is, “do you really have to do it that way? Is there an easier method?”. Most crunching could be done on a database (not blockchain may I add), hell even an Excel spreadsheet is capable for some without the programming knowledge or money to spend on services.

Popular learning methods are still based on the tried and true: decision trees, logistic regression and k-means clustering, not black boxes. The numbers can be worked out away from code as confirmation, though who does that is a different matter entirely. The most well known algorithms can be reverse engineered: for decision trees, Bayes networks, Support Vector Machines and Logistic Regression there’s maths laid bare showing how they work. The rule of thumb is simple: if traditional machine learning methods are not showing good results then try a neural network (the backbone of AI) but only as a last resort, not the first go-to.

If you want my advice try the traditional, well tested, algorithms first with the small data you have. I even wrote a book to help you…..

The (prepare-params) function is now useless and removed. Using the into function we create a single vector of instructions to pass into sh; this includes the Rscript command, the filepath and the result of mapping through the values in the parameters.

Instead of running sh on its own I’m applying the vector against sh. When it’s run against the R script we get the following output:

Now we’ve got what we’re after, separate entries being registered from within the R script. The R script will have to deal with the argument input, converting the strings to numbers but we’re passing Clojure things into R with parameters.

Mrs. Trellis, as the basics go, job done. I’m sure it could be done better. Each case is going to be different so you’ll have to prepare the vector for each R script you work on.
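As a rough illustration of the vector-building step described above, here is a minimal sketch. The namespace name and the function names build-command and run-r-script-with-params are assumptions made for this example, not the original listing.

```clojure
(ns rinclojure.example2 ; hypothetical namespace name
  (:require [clojure.java.shell :refer [sh]]))

;; Hypothetical helper: builds the single vector handed to sh.
;; It holds the Rscript command, the script path, then every
;; parameter converted to a string.
(defn build-command
  [filepath params]
  (into ["Rscript" filepath] (map str params)))

;; Hypothetical helper: applies the vector against sh and returns
;; the output when the exit code is zero, otherwise the error text.
(defn run-r-script-with-params
  [filepath params]
  (let [{:keys [exit out err]} (apply sh (build-command filepath params))]
    (if (zero? exit) out err)))
```

Each value arrives in R as a separate string argument. On the R side commandArgs(trailingOnly = TRUE) returns them as a character vector, so the script still has to convert them to numbers itself.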

[Eureka] – “Will you walk down the high street while one of our other dudes vacuums the street? We’ll give you 10% of the sales”

[Coco] – “Deal! Can I wear what I want? If I’m gonna look mad I might as well do it in style….”

[Eureka] – “Deal!”

(Disclaimer: The above is ALL MADE UP)

Back of the Beermat later….

Facebook views as I took the screenshot: 15,777,263….. nice.

One percent convert to sales? A long shot but hey, it’s madness this morning.

So 157,772 sales at $219, as let’s be honest you want the one that Coco gets someone to clean the street with…. $34,552,068. Nice.

Coco walks about with $3.4m in her back pocket (assuming the getup has pockets).

Not bad for an hour’s work, a bit of mockery on Facebook and Youtube, some odd headlines about you but hey, the exposure is priceless. Eureka have saved a fortune on Youtube CPM fees and a full marketing campaign.

That doesn’t even take into account the outfit and what the baby is wearing. Now if you could scan the image into an app and find out about it…… Oh Kim’s working on that already….

An interesting conversation came up during a tea break in a London meeting this week. How do you run R scripts from within Clojure? One way was simple, the other (mine) was far more complicated (see the “More Complicated Ways” section below).

So here’s me busking my way through the simple way.

Run it from the command line

The Clojure Code

Using the clojure.java.shell package gives you access to the Java system command process tools. I’m only interested in running a script so all I need is the sh command.

(ns rinclojure.example1
  (:use [clojure.java.shell :only [sh]]))

The sh function produces a map with three keys: an exit code (:exit), the output (:out) and an error (:err). I can evaluate the output map and check the exit code: anything that’s not zero means dump the error, otherwise all is well and I send out the output.
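A minimal sketch of that check, assuming two hypothetical function names (handle-result and run-r-script) that are not from the original listing. It uses :require with :refer, which is equivalent to the :use form above.

```clojure
(ns rinclojure.example1
  (:require [clojure.java.shell :refer [sh]]))

;; Hypothetical helper: takes the map returned by sh and hands back
;; the output when the exit code is zero, otherwise the error text.
(defn handle-result
  [{:keys [exit out err]}]
  (if (zero? exit)
    out
    err))

;; Hypothetical helper: runs the script through Rscript, which has to
;; be on the PATH, and evaluates the resulting map.
(defn run-r-script
  [filepath]
  (handle-result (sh "Rscript" filepath)))
```

In the REPL this would be called as (run-r-script "meantest.R"), with the path adjusted to wherever the script is saved.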

The R Code

I’ve kept this function simple, I’m only interested in running Rscript and checking the error code. If all is well then we show output, otherwise we send out the error.

The now preferred way to run R scripts from the command line is the Rscript command which is bundled with the R software when you download it. If I have R scripts saved then it’s a case of running them through Rscript and evaluating the output.

Here’s my R script.

myvec <- c(1,2,3,2,3,4,5,4,3,4,3,2,1)
mean(myvec)

Not complicated I know, just a list of numbers and a function to get the average.

Running in the REPL

Remember the error is from the running of the command and not within your R code. If you mess that up then those errors will appear in the :out value.

Easy enough to parse by removing the \n and the [1] prefix which R has generated. We’re not interacting with R, only dumping out the output from it. After that there’s an amount of string manipulation to do.
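The string clean-up can be sketched as below. The function name parse-single-value is an assumption for this example, and it only copes with a single number printed by R in the form "[1] value" followed by a newline.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: strips the trailing newline and the leading
;; "[1]" index that R prints, then reads what is left as a double.
(defn parse-single-value
  [out]
  (-> out
      str/trim
      (str/replace #"^\[1\]\s*" "")
      Double/parseDouble))
```

Anything other than one number on one line will make Double/parseDouble throw, which is arguably what you want when the output is not what you expected.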

Expanding to Multiline Output From R

Let’s modify the meantest.R file to give us something multiline.

myvec <- c(1,2,3,2,3,4,5,4,3,4,3,2,1)
mean(myvec)
summary(myvec)

Nothing spectacular I know but it has implications. Let’s run it through our Clojure command function.

We have no reference to what each number means: the min, max, average etc. At this point there would be more string manipulation required and you could convert the R labels to keywords or just add your own.
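One way to add your own keywords is sketched below. It assumes the last line of the output is the row of six numbers that summary() prints for a numeric vector (minimum, first quartile, median, mean, third quartile, maximum). The function name parse-summary and the keyword names are made up for this example.

```clojure
(require '[clojure.string :as str])

;; Our own labels, in the order summary() prints its six values.
(def summary-keys [:min :q1 :median :mean :q3 :max])

;; Hypothetical helper: takes the whole :out string, keeps only the
;; last line (the numbers), splits it on whitespace and pairs each
;; number with a keyword.
(defn parse-summary
  [out]
  (let [value-line (last (str/split-lines (str/trim out)))
        values     (map #(Double/parseDouble %)
                        (str/split (str/trim value-line) #"\s+"))]
    (zipmap summary-keys values)))
```

Because it only reads the final line, the earlier mean(myvec) line in the output is ignored. If the script prints anything after summary() the parsing will need changing.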

More Complicated Ways.

Among the R libraries there is the rJava package. This lets you run Java from R and R from Java. I wrote a chapter on R in my book back in 2014.

It’s not the easiest thing to set up but worth the investment. There is a Clojure project on Github that acts as a wrapper between R and Clojure, clj-jri. Once set up you run R as a REngine and evaluate the output that way. There’s far more control but it comes at the cost of complexity.

Keeping Things Simple

Personally I think it’s easier to keep things as simple as possible. Use Rscript to run the R code but it’s worth considering the following points.

Keep your R scripts as simple as possible, output to one line where possible.

Ensure that all your R packages are installed and working, it’s not ideal to install them during the Clojure runtime as the output will become hard to parse. Also make sure that all the libraries are running on the same instance as your Clojure code.

In the long run have a set of solid string manipulation functions to hand for dealing with the R output. Remember, it’s one big string.

A short post but an important one. It’s one of the most interesting plays I’ve seen to push a time critical offer. And it’s an interesting one to break down a little bit. So, in the great Gary Vaynerchuk tradition let’s get micro on this a little bit.

Buy My Stuff, In Exchange I’ll Give You My Time

So to push a two hour conference here’s the deal, you buy two cases of wine, selected by Gary, for $479.99. There’s no “buy tickets to this event”, no GetInvited or EventBrite links to buy access (and giving another supplier revenue). It’s a simple buy this and you’ll get what Gary is offering, a place at the conference.

Time critical offers are a mix of components. Get them right and you can measure the success:

An item, could be an appointment, a session or a stock item. In this case it’s wine.

A time limit. Here it’s the day of the conference, October 14th. Assume that with the audience size (see on the image it’s viewed over 206,000 times) that the offer will sell out beforehand. Scarcity accelerates demand.

A clear outline of the overheads involved, more on this in a minute.

We now have the elements of a formula:

Item retail price * available = total potential incremental revenue

Not a lot to it really…..

$479.99 * 200 = $95,998

Not bad going. A call to action and incremental revenue. Perfect. At a guess there’s a clear 30% profit margin once you take off sales tax and salaries, but there’s no room hire or, I’m assuming, paying Gary to be active for two hours (and the rest). Overhead reduction means profit increase.
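For anyone who wants to plug in their own numbers, the sum above can be sketched as a pair of functions. The names are made up for this example, the 200 is the quantity used in the calculation above and the 30% margin is only the guess from the previous paragraph. Prices are held in cents so the arithmetic stays exact.

```clojure
;; Figures from the post, held in cents.
(def bundle-price-cents 47999) ; $479.99 for the two cases
(def bundles-available 200)    ; quantity used in the sum above

;; Item retail price * available = total potential incremental revenue.
(defn potential-revenue-cents
  [price-cents available]
  (* price-cents available))

;; Estimated profit for a guessed margin, given as a whole percentage.
(defn estimated-profit-cents
  [revenue-cents margin-percent]
  (quot (* revenue-cents margin-percent) 100))
```

With the figures above the revenue comes to 9,599,800 cents ($95,998.00) and a 30% margin on that would be 2,879,940 cents, or $28,799.40.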

Conclusions

The scene is simple really: know your audience, know your stock and know your numbers. The time frame is critical, there are customers who want your product and don’t want to lose out.

Find them by the medium that they consume (Snapchat, Instagram, Facebook and Twitter etc) and deliver the message. If you can personalise it then even better, that takes effort though.

In my opinion Gary executed it perfectly, the results though will be in the point of sale. That’s the measure.

It’s not often I watch Dragons’ Den and get a little bit excited. Okay I kind of knew that investment wouldn’t be on the table but the opportunity is. What concerned me was that it’s Erika’s gig, she is the stylist, the brand and that brings its own problems as growth happens.

Time is the main metric

Throughput of orders and recommendations takes time. The three boxes a year is very similar to Tesco’s “four Christmas’ a year” concept for Clubcard vouchers.

If you reduce the time then you put more orders through. Doing it on your own is possible but growth can only be taken to the point of the number of boxes you can put together in one day.

So if we can find a way to save time we can process more. And there are two key aspects that will make that happen: customer preference data and product attribute data. If you can marry those two then you are on the way to improving process. I don’t know how Erika is doing it right now, from the pitch it sounded like it was all a manual process. I could be wrong.

Machine Learning Can Help

The main focus here is to get machine learning to automate the selection process for Erika, some form of match making algorithm, the who-gets-what selection that gives a list of preferred items to box.

The final say is with Erika, not the algorithm, and that’s the important part as the customer is still paying for a personal service so there needs to be involvement. Machine learning aids the process but does not take over.

Measure Everything

Peter Jones’ main beef was over returns which is a reasonable concern. We know what products are going out (from our theoretical system) and we know that some products are going to come back. This becomes a self learning system, items that worked and items that didn’t are fed back into the system so the recommendations can improve.

Be certain of one thing, you will never have a perfect prediction but you can feed as much data back into the algorithm to ensure that your error rate starts to reduce. Once you are increasing certainty then you are reducing the chance of returns. That starts to increase the value of the customer and therefore increases the bottom line.

The matter of held inventory was also an issue. Using an automated recommendation there’s a process that could, over time, minimise the stock holding by Dappad so it can order on a just-in-time basis. Automate the recommendation across the user base, order the required quantities from the suppliers and then box appropriately.

Summing Up

There’s nothing here that I have presented that’s out of the ordinary nor anything that would worry me as a customer. It’s just taking a look at the supply chain process and seeing what could be improved with a little automation and algorithmic learning.

The questions in my head right now:

If you introduced 4 boxes a year instead of three what’s the impact to turnover?

Can you apply Zara’s supply chain learning to Dappad and get down to near zero stock?

Would the introduction of some form of artificial intelligence or machine learning reduce the returns by 30%? If so what’s the financial uplift?

Can you replicate to different bands of customer: low spend, mid spend, luxury markets?

Ultimately all five Dragons passed on Dappad and for once in my life I actually think that Touker Suleyman missed a trick here….. no #toukertime this time.

So I had a lot of fun talking loyalty, data and vouchers and generally dissing social media at Smart Retail last week. And while I enjoyed Adoreboard’s presentation I can stay silent no more, there’s one part I don’t agree with and it’s all to do with that slide on fashion retail.

The original blog post is here, it’s worth a read as it’s important to the context on what I’m about to say. Emotional metrics are fine, I’ve got nothing against that but they are not to be compared with others in the same space.

So what follows is merely my opinion but with some more numbers to back up my assumptions.

Why Does It All Matter?

I see a lot of these comparison reports. And like a data trail they are left on the internet for all to see. Now then, these findings are open to discussion but they do have impact in some quarters.

Take JP Morgan’s post on Bitcoin being fraudulent. The cynic in me sees it like this. JP Morgan have been investigating blockchain technology, on which Bitcoin is built, for a long time, so why diss it? Perhaps in the knowledge that if you do that then the price will drop. Markets are driven by emotions. So after the post the price of Bitcoin dives for a very short period of time, guess who had the highest purchase volume….. JP Morgan. I’ll let you derive your own conclusions from there, I have my own.

Same thing applies here, when you are talking about five fashion brands well valuation matters. And as markets are emotionally driven it can do as much harm as it can highlight a product. Don’t think posts have no ripple effect, they do.

Nothing gives you the fear of responsibility more than a complete stranger walking up to you at an international conference saying, “Hi, I subscribe to your blog”.

One Dimension isn’t enough

Twitter data is dreadful, that’s the plain and simple truth of the matter. I’ve done enough of it over the last seven years to know. I personally can’t value it as a single data source. As well as that the quantity of data to get insight from, well the more the merrier. From the article “We analysed over 6,000 mentions of five of the leading online fashion retailers”, that’s not a lot of tweets and I wondered what day of the week, what time of the day etc etc?

As we don’t know the percentage mentions of those 6,000 tweets we don’t really know the true value of those scores. Were there 70% mentions of New Look and only 10% of Zara, for example? These kinds of breakdowns need to be reported so we get the balanced view. Was the score weighted according to how many tweets were ranked against each retailer….. I ask a lot of questions.

The simple upshot is that you’ll get results from a small data sample but I’d like to see something over a million, twenty million or a hundred million tweets. And don’t give me the cost and processing power, that’s utility stuff and right now it’s cheap. Many knock Hadoop now but it’s the first thing I’d go for in something like this. And it wouldn’t take long either. I’ve done sentiment studies with 8 million tweets and they were processed in just over 40 seconds.

So let’s introduce a second measure. There’s a few to choose from, I’ll go through each here. I need another metric to go against. In fact I’ve got three: the number of Twitter followers that brand has, the number of Facebook page likes and, finally, the number of physical stores.

Reverse Adoreboard against per 1000 Twitter Followers

Firstly, I know what you’re thinking, “what’s a reverse Adoreboard”, well the index score gives the positive emotion index. I want the complainy whiney index version of that…. it needs a nicer name so it’s a Reverse Adoreboard. I’m assuming here the score is based on 0-100 which is interesting in itself as it means the top fashion retailer is still below 50% in customer satisfaction. I digress….. a reverse score is 100 minus the Adoreboard score.

The 6000 tweets is fine, what we don’t know is the number of followers each brand has. Well I made a cup of tea and found out. Once we have them then we can find out the RA per 1000 followers. My calculation was easy enough.

Reverse Adoreboard Score / (Followers / 1000)

Retailer | RA | Twitter Followers | RA / T1000
---------|----|-------------------|-------------
Top Shop | 58 | 1,330,000         | 0.0436090225
Zara     | 76 | 1,270,000         | 0.0598425196
Asos     | 64 | 1,040,000         | 0.0615384615
Boohoo   | 70 | 482,000           | 0.1452282158
New Look | 55 | 362,000           | 0.1519337017

When you rank by the per thousand score from smallest to largest this changes the standings quite significantly: when you balance the negative score per thousand Twitter followers for the brand, New Look actually comes out bottom and Zara comes out second.
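The calculation and the ranking can be reproduced with a few lines. This is only a sketch: the function names are made up for this example and the figures are the RA scores and follower counts quoted in this post.

```clojure
;; RA scores and Twitter follower counts as quoted in this post.
(def retailers
  [{:name "Top Shop" :ra 58 :followers 1330000}
   {:name "Zara"     :ra 76 :followers 1270000}
   {:name "Asos"     :ra 64 :followers 1040000}
   {:name "Boohoo"   :ra 70 :followers 482000}
   {:name "New Look" :ra 55 :followers 362000}])

;; A reverse score is 100 minus the Adoreboard score.
(defn reverse-adoreboard
  [adoreboard-score]
  (- 100 adoreboard-score))

;; Reverse Adoreboard Score / (Followers / 1000)
(defn ra-per-thousand
  [ra audience]
  (/ ra (/ audience 1000.0)))

;; Adds the per-thousand figure to each row and sorts smallest first.
(defn rank-by-ra-per-thousand
  [rows]
  (sort-by :ra-per-1000
           (map (fn [{:keys [ra followers] :as row}]
                  (assoc row :ra-per-1000 (ra-per-thousand ra followers)))
                rows)))
```

The same ra-per-thousand function works for any other audience measure, you just pass in a different count.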

Reverse Adoreboard against per 1000 Facebook Page Likes

Okay that was Twitter, let’s look at Facebook while we’ve got some numbers to work with. Using the same Reverse Adoreboard score how do the retailers stack up RA per 1000 Facebook page likes?

Retailer | RA | Facebook Page Likes | RA / FB1000
---------|----|---------------------|--------------
Zara     | 76 | 25,907,851          | 0.00293347371
Top Shop | 58 | 4,277,568           | 0.01355910648
Asos     | 64 | 5,035,399           | 0.01271001563
New Look | 55 | 3,426,041           | 0.01605351483
Boohoo   | 70 | 2,534,629           | 0.02761745407

Fashion retailers get far more attention on Facebook than on Twitter, I think that’s important to point out. The other interesting fact here is that Zara’s presence is 1.69 times that of the other four combined. So when you run the RA score against the page likes then Zara just runs ahead of the competition.
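The comparison of one brand against the rest combined is quick to check. A small sketch, with a made-up function name and the page like counts quoted in this post:

```clojure
;; Facebook page likes as quoted in this post.
(def page-likes
  {"Zara"     25907851
   "Top Shop" 4277568
   "Asos"     5035399
   "New Look" 3426041
   "Boohoo"   2534629})

;; Hypothetical helper: one brand's likes divided by the total likes
;; of every other brand in the map.
(defn share-vs-rest
  [likes brand]
  (let [rest-total (reduce + (vals (dissoc likes brand)))]
    (/ (double (get likes brand)) rest-total)))
```

Calling (share-vs-rest page-likes "Zara") gives a ratio just under 1.7, which is where the figure above comes from.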

I have to be careful here as the RA metric really applies to Twitter users and not Facebook ones. You’d have to run the study again on Facebook customer experience data to get a better idea but something tells me that Zara would still come out on top but that’s a gut feeling and not to be trusted. You need the data.

Reverse Adoreboard against per 1000 Physical Stores

Asos and Boohoo don’t get counted here as they don’t have physical stores but are purely online.

Retailer | RA | Physical Stores | RA / PS1000
---------|----|-----------------|------------
Zara     | 76 | 7,000           | 10.86
New Look | 55 | 1,160           | 47.41
Top Shop | 58 | 500             | 116.00

This is really as a guide, online and offline customer experiences are different beasts. A better gauge would be refunds from point of sale, there’s a good chance that complaints aren’t actually recorded but the reaction is merely dealt with.

Despite Zara’s high RA score it comes out best once you take into account the number of stores that it has. I’d expect to see that. Even if there was a 10-15% tolerance in the scores Zara still comes out on top. As Zara’s core business is not online but in store then it should come as no surprise.

Conclusions?

From the day I read the Adoreboard blog post I’ve never agreed with the results. What I have presented here, while not perfect, is an alternative view with extra data points. It’s only when you introduce a second metric that you can drill down into the results and get better insight.

Each brand performs well in their own way. You’d expect Asos and Boohoo to nail the online space as it’s their core business but they do a good job of staying middle of the road in terms of performance. For my money both Zara and Top Shop are doing a better job than New Look in terms of balanced ranking on Twitter and Facebook.

The Adoreboard index is fine, it ranks emotion but it’s only a single view in my opinion. Which is fine when one brand rolls up and wants to see the emotional responses for their own brand. Once you bring in competitors then the results are very open to interpretation. As a blog post it is good. As a system it’s good, please don’t take this as me knocking Adoreboard because I’m not. I’m exploring the meanings of the original blog post that I disagreed with. As there’s context missing it’s always going to be an opinion whether things are right or wrong.

Best course of action: run the whole analysis again with a million tweets.

The SmartRetail Conference is taking place on Thursday 28th in the Culloden, Belfast. I was asked to talk about customer loyalty and my experiences with loyalty based discounting, something I covered with uVoucher.

So, day off booked. Slides are done.

And yes, I will talk Tesco Clubcard and the supply chain wonders of the Zara fashion chain. It’s all about the data.

This is a great opportunity for anyone in retail to network and learn some new things. You can pick up tickets on the SmartRetail website.