Monthly Archives: March 2017

(The post here reflects my own thoughts and may not be the thoughts of my employer, just putting that out there now to avoid any confusion)

The Shift

Over the last few months there’s been a shift: a movement from web sites and apps that do stuff (mostly useful, some utterly useless) towards more refined thinking on process and insight.

Over the weekend I was looking at the funding patterns of artificial intelligence startups. Handily, KDnuggets (the place to look for anything on data mining and machine intelligence) had a piece on 50 of the “top” companies in AI right now.

The 50 to Watch

Company | Sector | Investment ($m)
InsideSales.com (Provo, UT) | Ad Sales | 251.2
Persado (New York, NY) | Ad Sales | 66
APPIER (Taipei, Taiwan) | Ad Sales | 49
DrawBridge (San Mateo, CA) | Ad Sales | 46
Zoox (Menlo Park, CA) | Autotech | 290
Nauto Inc. (Palo Alto, CA) | Autotech | 14.9
nuTonomy (Cambridge, MA) | Autotech | 19.6
Dataminr (New York, NY) | BI | 183.44
Trifacta (San Francisco, CA) | BI | 76.3
Paxata (Redwood City, CA) | BI | 60.99
DataRobot (Boston, MA) | BI | 57.42
Context Relevant (Seattle, WA) | BI | 44.3
Tamr (Cambridge, MA) | BI | 41.2
CrowdFlower Inc. (San Francisco, CA) | BI | 38
RapidMiner (Boston, MA) | BI | 36
Logz.io (Tel Aviv, Israel) | BI | 23.9
BloomReach (Mountain View, CA) | Commerce | 97
Mobvoi Inc. (Beijing, China) | Conversation AI | 71.62
x.ai (New York, NY) | Conversation AI | 34.3
MindMeld (San Francisco, CA) | Conversation AI | 15.4
Sentient Technologies (San Francisco, CA) | Core AI | 135.78
Voyager Labs (Israel) | Core AI | 100
Ayasdi (Menlo Park, CA) | Core AI | 106.35
Digital Reasoning (Franklin, TN) | Core AI | 73.96
Vicarious (San Francisco, CA) | Core AI | 72
Affectiva (Waltham, MA) | Core AI | 33.72
H2O.ai (Mountain View, CA) | Core AI | 33.6
CognitiveScale (Austin, TX) | Core AI | 25
Numenta (Redwood City, CA) | Core AI | 24
Cylance (Irvine, CA) | Cyber Sec | 177
Darktrace (London, UK) | Cyber Sec | 104.5
Sift Science (San Francisco, CA) | Cyber Sec | 53.6
Kensho (Cambridge, MA) | Fintech | 67
AlphaSense (San Francisco, CA) | Fintech | 35
iCarbonX (Shenzhen, China) | Healthcare | 199.48
Benevolent.AI (London, UK) | Healthcare | 100
Babylon Health (London, UK) | Healthcare | 25
Zebra Medical Vision (Shefayim, HaMerkaz, Israel) | Healthcare | 20
Anki (San Francisco, CA) | IoT | 157.5
Ubtech (Shenzhen, China) | IoT | 120
Rokid (Hangzhou, Zhejiang, China) | IoT | 50
Sight Machine (San Francisco, CA) | IoT | 44.15
Verdigris Tech. (Moffett Field, CA) | IoT | 16.1
Narrative Science (Chicago, IL) | Text Analysis | 29.4
Captricity (Oakland, CA) | Vision | 51.9
Clarifai (New York, NY) | Vision | 40
Orbital Insight Inc. (Mountain View, CA) | Vision | 28.7
Chronocam (Paris, France) | Vision | 18.35
Zymergen (Emeryville, CA) | Other | 174.1
Blue River Tech (Sunnyvale, CA) | Other | 30.4

Key Summary

Minimum Investment – $14.9m

Maximum Investment – $290m

Average Investment – $73.26m

Number of companies listed – 50

The listed companies were “ones to watch”; that doesn’t take into account the other 10,000 or so that will be in stealth, not on anyone’s radar, or just making sales and getting on with it.

For me one concern is the lower investment limit, $14.9m; I’ve not seen any NI company raise that amount of investment. And I’ve spent time thinking about why that could possibly be.

All the startups are donkeys. They’re just not worth that amount.

All the founders are playing the Northern Ireland funding game: they’ve raised their little $1m and can’t raise any more as they’ve already given up 20-25% of the company.

There’s no actual IP or product.

There are no customers.

There’s no problem being solved.

That’s off the top of my head; if I really went all mind palace on it I’d probably come up with another ten reasons.

The Talent Pool

The much-lauded reason for FDI companies setting up shop in Belfast and, occasionally, Derry.

“you have graduates – there’s a lot of talent in Belfast” From the BT, here.

Which I read as, “There’s plenty of cheap graduates looking for a job in Belfast, we can exploit that and reduce our bottom line.”

It’s time to seriously question this marketing message. Yes, there are some very talented graduates in Northern Ireland, but are they ready for the market where they are needed? Debatable. Do they fill the gap of what’s really missing? No, they don’t.

It still skirts around the issue for any startup: a complete lack of good CTO talent. What I’m seeing more and more of is companies setting up, getting that free government money (startup DLA, if you will) and handing out vanity titles like there’s no tomorrow. I’ve written and spoken about this many times before; if you want to read it again then have a look at this.

Good CTOs in NI are hard to find, plain and simple. The reason for this is simple too: they’re pretty much all in great jobs with large employers, on a deal too good to lose. And don’t think that’s a fluke; the large companies engineer it that way, as they obviously don’t want to lose good talent when they see it.

Jumping to a startup with a very questionable runway is a huge risk. Look at yourself in the mirror and ask yourself, “Am I worth the risk to my employees, my C-levels and, most importantly, my customers?”

If you flinch or can’t do it then you obviously need a session with Wendy Rhodes.

NI Needs a BIG WIN

If you think you’re on the starting wave of AI technology then you’re already five years too late. The same mistake was made with the Big Data opportunities. What I personally believe is required right now is for someone to bring along a product that is so unique, and solves a problem so much better than anyone else’s, that the rest of the world can’t do anything but look.

This thing also needs to IPO big time and make the founders and early-stage investors so rich that people look at Northern Ireland as the place. The time is now to stop kidding ourselves that we’re at the start of a wave; we’re already behind. Still thinking that social media data is going to make you (and others) rich? I doubt it; that edge is long gone.

There’s little point building tools; it’s hard to create revenue with programming tools and APIs. Solve a problem better than anyone else so it can’t be ignored. The tools to do AI and machine learning are plentiful, whether it be TensorFlow, Weka or what have you; search hard enough here and you’ll find posts on those technologies. At the end of the day the programming side isn’t that difficult when you have good coders who understand the logic.

I firmly believe it can be done; I just think the thinking needs to change. Stop listening to salaried government PR (use them, fine, but weigh up what’s being said) and focus on the idea, the IP and the customer. What’s needed is:

Kick ass product

Kick ass team

More than $7m in investment

An edge that no one can ignore.

Main focus to remain in NI and IPO.

Your focus needs to be three standard deviations from the mean; that’s where the risk and the potential rewards are.

And keep this in mind: AI is not about replacing jobs, it’s about job creation, about creating new jobs that currently don’t exist. It’s an exciting time to be here, but NI, you have some serious catching up to do.

Beltech 2017

I’m on the panel at Beltech 2017, “Public Debate: The Impact of AI on our World”, at 6pm, though I’ll be there most of the day on behalf of Mastodon C. So feel free to catch up with me there.

If another company can process thirty million messages a second, you can too.

Bonus: Tea Solves Everything

For the majority of users the defaults are there and they kind of work: your messages are small and there’s enough room on the box to be able to relax. If you are working on local development then there’s a good chance you don’t even consider such things; once things go live, though, it’s a different matter.

Message Retention

There are two methods available to you for setting the retention of messages in Kafka: by how long a message has been in the log, and by the size of the log.

log.retention.hours, log.retention.minutes, log.retention.ms

Yes, there are three, but they all do the same thing: set how long messages are retained in the log by time. The default is 168 hours (seven days). You can use hours, minutes or milliseconds as they all set the same thing; if more than one setting is present then the smallest unit takes precedence.

log.retention.bytes

You can also retain messages based on the total number of bytes of messages in the log. The setting applies per partition, so a topic with three partitions and a log.retention.bytes of 1GB retains 3GB at the very most. If you increased the partition count on the topic by one, for example, then the maximum retained would increase to 4GB.

The two types of log retention, size and time, can be used together. If both are set then messages are removed when either of the settings is satisfied. If you have a retention time of one day and 2GB retention in size, then the size rule kicks in as soon as a partition goes over 2GB, even before the one-day period is up.
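If you’d rather set these per topic than broker-wide, the topic-level equivalents are retention.ms and retention.bytes. Here’s a minimal sketch of applying them from the Java AdminClient, assuming a broker at localhost:9092, a topic called topic-input, and a client recent enough to have incrementalAlterConfigs; treat it as a sketch rather than gospel.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class TopicRetentionConfig {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPic, "topic-input");
            // One day by time, 1GB per partition by size - whichever is hit first wins.
            Collection<AlterConfigOp> ops = Arrays.asList(
                    new AlterConfigOp(new ConfigEntry("retention.ms", "86400000"), AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("retention.bytes", "1073741824"), AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```

(ConfigResource.Type.TOPIC is the enum you want; the rest is just plumbing to apply the two overrides in one call.)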

message.max.bytes

Producers are limited in the size of messages they can produce. The default is 1MB; if a producer sends a message over that then it will not be accepted. The setting refers to the compressed size of the message, so the message itself can be over the set size uncompressed.

While you can set Kafka to use larger message sizes, this does have a performance impact on network and I/O throughput. So it’s worth sitting down with a pen and paper (or a spreadsheet) to gauge your average message sizes and adjust the settings accordingly.
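The producer has its own cap too, max.request.size, which needs to stay in step with whatever the broker allows. A rough sketch of the producer side, assuming a local broker and a hypothetical topic-input topic:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class LargeMessageProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");               // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // Producer-side cap; keep it in line with the broker's message.max.bytes
        // (or the topic's max.message.bytes override) or large sends will be rejected.
        props.put("max.request.size", "2097152");                       // 2MB, purely for illustration
        props.put("compression.type", "gzip");                          // the broker limit applies to the compressed size

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("topic-input", "key", "a fairly large payload..."));
        }
    }
}
```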

In Investment, the Rule of 72

The Rule of 72 is a simple calculation used in accountancy and investment. Simply, if you divide 72 by the interest rate you get the number of periods it takes for your investment to double. If I have £100 and an interest rate of 10% per year then it will take roughly 7.2 years (72/10 = 7.2) for my money to double.
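As a quick sanity check, here’s a tiny sketch comparing the rule-of-72 estimate with the exact doubling time, ln(2)/ln(1 + r):

```java
public class RuleOf72 {
    public static void main(String[] args) {
        double ratePercent = 10.0;                                       // 10% per period
        double estimate = 72.0 / ratePercent;                            // rule of 72 approximation
        double exact = Math.log(2) / Math.log(1 + ratePercent / 100.0);  // true doubling time
        System.out.printf("estimate: %.2f periods, exact: %.2f periods%n", estimate, exact);
        // prints roughly: estimate: 7.20 periods, exact: 7.27 periods
    }
}
```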

In Realtime and Streaming Applications

Workflow throughput is everything. How consumers perform and behave will have a knock-on effect on the number of messages you can process. So please allow me to present: the Streaming Rule of 72.

In realtime and streaming applications, the rule of 72 (and its cousins, the rule of 70 and the rule of 69.3) is a method for estimating how long it will take the volume of messages being processed to double. The rule number (e.g. 72) is divided by the percentage gain per period to obtain the approximate number of periods (usually seconds) required for doubling.

For example, say 100 messages are flowing through the system per second and we change the workflow timeouts, increase a container’s shared memory volume or extend a heartbeat timeout. We measure throughput again and now process 130 messages a second. That’s a 30% increase ((130-100)/100), so with the streaming rule of 72 we will double the message volume every 2.4 seconds (72/30 = 2.4).
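And the same calculation as a small sketch in code, using the made-up numbers from the example above:

```java
public class StreamingRuleOf72 {
    public static void main(String[] args) {
        double before = 100.0; // messages per second before the tuning change
        double after = 130.0;  // messages per second after the tuning change

        double gainPercent = (after - before) / before * 100.0; // 30%
        double doublingTime = 72.0 / gainPercent;                // streaming rule of 72

        System.out.printf("gain: %.0f%%, doubling roughly every %.1f seconds%n",
                gainPercent, doublingTime); // gain: 30%, doubling roughly every 2.4 seconds
    }
}
```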

With a bunch of applications acting as consumers of a Kafka stream, it appears to be a Google dark art to find any decent information on what’s going where and doing what. The big question is: where is my application up to in the topic log?

After hours of try, test, rinse, repeat, tea, pulling hair, more tea, Stack Overflow (we all use it, get over it) and yet more tea, this dear digital plumber was looking like this… but in the male form, less angry and not using tables, just more tea.

The /consumers node in Zookeeper is a bit of a red herring: your application consumer group ids don’t show up there, but ones from the Kafka shell console do. This makes running the ConsumerGroupCommand class a bit of a dead end.

Most consumers are basically while loops, collecting a batch of records with the poll() method and then committing the offset of the last record dealt with, basically saying “I’m up to here boss! I didn’t read these though”. The poll() call also acts as the initial link with the Kafka group coordinator to register a new consumer group. Those consumer groups do not show up where you expect them to.
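For reference, a minimal sketch of that kind of loop, assuming a local broker, a topic called topic-input and the group id my-stream-processing-application used in the example further down:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");                  // assumed broker address
        props.put("group.id", "my-stream-processing-application");         // the group registered with the coordinator
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false");                          // commit offsets ourselves below

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("topic-input"));
            while (true) {
                // poll() fetches a batch of records and, on the first call, joins the group via the coordinator
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition %d, offset %d: %s%n",
                            record.partition(), record.offset(), record.value());
                }
                consumer.commitSync(); // "I'm up to here boss!" - commits the offsets just polled
            }
        }
    }
}
```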

Finding Out Where the Offset Is

At this point Zookeeper isn’t much help to me; using the Zookeeper shell doesn’t give me much to go on.

Any consumer applications you have running should show up in the offset log. In this example I have two applications consuming from the same topic (topic-input) with one partition. I can see from here that my-stream-processing-application is up to offset 315 in the topic, while my-other-processing-application is further ahead at 504. That could potentially tell us there’s an issue with the first application, as it appears to be way behind in the topic.
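If digging through the offsets topic feels like too much tea, a sketch along these lines will ask the brokers directly. It uses the Java AdminClient’s listConsumerGroups and listConsumerGroupOffsets calls (newer clients than the ones around when this was written), with localhost:9092 assumed as the broker:

```java
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ConsumerGroupListing;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class WhereAreMyConsumers {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // List every consumer group the brokers know about, console and application groups alike.
            for (ConsumerGroupListing group : admin.listConsumerGroups().all().get()) {
                Map<TopicPartition, OffsetAndMetadata> offsets =
                        admin.listConsumerGroupOffsets(group.groupId())
                             .partitionsToOffsetAndMetadata().get();
                offsets.forEach((tp, offset) ->
                        System.out.printf("%s %s-%d offset=%d%n",
                                group.groupId(), tp.topic(), tp.partition(), offset.offset()));
            }
        }
    }
}
```

Run against the example above it would show my-stream-processing-application sitting at offset 315 on topic-input-0 and my-other-processing-application at 504, which is exactly the lag comparison you want.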