ln -s /etc/my_ideas/{IT,"data storage",business} /usr/share

TechUnplugged – Storage afterthoughts

It has been a long time since I published something on my blog. Since then my first child was born, I changed my job profile and I traveled a lot, so the silence might be understandable, if not excusable. I now work full time on building a small European storage company. That is quite an unusual story in a mature market worth over 100 billion euros a year, where nearly all the players come from the US. But that’s why it’s so much fun.

I was invited by Enrico Signoretti to attend TechUnplugged, a conference dedicated to storage, virtualisation and cloud computing. Most of the industry influencers were present, along with a series of vendors that sponsored the event. The format was very interesting: independent analysts, storage users, infrastructure and data centre administrators, and vendors all sharing their insights and products. It was organised much more along the lines of an unconference, where people from the audience were encouraged to actively participate. For future editions it might be interesting to follow the example of software development conferences, which go one step further with hands-on activities, maybe even a hackathon where people can come with high-level ideas and draw them on a whiteboard or a piece of paper. I think it would make it more entertaining.

It all started with containers, presented by Nigel Poulton. A very enjoyable presentation; the speaker is a gifted presenter. Now, containers – bloody containers – can they be as disruptive as VMware 1.0? From my standpoint it’s absolutely amazing that a technology that has been around for some 15 years, starting with BSD jails and continuing with Solaris Zones, can be such a hot topic today. The answer is quite simple: timing and context. Regarding context – Linux has won the server OS battle, whether we like it or not (I don’t). Everything in this world today is powered in one way or another by software – and software needs an army of developers to sustain it. 15 years ago, when containers first appeared, being a software developer – it would be too hipsterish to say open-source software developer – was still something quite unique. Today things have changed and the developer community is growing at a staggering pace. Developers needed portable, environment-agnostic (no more “it works on my machine”) development environments. And they built them – on Linux. The timing was right and it was an instant hit. I like this quote from Anna Wintour: “It’s always about timing. If it’s too soon no one understands. If it’s too late, everyone’s forgotten.”

There is still a long way for containers to become mainstream, but the signs are very promising: every single cloud provider out there has started offering support for them, VMware is coming out with Photon, Microsoft forked Docker, and container-friendly low-footprint OSes like CoreOS are being born, so this is definitely a very hot space. Everyone talks about microservices-oriented architectures, and containers are the best fit for them.

I find a very interesting similarity between ‘hot’ containers and software design principles. In object-oriented software, as in the Unix philosophy, you should have small single-purpose entities (classes, modules) that do one thing and do it well. I totally agree with that: it makes your architecture modular and clean. But like everything in life, it’s all about trade-offs. Once you do that, most of the complexity moves into integrating components. Most of the bugs, and probably the hardest to spot, move to the intersection of different domains, modules and components. That is something we should think about, in my opinion, when working with microservices, containers and the like.
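As a toy illustration of that point (hypothetical code, not from the talk): two small functions, each trivially correct on its own, can still produce a bug that lives only at their interface – here, one side works in bytes and the other expects megabytes.

```python
# Two small, single-purpose functions -- each trivially correct in isolation.

def file_size_bytes(blocks: int, block_size: int = 4096) -> int:
    """Return a file's size in bytes given its block count."""
    return blocks * block_size

def fits_on_volume(size_mb: float, free_mb: float) -> bool:
    """Check whether a file of `size_mb` megabytes fits in `free_mb` megabytes."""
    return size_mb <= free_mb

# The bug lives at the integration point: passing bytes where megabytes
# are expected makes a 4 MB file look far too big for a 100 MB volume.
size = file_size_bytes(1024)                  # 4_194_304 bytes (4 MB)
broken = fits_on_volume(size, 100)            # wrong: compares bytes to MB
correct = fits_on_volume(size / 2**20, 100)   # right: convert units first
```

Neither function has a bug a unit test would catch; the defect only exists once they are composed, which is exactly where microservice and container boundaries multiply.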

Chris M Evans had the next presentation. He focused on the following ideas:

storage should be based on policy. I think the idea here relates to an abstraction mindset: the user should not be thinking “I need to deploy a LUN”. The delivery mechanism is not what’s important; what matters are attributes like consistent performance, tolerance to failures and efficient data placement.

the next generation of storage should be hardware agnostic. What I fear the most here is that people don’t understand what agnostic means, and VMware with VSAN and other “software defined only” solutions make it harder for them to understand. Hardware agnostic does not in any way mean that hardware does not matter. If you put crap underneath, it will still run like crap; it’s like building a house on a mud foundation. As I also mentioned during the conference, my idea of the future is “customised commodity hardware”: blueprints – complete BOMs – of well-designed storage architectures. I don’t agree that we should stop thinking about hardware architecture. It’s like in Agile software development, where people misunderstood the concept of not doing big design upfront: they eliminated the word “big” and now they don’t do design at all. That’s why lately I’ve seen some of the shittiest software ever written (sorry for the word). And in the end I know only one company that has succeeded in building software that runs on everything, and that is Microsoft.

divergence and convergence in storage are happening at the same time: from a divergence standpoint, one size does not fit all anymore, and a lot of the solutions out there are becoming more and more specialised for different workloads and specific use cases; from a convergence standpoint, we see the rise of hyper-converged storage solutions, mainly for economic and usability reasons.
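The “policy, not LUNs” idea above could be sketched like this – a hypothetical interface, not any particular vendor’s API. The user declares outcomes; the backend decides how to deliver them.

```python
from dataclasses import dataclass

@dataclass
class StoragePolicy:
    """What the user actually cares about -- not how it is delivered."""
    min_iops: int              # consistent performance floor
    failures_to_tolerate: int  # drives the replica / placement decision
    tier: str                  # e.g. "capacity" or "performance"

def provision(name: str, size_gb: int, policy: StoragePolicy) -> dict:
    """Hypothetical provisioning call: the backend, not the user, translates
    the policy into LUNs, replica counts and data placement."""
    replicas = policy.failures_to_tolerate + 1  # survive N failures -> N+1 copies
    return {"volume": name, "size_gb": size_gb, "replicas": replicas,
            "tier": policy.tier, "min_iops": policy.min_iops}

vol = provision("db-data", 500,
                StoragePolicy(min_iops=5000, failures_to_tolerate=1,
                              tier="performance"))
# vol["replicas"] is 2: tolerating one failure needs two copies
```

The names here (`StoragePolicy`, `provision`) are invented for illustration; the point is that nothing in the request mentions a LUN.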

The observation I liked the most from the presentation was: don’t buy technology, don’t buy what you don’t need, buy what brings real value to your business. Evaluate your options, use online resources and apply the wisdom of the crowd. It will help you in the long run. I couldn’t agree more. The sole purpose of building technology is to solve real-life business problems, to bring value, to improve someone’s life, and if we don’t do that, we as tech guys have failed. At least that’s how I see it.

The next presentation was from Stephen Foskett. His presentation was “cloudy”. The theme of divergence from an earlier presentation reappeared, this time applied to private, public and hybrid cloud. The accepted conclusion is that it’s hard to believe we are going to have a one-size-fits-all model. I love analogies, and in this presentation I found a great one. Think about speciation, like Darwin did: from a common ancestor, species evolved because they had to adapt to different needs and fit into different environments at different times. Look at private, public and hybrid cloud in exactly the same way. We ourselves are a product of that same evolution, after all. Just as geological eras came in waves, so does technology, though IT is slow in adopting it. And as in the animal kingdom, some survived and some didn’t – it’s survival of the fittest.

Hans de Leenheer continued with a very hot topic – hyper convergence, or “hype convergence” as he called it. The presentation was at a completely different level of abstraction than people expected: hyper convergence seen from a business perspective. What I took away from the presentation:

hyper convergence is not a standard and probably will not be any time soon.

minimise the infrastructure maintenance and implementation cost.

It’s important to understand that a predictable building-block model for scaling infrastructure, both financially and technically, is what’s gonna drive the data centre acquisition model. My thoughts on this: I like presentations that are strongly anchored in business use cases. In the end, that’s what really matters.

Next, Martin Glassborow, aka StorageBod, had a very entertaining presentation. He was talking about a very specific use case where their storage needs grow by 2.5 PB every month! So we can struggle to find solutions, but in the end everything is a lie: we can’t manage data growth, we can only cope with it. How? Drinking, so we don’t lose our minds. A funny way to present it. The challenges of managing that quantity of data are completely different from what any of us is used to. The solution is not actually managing the data but wisely managing the people who create the data. From a technical perspective, that means deploying scripts that gather usage statistics – things like access times and the frequency of file opens per user – and putting them in an Excel file. Yes, Excel files; at least that is what they do. Then you can manage the guy who just dumps data, never uses it and never deletes it. So in the end you manage the root cause, not the consequence. Maybe for that quantity of data it’s the only sane way to do it.
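A minimal sketch of the kind of usage-gathering script described above (the paths and fields are illustrative, not what StorageBod actually runs): walk a directory tree, record per-file owner, size and last-access age, and write a CSV that opens straight in Excel.

```python
import csv
import os
import time

def gather_usage(root: str, out_csv: str) -> int:
    """Walk `root`, record size, owner uid and days since last access for
    each file, and write the result as a CSV (openable in Excel).
    Returns the number of rows written."""
    rows = 0
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "owner_uid", "size_bytes", "days_since_access"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # file vanished or is unreadable: skip it
                age_days = (time.time() - st.st_atime) / 86400
                writer.writerow([path, st.st_uid, st.st_size,
                                 round(age_days, 1)])
                rows += 1
    return rows
```

From that CSV it is a spreadsheet exercise to spot the user who only ever writes data and never reads or deletes it – managing the root cause, as the talk put it.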

Enrico Signoretti continued with a presentation about the future of the storage world. A lot of the features that a few years ago seemed unique are now omnipresent: deduplication, snapshots, replication, compression. That is probably the baseline from which most storage products start nowadays. Some interesting trends are emerging:

data-aware storage: the storage is not a black box anymore and is capable of understanding what kind of data the users are handling and how. This brings endless possibilities for innovation, especially from a business perspective. Having deeper insight into how data is moved and handled inside a company can boost its business intelligence considerably. There are a few very interesting start-ups already doing great work in this area.

data analytics: this can include prediction engines and comparative analytics against what other users are doing – how they are using the same devices and what performance they are getting out of them. It helps you understand whether you are misconfiguring your storage or your applications for different kinds of workloads. The best way to do this is, of course, to send the analytics to the cloud.

The general trend in software nowadays is to push the intelligence up the stack as much as possible. Is this a good thing? It depends. Innovations nowadays happen closer to the user, probably because there they can be perceived better from a business, usability and experience perspective, and with that I’m fine. But we come back to the same reasoning I had before regarding software-defined, hardware-agnostic storage. If innovation stops at the OS level, the filesystem level, or anywhere else in the system layer, everything might become shaky. You start building on mud again (I don’t want to start on systemd, really, no). No more kernel-level features; let’s move everything into user space just to be on the safe side. Systems programming is hard, very hard, and nowadays you find fewer and fewer people who do it. It’s an education thing and, considering where the industry is heading, a career choice. For a systems programmer, all that stuff you learned in your computer science classes is for real; that’s your daily work. Ok, enough with this nonsense.

So what is the next storage going to be? It depends. From a business perspective the answer is very much market dependent: the supply/demand curve, and thus the price elasticity of storage, is not the same worldwide. The technology innovation curve is, in my opinion, well ahead of real-life scenarios. We really have to think about what innovation means and how to approach it. For someone who wants to build the next storage product, I think it’s fundamental to look at both graphs – the supply/demand curve and the technology adoption curve – to understand more.

My strong belief is that innovation is really born at the intersection of different domains and disciplines, combined with a good business approach – the Medici Effect.

Overall it was a great experience and I hope it will happen again very soon. Till then, I’m a little bit worried, as I am now thinking about four different new storage products and I’m quite sure this will cause me a long series of sleepless nights.