Database Performance Tuning

Sunday, 2 November 2014

You have probably felt like this. You've devoted some intense time to solving a very complex and difficult problem. In the process, you've researched the field and attempted to solve it in a few different ways. You've come across and tested some frameworks as a means of getting close to the solution without having to do it all by yourself.

You've discarded some of these frameworks and kept others. Cursed and blessed the frameworks' documentation, Google and StackOverflow all at the same time. You've made a few interesting discoveries along the way and, as always, the more interesting ones came from your failures.

You've learned a lot and now are ready to transfer that knowledge to your customer, so you start preparing documentation and code to get them into a deliverable state. Your tests become more robust and reach close to 100% coverage. Your documents grow, absorbing data from your notebook, spreadsheets and test results. Everything comes together nicely, ready to be delivered as a coherent and integrated package, something that your customer will value, appreciate and use for some time in the future (and pay for, of course).

And along the way, you've committed to your version control of choice all your false starts. All the successes. You've built a rather interesting history.

Of course, at this point your customer will not see any of the failures, except when you need to refer to them as supporting evidence for taking the approach you propose. But that's ok, because the customer is paying for your results and your time, and he's not really interested in knowing how you got them. If your customer knew how to do it in the first place, you would not be there at all.

But it is exactly at these final stages where the topic of accidental versus essential complexity raises its head. You've spent some time solving a problem, and solved it, yet you still need to wrap up the deliverables so that they are easily consumed by your customer(s). This can take many different shapes: a code library, a patch set, or a set of documents stating basic guidelines and best practices. Or all of them.

The moment you solved the problem you mastered the essential complexity, yet you have not even started mastering the accidental complexity. And that still takes some time. And depending on the project, the problem, and the people and organization(s) involved, this can take much more time and cost than solving the problem itself.

Which ties nicely into the "Excel" part of this post. At this point you're likely asking yourself what exactly Excel has to do with accidental complexity.

The answer is: Excel has nearly zero accidental complexity. Start Excel, write some formulas, some headers, add some formatting, perhaps write a few data export/import VBA scripts, do a few test runs and you can proudly claim you're done. Another problem solved, with close to 100% of your time devoted to the essential complexity of the problem. You did not write any tests. You did not use any kind of source control. You did not analyze code coverage. You did not document why cell D6 has a formula like =COUNTIF(B:B;"="&A4&TEXT(A6)). You did not document how many DLLs, add-ons or JDBC connections your spreadsheet needs. You did not care about someone using it in a different language or culture where dates are expressed differently. Yet all of it works. Now. Today. With your data.

That is zero accidental complexity. Yes, it has its drawbacks, but it works. These kinds of solutions are usually what is described as "shadow IT", and hardly a day passes without you coming across one of them.

What I've found empirically is that the amount of shadow IT in an organization remains roughly proportional to the size of the organization. You may assume that larger organizations should be more mature, and that by virtue of that maturity they would have eliminated or reduced shadow IT. Not true. And that is because the bigger the organization, the bigger the accidental complexity.

If you look at the many layers of accidental complexity, you'll find some of them common to organizations of all sizes, mostly at the technical level: your compiler is accidental complexity. Test Driven Development is accidental complexity. Ant, make, cmake and maven are all accidental complexity. Your IDE is accidental complexity. Version control is accidental complexity.

But then there's organizational accidental complexity. Whereas in a small business you'll likely have to talk to very few individuals in order to roll out your system or change, the larger the organization, the thicker the layers of control are going to be. So you'll have to have your thing reviewed by some architect. Some coding standards may apply. You may have to use some standard programming language, IDE and/or framework, perhaps particularly unsuited to the problem you are solving. Then you'll have to go through change control, and then... hell may freeze over before you overcome the accidental complexity, and that means more time and more cost.

So at some point, the cost of the accidental complexity is way higher than the cost of the essential complexity. That is when you fire up Excel/Access and start doing shadow IT.

Monday, 18 August 2014

From time to time I have to clean up my hard disk. No matter how big my partitions are, or how much bigger the hard disk gets, there always comes a point where I'm dangerously close to running out of disk space.

It is in these moments when you find that you forgot to delete the WAV tracks of that CD you ripped. That you don't need duplicate copies of everything you may want to use from both Windows and Linux, because you can keep them in an NTFS partition and Linux will happily use them without prejudice.

And it is in these moments when I realise how much code I've abandoned over the years. Mostly in exploratory endeavours, I've sometimes written what in retrospect seem to be substantial amounts of code.

Just looking at abandoned Eclipse and Netbeans folders I find unfinished projects from many years ago. Sometimes I recognise them instantly, and always marvel at how subjective the perception of time is: in my mind that code is fairly fresh, but then looking at the timestamps I realize that I wrote it seven years ago. Sometimes I wonder why I even thought the idea was worth trying at the time.

Yet here they are: a JPEG image decoder written in pure Java that is only about 20% slower than a native C implementation. A colour-space-based image search algorithm, complete with a web front end and a back end for analysis. A Python arbitrage engine that can scrape websites and alert on price differences, applying Levenshtein comparisons across item descriptions. Enhancements to a remote control Android app that can drive a Lego Mindstorms vehicle over Bluetooth. That amalgamation of scripts that reads EDI messages and extracts key data from them. Some seven different scripts to deal with different media formats, one for each camera that I've owned. And many more assorted pieces of code.

The question is, what should I do with this code? I'm afraid of open sourcing it, not because of patents or lawyers but because its quality is uneven. From slightly above alpha stage to close to rock solid. Some of it has test cases, some does not. In short, I don't feel it is production quality.

And I can't evade the thought that everything one writes starts in that state: we tend to judge the final product and to think that it was conceived in that pristine shape and form from the beginning. I know that's simply false: just look at the version control history of any open source project. But I want to have that smooth finish, clean formatting, impeccable documentation and fully automated build, test and deploy scripts from day one.

Yet some of this could be potentially useful to someone, even to me at some time in the future. So it is a shame to throw it away. So it always ends up surviving the disk cleanup. And I'll see it again in a few years and ask myself the same question... why not have the equivalent of a code garage? Some place where you could throw all the stuff you no longer use, or don't think you're going to use again, and leave it there so anyone passing by can take a look and pick up the pieces if he/she is interested in them?

Monday, 14 April 2014

I can't resist commenting on this, because Heartbleed is the subject of countless debates in forums. In case you've been enjoying your privately owned tropical island for the past week or so, Heartbleed is the name given to a bug discovered in the OpenSSL package. OpenSSL is an Open Source package that implements the SSL protocol, and is used across many, many products and sites to encrypt communications between two endpoints across insecure channels (that is, anything connected to the internet is by definition insecure).

The so-called Heartbleed bug accidentally discloses part of the server memory contents, and thus can leak information that is not intended to be known by anyone else but the OpenSSL server. Private keys, passwords, anything stored in a memory region close to the one involved in the bug can potentially be transmitted back to an attacker.

This is serious. Dead serious. Hundreds of millions of affected machines serious. Thousands of millions of password resets serious. Hundreds of thousands of SSL certificates renewed serious. Many, many man-years of work serious. Patching and fixing this is going to cost real money, not to mention the undisclosed and potential damage arising from the use of the leaked information.

Yet the bug can be reproduced in nine lines of code. That's all it takes to compromise a system.
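The vulnerable pattern is simple enough to sketch. What follows is not the actual OpenSSL C code, just a rough Python simulation of the logic error: trusting an attacker-supplied length when echoing a heartbeat payload back.

```python
# Simulated Heartbleed pattern. The real bug lives in OpenSSL's C
# heartbeat handler; this only illustrates the missing bounds check.

def heartbeat_response(memory: bytes, payload_offset: int, claimed_len: int) -> bytes:
    # Vulnerable version: echoes back 'claimed_len' bytes starting at the
    # payload, never checking the claim against the actual payload size.
    return memory[payload_offset:payload_offset + claimed_len]

def safe_heartbeat_response(memory: bytes, payload_offset: int,
                            claimed_len: int, actual_len: int) -> bytes:
    # Fixed version: drop any request whose claimed length exceeds
    # the payload that was actually sent.
    if claimed_len > actual_len:
        return b""
    return memory[payload_offset:payload_offset + claimed_len]

# Simulated server memory: a 5-byte payload followed by secret data.
memory = b"hello" + b"SECRET_PRIVATE_KEY"
print(heartbeat_response(memory, 0, 23))       # b'helloSECRET_PRIVATE_KEY' - leaked
print(safe_heartbeat_response(memory, 0, 23, 5))  # b'' - request dropped
```

The attacker sends a tiny payload but claims it is much larger; the vulnerable handler happily copies whatever happens to sit next to it in memory.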
Yet with all its dire consequences, the worst part of Heartbleed for me is what we're NOT learning from it. Here are a few of the wrong lessons that interested parties draw:

Security "experts": this is why you need security "experts", because you can never be safe and you need their "expertise" to mitigate this, prevent such simple mistakes from surfacing, audit everything right and left and write security and risk assessment statements.

Programmers: the Heartbleed bug happened because the programmer was not using memory allocator X, or framework Y, or programming language Z. Yes, any of these could have prevented this mistake, yet none of them were used, nor could they be retrofitted easily into the existing codebase.

Open Source opponents: this is what you get when you trust the Open Source mantra "given enough eyeballs, all bugs are shallow". Because in this case a severe bug was introduced without anyone realizing it, hence you can't trust Open Source code.

All these arguments are superficially coherent, yet they are at best well intentioned but wrong, and at worst simply lies.

In the well-intentioned area we have the "Programmers" perspective. Yes, there are more secure frameworks and languages, yet no programmer in his right mind would want to rewrite something of this complexity without at least a sizeable test baseline to verify it against. Where's that test baseline? Who has to write it? Some programmer around there, I guess, yet no one seems to have bothered with it in the decade or so that OpenSSL has been around. So these suggestions are similar to saying that you will not be involved in a car crash if you rebuild all roads so that they are safer. Not realistic.

Then we have the interested liars. Security "experts" were nowhere to be seen during the two years that the bug has existed. None of them analyzed the code, assuming of course that they were qualified to even start understanding it. None of them had a clue that OpenSSL had a bug. Yet they descend like vultures on a carcass on this and other security incidents to demonstrate how necessary they are. Which in a way is true; they were necessary much earlier, when the bug was introduced. OpenSSL being open source means anyone at any time could have "audited" the code, highlighted all the flaws -of which there could be more of this kind- and raised all the alerts. None did. Really, 99% of these "experts" are not qualified to do such a thing. All bugs are trivial once exposed, yet to expose them one needs code reading skills, test development skills and theoretical knowledge. Which is something not everyone has.

And finally, in the deep end of the lies area, we have the Open Source opponents' perspective. Look at how this Open Source thing is all about a bunch of amateurs pretending that they can create professional-level components that can be used by the industry in general. Because you know, commercial software is rigorously tested and has the backing of commercial entities whose best interest is to deliver a product that works as expected.

And that is the most dangerous lie of all. Well-intentioned programmers can propose unrealistic solutions, and the "security" experts can parasitize the IT industry a bit more, but that creates at best inconvenience and at worst a false sense of security. But assuming that these kinds of problems will disappear by using commercial software puts everyone in danger.

First, because all kinds of software have security flaws. Ever heard of Patch Tuesday? Second, because when there is no source code, there is no way of auditing anything and you must rely on trusting the vendor. And third, because the biggest OpenSSL users are precisely commercial entities.

However, as easy as it is to say after the fact, it remains true that there are ways of preventing future Heartbleed-class disasters: more testing, more tooling and more auditing could have prevented this. And do you know what the prerequisite is for all these things? Resources. Currently the core OpenSSL team consists of... two individuals. Neither of whom is paid directly for the development of OpenSSL. So the real root cause of Heartbleed is lack of money, because there could be a lot more people auditing and crash-proofing OpenSSL, if only they were paid to do it.

But ironically, it seems that there is plenty of money among some OpenSSL users, whose businesses rely heavily on a tool that allows them to communicate securely over the Internet. Looked at from this perspective, Heartbleed could have been prevented if any of the commercial entities using OpenSSL had invested some resources in auditing or improving OpenSSL instead of just profiting from it.

So the real root cause of Heartbleed lies in these entities taking without giving back. And when you look at the list, boy, how they could have given back to OpenSSL. A lot. Akamai, Google, Yahoo, Dropbox, Cisco and Juniper, to name a few, have been using OpenSSL for years, benefitting from the package yet not giving back to the community some of what they got. So think twice before basing part of your commercial success on unpaid volunteer effort, because you may not have to pay for it at the beginning, but later on it could bite you. A few hundred million bites. And don't think that by keeping your source code secret you're doing any better, because in fact you're doing much worse.

Monday, 26 August 2013

You know, security has lately been one of my biggest sources of irritation. More so when I read articles like this one. On the surface, the article is well written, even informative. But it also shows off most of what is currently wrong with computer security.

Security, like most other areas of the IT world, is an area of specialization. If you look around, you'll see that we have database, operating system, embedded system, storage and network experts. While it is true that the job role with the best future prospects is the generalist who can also understand, and even learn deeply, any subject, it is also true that after a few years of working focused on a specific subject, there is a general tendency to develop deeper knowledge in some subjects than in others.

Security is no different in that regard, but it has one important difference from all the others: what it ultimately delivers is the absence of something that is not even known. While the other functions have more or less clearly defined goals in any project or organization, security can only offer as proof of effectiveness the lack -or a reduction- of security incidents over time. The problem is, while incidents in other areas of computing are always defined by "something that is not behaving as it should", in security an incident is "something that we did not even know could happen is actually happening".

Instead of focusing on what they don't know, the bad security people focus on what they know. They know what has been used so far to exploit an application or OS, so here they go with their vulnerability and antivirus scanners, willingly telling you whether your system is vulnerable or not. Something that you can easily do yourself, using the exact same tools. But it is not often that you hear from them an analysis of why a piece of code is vulnerable, or which risky practices you should avoid. Or how the vulnerability was discovered.

And that is part of the problem. Another part of the problem is their seeming lack of any consideration for the environment. In a similar way to the "architecture astronauts", the security people seem to live in a different world. One where there is no actual cost-benefit analysis of anything, and you only have a list of known vulnerabilities to deal with, and at best a list of "best practices" to follow. Such as "don't use bcrypt".

And finally, security guys are often unable to communicate in a way that is meaningful to their target audience. Outside a few specialists, most people in the IT field (me included) lack the familiarity with the math required to understand the subtle points of encryption, much less the results of the years of analysis and heavy number theory required to even attempt to crack encryption efficiently.

Ironically, the article gets some of these points right. At the beginning there is an estimation of cracking cost vs. derivation method that should help the reader make an informed decision. There is advice about the bcrypt alternatives and how they stack up against each other.

But as I read further into the article, it seems to fall into all these security traps: for example, the title says "don't use bcrypt", only to say in its first paragraph "If you're already using bcrypt, relax, you're fine, probably". Hold on, what was the point of the article then? And if you try to read the article comments, my guess is that unless you're very strong on crypto, you'll not fully understand half of them and will come away confused and even more disoriented.

But what better summarizes what is wrong with security is the second paragraph: "I write this post because I've noticed a sort of "JUST USE BCRYPT" cargo cult (thanks Coda Hale!) This is absolutely the wrong attitude to have about cryptography"

How is detailing the reasons for using bcrypt a wrong attitude about cryptography? The original article is a good description of the trade-offs of bcrypt against other methods. That is not cargo cult. At least not in the same way as "just use a SQL database", "just use a NoSQL database", "just use Windows" or "just use Linux" are cargo cult statements. Those statements are cargo cult only when taken out of context. Like the DBA who indexes each and every field in a table in the hope that sacrificing his disk space, memory and CPU to the cargo cult church will speed things up.

But the original article was not cargo cult. No more than the "don't use bcrypt" article is cargo cult.

I guess that what I'm trying to say is that there is "bad" and "good" security. The "bad" security will tell you all about what is wrong with something and that you should fix all of it immediately. The good security will tell you not only what is vulnerable, but also how to avoid creating vulnerabilities in the future. And provide you with ready-made and usable tools for the job. Articles like "don't use bcrypt" are frustrating in that they give you almost what you need, but in a confusing and contradictory way.

When I choose a database, or operating system, or programming language, or whatever tool to do some job, I do it having only superficial knowledge of the trade-offs of each option. But I don't have to be a specialist in any of these to decide. I don't know the nuts and bolts of round-robin vs. priority-based scheduling, or how O(1) task schedulers work. Or the details of B-tree vs. hash table index implementations. Or the COW strategy for virtual memory. I know the basics and what works best in each situation, mostly from experience and education. True, with time I will learn the details of some of these as needed. But a lot of the time software developers are making really educated best guesses. And the more complex the subject -and crypto is one of the most complex- the more difficult these decisions are.

If I want to encrypt something, I want to have an encrypt function, with the encryption method as a parameter and a brief explanation of the trade-offs of each method. And make it foolproof, without any way of misusing it. Yes, someone will find a way of misusing it anyway, and it will probably be a disaster. Then find ways of detecting those misuses.
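To illustrate the kind of API I mean, here is a minimal sketch for the password-hashing case, using nothing but Python's standard library. The function names, parameter choices and trade-off comments are my own invention, not any real security library:

```python
import hashlib
import hmac
import os

def hash_password(password: str, method: str = "scrypt", salt: bytes = None) -> tuple:
    """One function, the derivation method as a parameter, sane defaults.

    Trade-offs (the brief explanation the text asks for):
      - "pbkdf2-sha256": CPU-hard only; raise iterations as hardware improves.
      - "scrypt": memory-hard as well, which makes GPU/ASIC attacks costlier.
    """
    if salt is None:
        salt = os.urandom(16)          # fresh random salt per password
    pw = password.encode("utf-8")
    if method == "pbkdf2-sha256":
        digest = hashlib.pbkdf2_hmac("sha256", pw, salt, 200_000)
    elif method == "scrypt":
        digest = hashlib.scrypt(pw, salt=salt, n=2**14, r=8, p=1)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return (method, salt, digest)      # store all three together

def verify_password(password: str, record: tuple) -> bool:
    method, salt, digest = record
    _, _, candidate = hash_password(password, method, salt)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

record = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", record))  # True
print(verify_password("wrong guess", record))                   # False
```

The point is the shape of the API, not the specific parameters: one entry point, methods as data, the salt and method stored with the hash so verification never asks the caller to remember them.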

So please, security guys, give us tools and techniques to prevent security issues. With a balanced view of their costs and benefits. And let the rest of the world sleep safely in their ignorance of 250 years of number theory. That is your real job. Creating huge repositories of vulnerabilities and malware signatures is not good enough. That in fact does little to protect us from future threats. Give us instead the tools to prevent these in the first place. And in a way that everyone can understand. Thank you.

Friday, 17 May 2013

With all the talk about IT governance, risk management, security compliance and all that terminology, it seems that most IT people ignore the realities of the environment they are working in.

As an example, let's take a corporate security department, defining security standards and imposing them on the IT organization in almost all possible situations. All in the name of keeping the company away from security incidents, yes. They dismiss all objections about usability, convenience, and even whether the security standards are relevant to the company's business.

That latter point is a pet peeve of mine. It is very easy to define security standards if you ignore everything else and just apply the highest levels of security to everyone. By doing that, nobody is ever going to come back to you and say that the security is not good enough, because you are simply applying the strongest there is. However, unless your company or organization is actually a secret security agency, you're seriously restricting usability and the ability of the systems to actually help people do their jobs. But hey, that's not in my mission statement, right?

What they forget is that applying these standards adds overhead for the company. All these security policies not only add time and implementation cost, but also create day-to-day friction in how people use their tools to accomplish their work.

Not surprisingly, the end result is that all these policies end up being overridden by exception. Let's see a few examples coming from real life. Or real life plus a bit of exaggeration to add some humor (note: in the following paragraphs you can replace CEO with whatever role has enough power to override a policy).

Everyone has to enter a 16 character password that has at least two digits, special characters and words that do not appear in the Bible. That is, until the CEO has to type one.

Everyone has to use two factor authentication, until the CEO loses his/her RSA token or forgets to take it to the beach resort.

Nobody can relay or forward mail automatically to external accounts. Until the CEO's mailbox becomes full and Outlook does not allow him/her to respond to a critical message.

Nobody can connect their own devices to the office network. Until the CEO brings to the office his/her latest iPad.

Nobody can share passwords, until the CEO's assistant needs to update the CEO location information in the corporate executive location database. Security forbids delegation for some tasks and this is one of them.

Nobody can use the built-in browser password store, until the CEO forgets his/her password for the GMail account that is receiving all the mail forwarded from his corporate account.

All internet access is logged, monitored and subject to blacklist filters. Until the CEO tries to download his/her son's latest Minecraft update.

No end user can have admin rights on his/her laptop, until the CEO tries to install the latest Outlook add-on that manages his/her important network of contacts.

USB drives are locked, that is, until the CEO wants to see the interesting marketing materials given away on a USB thumb drive at the last marketing agency presentation, or wants to upload some pictures of the latest executive gathering from a digital camera.

I'm sure you can relate these examples to your real-world experience. Now, except for a few perfectly understandable cases of industries or sectors where security is actually essential to the operations of the company, what do you think will happen? Experience tells me that the CEO will get an exception in all these cases.

The corollary is: security policies are only applicable to people without enough power to override them. Which often means that the most likely place for a security incident to happen is... at the higher levels of the company hierarchy. Either that, or you make sure the security policy does not allow exceptions. None at all, for anyone. I'm sure that would make the higher executive levels much more interested in the actual security policies and what they mean for the company they are managing.

Monday, 15 April 2013

My recent experience with an application upgrade left me considering the true implications of using proprietary data formats. And I have realized that they are an often overlooked topic, with profound and significant implications that are rarely addressed.

Say you live in a country where the law requires you to keep electronic records for 14 years. Do you think it is an exaggeration? Sarbanes-Oxley says auditors must keep all audit or review work papers from 5 to 7 years. You are carefully archiving and backing up all that data. You are even copying the data to fresh tapes from time to time, to avoid changes in tape technology leaving you unable to read that perfectly preserved tape -or making it very hard, or leaving you dependent on an external service to restore it.

But I've not seen a lot of people ask themselves the question: once you restore the data, which program will you use to read it? Which operating system will that program run on? Which machine will run that operating system?

First, what is a proprietary data format? Simple: anything that is not properly documented in a way that would allow anyone with general programming skills to write a program to extract data from a file.

Note that I'm leaving patents out of the discussion here. Patents create additional difficulties when you want to deal with a data format, but they do not completely lock you out of it. They merely make things more expensive; you'll definitely be able to read your data, even if you have to deal with patent issues, which are a different discussion altogether.

Patented or not, an undocumented data format is a form of customer lock-in. The most powerful there is, in fact. It means that you depend forever on the supplier of the programs that read and write that data. But the lock-in does not stop there. It also means that your choices of platform, hardware, software, operating system, middleware, or anything else your supplier has decided is a dependency, are now tied to your ability to read your data.

In the last few years, virtualization has helped somewhat with the hardware part. But it still does not remove the problem completely, in that there could be custom hardware or dongles attached to the machine. Yes, it can get even worse. Copy protection schemes are an additional complication, in that they make it even more difficult to get at your data in the long term.

So in the end, "data retention" and "data archiving" activities are really trying to hit a moving target, one that is very, very difficult to actually hit. Most of the plans I've seen focus only on some specific problems, and all of them fail to deliver an end-to-end solution that really addresses the ability to read legacy data in the long term.

I suppose that at this point, most of the people reading this are going back to check their data retention and archiving plans, looking for gaping holes. Found them? Ok, keep reading then.

A true data archiving solution has to address all the problems of the hardware and software needed to retrieve the data over the retention period. If any of the steps is missing, the whole plan is not worth spending on. Unless of course you want your plan to be used as a means for auditors to tick the corresponding box in their checklist. It is ok for the plan to say "this only covers xxx years of retention, we need to review it in the next yyy years to make sure data is still retrievable"; that is at least much better and more realistic than saying "this plan will ensure that the data can be retrieved for the following zzz years" without even considering that way before zzz years have passed, the hardware and software used will become unsupported, or the software supplier could disappear with no one left able to read the proprietary data format.

There is an easy way of visualizing this. Instead of talking about the business side of record retention, think about your personal data. All your photos and videos of your relatives and loved ones, taken over the years. All the memories that they contain, they are irreplaceable and also they are something you're likely to want to access in the long term future.

Sure, photos are ok. They are on paper, or perhaps in JPG files, which are by the way very well documented. But what about video? Go and check your video camera. It is probably using some standard format, but some cameras use weird combinations of audio and video codecs, with the manufacturer providing a disk with the codecs. What will happen when the camera manufacturer goes out of business or stops supporting that specific model? How will you be able to read the video files and convert them to something else? That should make you think about data retention from the right point of view. And dismiss anything that is in an undocumented file format.

Monday, 11 February 2013

Short story: we have a strange TIFF file. There has to be an image stored in there somehow, but double-clicking on it gives nothing. By the way, this file, together with a million more of them, contains the entire document archive of a company. Some seven years ago they purchased a package to archive digitized versions of all their paper documents, and have been dutifully scanning and archiving all their documents there ever since. After the effort of scanning all those documents, they archived the paper originals off site, but only organized them by year. Why pay any more attention to the paper archive, after all? In the event of someone wanting a copy of an original document, the place to get it is the document archiving system. Only in extreme cases are the paper originals required, and in those cases yes, one may need a couple of hours to locate the paper original, as you have to visually scan a whole year of documents. But that is not that big of a deal, especially considering the time saved by not having to classify paper.

All was good during these seven years, because the document viewer built into the application works perfectly. However, now they want to upgrade the application, and for the first time in seven years they have tried to open one of these files (which have the .tif extension) with a standard image viewer. The result is that they cannot: the old application displays them fine, yet trying many standard viewers at best displays garbage, at worst crashes the viewer. The files are around 700K each, the app displays them perfectly, so what exactly is in there?

Some hours of puzzling, a few hexdumps and a few wild guesses later, the truth emerges: the application stores files with the .tif extension, but uses its own "version" of the TIFF standard format. Their "version" follows perhaps the first ten pages of the standard and then goes its own way. The reasons for doing this could be many; however, I always try to keep in mind that wise statement: "never attribute to malice what can be adequately explained by incompetence".

The misdeed was, however, easy to fix. A quite simple 200 line C program (comments included) was able to extract the image and convert it to a standard file format. At least on my Linux workstation.
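For the curious, the kind of first check a program like that starts with can be sketched in a few lines of C. This is not the actual converter, just an illustration under one documented fact: per the TIFF 6.0 specification, a standard TIFF file must begin with the byte order mark "II" or "MM" followed by the 16-bit magic number 42, so any file carrying the .tif extension that fails this test is already suspect.

```c
#include <stdio.h>

/* A standard TIFF header starts with either "II" (little-endian)
 * or "MM" (big-endian), followed by the magic number 42 encoded in
 * that byte order. Returns 1 if the first four bytes of the file
 * look like a standard TIFF header, 0 otherwise. */
int looks_like_tiff(const char *path)
{
    unsigned char hdr[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    size_t n = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    if (n != sizeof hdr)
        return 0;
    if (hdr[0] == 'I' && hdr[1] == 'I')   /* little-endian: 42 then 0 */
        return hdr[2] == 42 && hdr[3] == 0;
    if (hdr[0] == 'M' && hdr[1] == 'M')   /* big-endian: 0 then 42 */
        return hdr[2] == 0 && hdr[3] == 42;
    return 0;
}
```

Compile it together with a small main that calls looks_like_tiff on each suspect file; a whole archive answering "no" is how you discover you have a proprietary format on your hands.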

I was very happy at the prospect of telling the good news to the business stakeholders: your data is there, you've not lost seven years of electronic document archives, it is actually quite easy and quick to convert them to a standard format, and after doing that you can forget about proprietary formats. However, I then realized that they use Windows, so I had to compile the 200 line C program on Windows just to make sure everything was right.

Checking the source, I could not spot anything Linux specific in the program; it all appeared to be fairly vanilla POSIX. But what if they are not able to compile it, or the program behaves differently? This is one of those moments when you actually want to try it, if only to be absolutely sure that your customer is not going to experience another frustration after their bitter experience with their "document imaging" system, and also to learn how portable your C-fu is across OSs. Too many years of Java and PL/SQL and you get used to thinking that every line of code you write will run unchanged anywhere else.

So I set out to compile the C source on Windows before delivering it. That's where, as almost always, the frustration began. The most popular computing platform became what it is today, among other things, by being developer friendly. Now it seems to be on its way to becoming almost developer hostile.

First, start with the vanilla Windows installation that likely came with your hardware. Then remove all the nagware, crapware, adware and the rest of the things included by your friendly hardware vendor to increase their unit margins. Then deal with Windows registration, licensing or both. Then patch it. Then patch it again, in case some new patches were released between the time you started patching and the time the patching round finished. About four hours and a few reboots later, you likely have an up to date and stable Windows instance, ready for your C compiler.

Still with me? In fairness, if you already have a Windows machine all of the above is already done, so let's not make much ado about that. Now we're at the interesting part: downloading and installing your C compiler. Of course, for a 200 line program you don't need a full-fledged IDE. You don't need a profiler or a debugger. You need something simple, so simple that you think one of the "Express" editions of the much renowned Microsoft development tools will do. So off we go to the MS site to download one of these "Express" products.

So you get there and look at your options. Now, be careful, because there are two versions of VS Express 2012. There's VS Express 2012 for Windows 8 and there's VS Express 2012 for Windows Desktop, depending on whether you're targeting the Windows Store or want to create... what, an executable? But I thought Windows was Windows. In fact, I can run a ten year old binary on Windows and it will still work. Oh yes, that's true, but now MSFT seems to think that creating Windows 8 applications is so different from creating Windows Desktop applications that they have created a different Express product for each. Except for paying VS customers, who can create both kinds of applications with the same product. Express is Express, and is different. And you don't complain too much; after all, this is free stuff, right?

As I wanted to create a command line application, with little interest in the Windows Store, and without being sure whether an inner circle of hell awaited if I chose one or the other, I simply chose VS Express 2010. That would surely protect me from the danger of accidentally creating a Windows Store application, or of discovering that command line apps, for example, were no longer considered "Windows Desktop Applications". You may think I was being too cautious or risk averse at this point, but really, after investing so much time in compiling a 200 line C command line utility on Windows, I was not willing to lose much more time on this.

Ah, hold on, the joy did not end there. I finally downloaded VS 2010 Express and started the installation, which dutifully informed me that it was about to install .NET 4.0. How good that the .NET 4.0 install required a reboot, as I was starting to really miss rebooting once in a while after all the reboots the patching had required. At least the installer was nice enough to resume by itself after the reboot. Anyway, 150 MB of downloads later, I had my "Express" product ready to use.

What is a real shame is that the "Express" product seems, once installed, to be actually quite good. I say "seems" because I did not play with it much. My code was in fact 100% portable, and it was a short job to discover how to create a project and compile it. Admittedly, I'm going to ship the customer a build with debug symbols, as I was not able to find where to turn off debug information. Since the program is 30K in size, that's hardly going to be a problem, and if it is, it's 100% my fault. To be honest, I lost interest in VS Express 2010 quickly once I was able to test the executable and verify that it did exactly the same as the Linux version.

But the point is, in comparison, I can build a quite complete Linux development environment in less than two hours, operating system installation included, incurring zero licensing cost and using hardware much cheaper than what is needed to run Windows. Why is it that creating a Windows program requires so much time?

What happened to the "developers, developers, developers" mantra? Where is it today? Anyone old enough can remember the times when Microsoft gave away free stacks of floppy disks to anyone remotely interested in their Win32 SDK. And those were the days before the Internet, when CD-ROMs were a luxury. The days when IBM was charging $700 for their OS/2 developer kit. Guess who won the OS wars?

Things have changed, for the worse. Seriously, Microsoft needs to rethink this model if they want at least to slow their decline. I guess I've at least discovered one pattern that can probably be applied to any future OS or platform. Today, to write iOS/MacOS programs you need to buy a Mac and pay Apple $100. The day it becomes more difficult, complex, or expensive (as if Apple hardware were cheap), that day will be the beginning of the end for Apple.

Tuesday, 5 February 2013

A bit late, but it's time to review what happened with my 2012 predictions. Since the score is clearly favorable to me, please allow me to indulge in some self congratulation, and to offer my services as a technology trend predictor, at least one better than the big name market analysis firms. No, not really. But having scored so high nonetheless deserves some self appraisal, at least.

The bad

Windows becoming legacy. I was wrong on this one, but only on the timing. Microsoft's latest attempt to revive the franchise is flopping on the market, to the tune of people paying for getting Windows 8 removed from computers and replaced by Windows 7. Perhaps Redmond can reverse the trend over time, perhaps Windows 9 will be the one correcting the trend. But they have already wasted a lot of credibility, and as time passes it is becoming clear that many pillars of the Windows revenue model are not sustainable in the future.

Selling new hardware with the OS already installed worked well for the last twenty years, but the fusion of mobile and desktop, together with Apple and Chromebooks, is already eroding that to the point where hardware manufacturers are starting to hold the dominant position in the negotiation.

The link between the home and business market is broken. Ten years ago people were buying computers with essentially the same architecture and components for both places, except perhaps with richer multimedia capabilities at home. Nowadays people are buying tablets for home use, and use smartphones as complete replacements for things done in the past with desktops and laptops.

On the server side, the open source alternatives keep gaining credibility and volume. Amazon EC2 is a key example, where Windows Server, however good it is, is being sidelined in the battle for the bottom of the margin pool.

JVM based languages. I was plain wrong on this one. I thought that the start of Java's decline would give way to JVM based alternatives, but those alternatives, while not dead, have not flourished. Rails keeps growing, PHP keeps growing, and all kinds of JavaScript client and server technologies are gaining followers.

As for computer security... well, the shakeup in the industry has not happened. Yet. I still think that most of the enterprise level approach to security is plain wrong, focused more on "checklist" security than on actual reflection about the dangers and implications of one's actions. But it seems that no one has started to notice except me. Time will tell. In the end, I think this one was more of a personal desire than a prediction in itself.

The good

Mayan prophecy. Hey, this one was easy. Besides, if it had come true, I wouldn't have to acknowledge the mistake in a predictions result post.

Javascript. Flash is now irrelevant. Internet connected devices with either no Flash support at all or weak Flash support have massively outnumbered the Flash enabled devices. jQuery and similar technologies now provide almost the same user experience. Yes, there are still some pockets of Flash around, notably games and the VMWare console client, but Flash no longer is the solution that can be used for everything.

NoSQL. I don't have hard data to prove it, but some evidence -admittedly a bit anecdotal- from its most visible representative, MongoDB, strongly suggests that the strengths and weaknesses of NoSQL and SQL are now better understood. NoSQL is no longer the solution to all problems, but a tool that, like any other, has to be applied where it is most convenient.

Java. I have to confess that I did not expect Java to decline so quickly, but as I said a year ago, Oracle had to change a lot to avoid it, and it has not. The latest batches of security vulnerabilities (plus Oracle's late, incomplete and plain wrong reaction) have finally nailed the coffin shut for Java in the browser, with no going back. A pity, now that we have Minecraft. On the server side, the rate of innovation in Java has stagnated, and the previously lightweight and powerful framework alternatives are now seen as being as bloated and complex as their committee-designed, standards-derived brethren.

Apple. Both on the tablet and mobile fronts. Android based alternatives already outnumber Apple's products in volume, if not in revenue. And Apple still continues to be one of the best functioning marketing and money making machines on the planet.

MySQL. This one really is tied again to Oracle's attitude. But it has happened, to the benefit of both Postgres and the many MySQL forks (MariaDB, Percona, etc.) that keep at their core what made MySQL so successful.

Postgres. In retrospect, that was easy to guess, given the consistent series of high quality updates received in the last few years and the void left by Oracle's bad handling of MySQL and the increasingly greedy SQL Server licensing terms.

Windows Phone. Again, an easy one. A pity, because more competition is always good. As with Windows 8, it remains to be seen whether Microsoft can (or wants to) rescue this product from oblivion.

Will there be any 2013 predictions now that we're in February?

On reflection, some of these predictions were quite easy to formulate, even if somewhat against the general consensus at the time. That's why there are likely not going to be 2013 predictions. I still firmly think that Windows will go niche. That is happening today, but we have not yet reached the "Flash is no longer relevant" tipping point. You'll know we've arrived there when all the big name technologists start saying they saw it coming for years. They have not started saying that. At least not yet.

Anyway, this prediction exercise left my psychic powers exhausted. Which is to say, I don't have that many ideas of how the technology landscape will change during 2013. So as of today, the only prediction I can reliably make is that there won't be 2013 predictions.

It has been a few months since my last post. I've been quite busy with other interests during this time, but I finally got some room to reflect and post a few updates.

Last time I wrote something, it was my intention to start playing around with Android applications. Note that in this context, "applications" means software packages where the final user is also the one paying for the application. Enterprise packages can have notoriously bad user interfaces, and the people using them can complain as much as they want, but at the end of the day they are being paid to use them, and unless someone can positively prove productivity gains from a UI upgrade, those user interfaces will remain there now and forever.

Android applications fall squarely into the category where asking someone for money raises the level of expectation. Nowadays, the race to the bottom in application pricing has left very little margin per unit sold. Very few Android apps cost more than 99 cents; the underlying idea is that you'll make up what is lost in per unit margin through sheer sales volume across the billions of Android devices. The end result is that, even for such a small amount of money, users expect polished, well designed, reliable and well behaved applications.

Compound that with the problem of market saturation. "There is an app for that" is a very convincing slogan, and it is also true in the Android market. Almost every market niche for applications has already been occupied. It's very hard to think of an application that is not either already done well enough to occupy its niche, or surrounded by enough good free alternatives that nobody seriously thinks of making money selling one. There is always the ad-supported option, of course, but that introduces a lot more uncertainty into the equation.

(Now someone will say that the market saturation problem is only an idea problem, and they will probably be right. It could entirely be my own problem of not being able to come up with new ideas.)

So far I've created very few things worth trying to sell, or even give away. But all is not lost; at least this experience has reminded me of an important fact I had almost forgotten: developing applications is difficult. I mean, one gets used to looking only at the server side portions of an application and analyzing them in detail, while essentially ignoring all the other components.

The phone development environment starts by throwing you back to the days of the past. Seemingly innocent development decisions have consequences on CPU and RAM usage that you're used to discarding as transient spikes on a desktop or server, but on those limited machines they can make the difference between a usable application and one that the OS decides to close because it's taking too long to respond or too much memory to run.

What we take for granted today, such as dealing with different timezones (with daylight saving time rules changing from year to year), different character sets and different localization rules, is the result of lots of people working over a long time, including doing such unglamorous things as sitting on standards committees. These are amazing achievements that have standardized and abstracted huge portions of application functionality, but even so, they are only a small part of the scope that an application has to cover.

And let's face it, the most unpredictable, irrational, demanding and unforgiving component in any software application is the human sitting in front of it. In any application, even the trivial looking ones, there is a lot of user interaction code out there that has to deal with human events happening in crazy order, data entered in weird formats that is expected to be understood and business rules that have to match the regulatory landscape changes of the last fifty years or so.

Further proof of that: the number one category of security vulnerabilities is the exploitation of memory management errors (buffer overflows, use of dangling pointers) by... usually sending the application malformed input. This is no accident; dealing with user input correctly is one of the hardest parts of creating a satisfactory user experience.
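As a toy illustration of what "dealing with user input correctly" means even at the smallest scale, here is a sketch in C (the function name and conventions are mine, not from any particular library): converting a user-typed string into an integer while rejecting garbage and overflow, instead of trusting something like atoi to silently do the wrong thing.

```c
#include <stdlib.h>
#include <errno.h>
#include <limits.h>

/* Convert a user-supplied line into an int, rejecting malformed or
 * out-of-range input instead of trusting it. Returns 0 on success
 * (storing the value in *out), -1 on bad input. */
int parse_user_int(const char *line, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(line, &end, 10);
    if (end == line)                    /* no digits found at all */
        return -1;
    while (*end == ' ' || *end == '\t' || *end == '\n')
        end++;                          /* tolerate trailing whitespace */
    if (*end != '\0')                   /* trailing garbage: reject */
        return -1;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;                      /* value does not fit in an int */
    *out = (int)v;
    return 0;
}
```

Multiply this kind of care across every field, date format and locale an application accepts, and the size of the problem becomes apparent.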

Let's not even add the regulatory compliance, audit requirements, the integration requirements with the rest of the environment -perhaps using those beloved text files- and the technical standard compliance and cross platform requirements.

All this adds up to a delicate balance between the user experience, the real world metaphors and processes being modeled and implemented, and the technical environment. And all this for 99 cents.

I'm not completely dropping the idea of selling an Android application some day, but it will have to wait for the right idea to come, and also for the necessary time to execute it properly.

There is also an emerging market for Android applications, one that is starting to surface and gain momentum as business adoption of Android and iPhones expands: the enterprise application, mobile version. Yes, expect some of those ugly user interfaces to be ported over to mobile platforms; this is likely the next big revenue source for mobile developers. And of course, I expect these applications to have performance issues, too.

But so far, my biggest lesson has not been about the ADK, Dalvik, ICS vs. Jelly Bean or Eclipse, for that matter. My biggest lesson from all this is that there is a world of difference between focusing on a single area of an application, improving its performance or resource usage, and delivering a complete application. That requires a different skill set. And after spending a while creating mostly toy Android applications, I'm glad this experience has reminded me of all this. Living too long in the ivory tower can make you forget that these simple looking things are, in fact, quite complex.

Sunday, 6 May 2012

The journey begins

What? Hey, you are usually focused on ranting about random topics, database performance, and generally proving to the world how smart you are. Why, then, this sudden curiosity about creating an Android application?

It is part curiosity, part opportunity. As they say, opportunities are out there waiting for someone in the right place at the right time to catch them. I'm not that one, for sure, but still, after the sad news coming from the Java camp, I wanted to explore new ways of writing applications.

Of course, it also helps if the potential audience for your application numbers in the hundreds of millions, if not more.

So, I wanted to develop a simple Android application. Being a Linux aficionado, and looking at the Google docs, Eclipse under Linux seemed like the main option. Let's start with the basics.

Setting up the stage

First, install the Android SDK. Well, the Android SDK is just a zip file that you extract somewhere in your local disk. According to what I read later, one can create whole applications with the SDK without needing any IDE at all. It has been a long time since I created user interfaces out of raw hexadecimal dumps, so I'm not one of those brave souls. In any case, take note of the folder where you extract the Android SDK. You'll need it later.

Android likes you to use Eclipse to create applications. Perhaps, after my long stint with NetBeans, it's time to go back to Eclipse again? For some reason, I tend to go from NetBeans to Eclipse and back every year or so. I like the all-included NetBeans philosophy, whereas Eclipse is where the minority and cutting edge tools first appear. This time it's back to Eclipse, I guess.

So go to Kubuntu and start Muon (or the Software Center or something similar if you're using Ubuntu). Make sure Eclipse is installed. Start Eclipse to check that everything is ok, and choose a suitable folder as your workspace.

Next, you can finally go to http://developer.android.com/sdk/eclipse-adt.html#installing and attempt to follow the steps to install the Eclipse infrastructure for Android. You go to Help->Install new software, add https://dl-ssl.google.com/android/eclipse/ to the Eclipse list of sources, select the Developer Tools, click next, and after a quite long pause you get... an error.

Cannot complete the install because one or more required items could not be found.
Software being installed: Android Development Tools 16.0.1.v201112150204-238534 (com.android.ide.eclipse.adt.feature.group 16.0.1.v201112150204-238534)
Missing requirement: Android Development Tools 16.0.1.v201112150204-238534 (com.android.ide.eclipse.adt.feature.group 16.0.1.v201112150204-238534) requires 'org.eclipse.wst.sse.core 0.0.0' but it could not be found

This is one of those errors that, were it not for Google, I'd never be able to resolve. Fortunately, a noble soul has documented the fix, even with a video here. Thanks a million. However, I feel that this is treading in waters I don't know well enough. There is something very good about the Internet: being able to tap such huge resources of information is fantastic. But am I really learning something by applying the fix? Yes: that there are people out there who know a lot more than I do. Better to respect these people and try to contribute something back, like with this article.

Are you ready to create your first Android app? Not yet. When you restart, Eclipse warns you that you have not selected an Android SDK. Go and define one, choosing the right API level for your target and pointing at the folder where you extracted the SDK package. My target is going to be Android 2.1, just because I happen to have a phone running that version.

Wednesday, 21 March 2012

If you're about to purchase a smartphone, a tablet, or even a PC, you probably have already noticed it: Microsoft now has become a niche player.

It is all about how the balance of producers and consumers of content has evolved. When the PC revolution started, PCs were used to create content that was consumed by other means. PCs were, and still are, used to create music, graphics, movies or books. They were used to enter data. But the content was primarily consumed in non electronic forms: magazines, theatres, records; paper, film or vinyl. Computers helped to create content that was consumed in other mediums.

The only exception to this rule was, and still is, data processing applications. Data is entered into an application, then transformed and retrieved in many ways, but the results rarely leave the application; perhaps they are interfaced into other applications and transformed again. And the ratio between the volume of transactions entered and the volume of information extracted keeps getting smaller. Data is condensed into tiny amounts of information for dashboards, account statements or check balances.

Then things started to go digital. Content created on computers is increasingly consumed only on electronic devices, and the PC was the main device used to consume it. Databases, on the other hand, increased in size and complexity with each evolution of the technology, each iteration generating bigger and bigger amounts of data. A significant trend is that most of today's data is entered directly by the end user, be it plane reservations or shopping carts, or generated from website clickstreams. There are fewer and fewer data entry clerks, as each iteration of process optimisation attempts to reduce or eliminate the need for human intervention. Warehouses and store shelves are full of bar code labels that reduce data entry to its minimal expression.

Ten years ago, if you wanted to do anything useful with a PC, there was little choice but to use Windows. It was the result of a three pronged approach. The tight control Microsoft exerted over hardware manufacturers ensured that Windows was a popular, even cost effective choice for PC hardware. Their product portfolio, covering such a wide surface of applications, allowed them to offer very seductive deals to their customers; in the database area, for example, it was not uncommon years ago to hear of someone standardising on SQL Server, and to learn from insiders that the product had been thrown in close to free as part of a much larger deal involving workstation, office and server software. And finally, their lock-in through proprietary formats and protocols kept everyone else from making competing products.

When the PC was the only device capable of running applications for content creation, there was little choice but to use Windows. When the mainframe terminal died, the PC was the only alternative for data entry.

The world of today is different. The balance of content creators versus content consumers has shifted. Content can be created and consumed in many different ways, all of them completely digital. There are now orders of magnitude more devices in the world capable of running applications than personal computers running Windows. New classes of devices (phones, tablets, settop boxes, book readers) have separated clearly the roles of creator and consumer. You no longer need to use the same device for creating and consuming content. Data entry happens by means of bar code scanners or users entering the information themselves, and behaviour data is collected automatically by web logs or TV settop boxes.

And almost none, if any, of those devices run Windows. Windows and Windows applications have failed to move into these scenarios, except where they have managed to hide an embedded PC inside the device (think of ATMs). At this point, I can only see three Windows use cases, and each is getting weaker and weaker.

Enterprise applications and office productivity: that is now a niche restricted to people who need five year old applications that depend on Windows compatibility to run. That, plus people at home who want a computing environment similar to the one in the office. This segment is being attacked very effectively by cloud services and apps, but the inertia here is huge, so it's going to last them a few years. It is also the most profitable, so expect Microsoft to fight to the death to preserve it.

Content creators: people who still need the full power and ergonomy of a desktop or laptop computer to create content. Note that even with the empowerment brought by digital technology, the ratio of content creators to content consumers is still something like 1 to 1000. This segment is not very profitable for Microsoft, but it is a key one, because in the past this channel served to promote content in proprietary formats (VB, C#, Silverlight, Office formats, WMA, .AVI, DRM music...) that were essential to increase the desirability of their products in the consumer segment. Unfortunately for them, open standards, reverse engineering of formats and aversion to DRM are destroying the virtuous cycle of created content that can only be consumed on the Windows platform.

People who simply want a computer for basic tasks (browsing, mail, light content creation) and make a cost conscious purchase. It is actually true that Windows PCs are cheaper than Macs. While this is likely Microsoft's safest niche for now, it is so for a reason: this segment is the bottom of the barrel in terms of profitability. And both Mac and open source alternatives are eroding market share from both ends of the profitability spectrum.

Microsoft Windows can now be considered a niche player in these three segments. They are huge niches, and almost anyone else would be happy to own them, but niches nonetheless. Whether because of complacency, protection of their cash cows, or lack of vision, Microsoft has failed to establish any significant presence in any new technology since the year 2000 or so. The cruel irony is that protecting those niches is also what has led them to lose in the other segments. Disruptive players do not care about preserving their legacy, because they don't have one to preserve.

Some of you may point to the XBox as a counterexample. Check the financials of the Microsoft console division and see how long, if ever, it will take them to recover all the money thrown at making the XBox fight for number two or three in the console market, before progressing the discussion.

In the database arena, things have been very similar. SQL Server has always been limited in scale by the underlying Windows platform. SQL Server could only grow as far as the type and number of CPUs (Intel or Alpha in the early days), the word size and the RAM limits of the Windows OS allowed, and this prevented it from being used for big loads, or even small or medium loads that had plans to grow bigger. Since the definition of "big load" keeps changing with Moore's law, SQL Server has never made serious inroads beyond the medium sized or departmental database, facing competition from above (Oracle, DB2) and below (open source). Could Microsoft have made SQL Server cross platform and have it running on big iron? Probably, at an enormous expense, yes. But that would have meant losing the nice integration features that made it such a good fit for Windows. And also the reason to buy a Windows Server license.

And when SQL Server was seemingly ready for the enterprise, a number of competitors arrived that made it unnecessary to host your database on your own server (Amazon), or to have a relational database at all (NoSQL). Could Microsoft have moved earlier to prevent that? Probably, but that would have required foreseeing it first, and it would have come at the expense of those lucrative Windows licenses sold with each SQL Server instance.

So the genie is now out of the bottle, and Microsoft can't do anything to put it back in. They are now a niche player. Get used to it. The next point of sale terminal may not be a PC with a cash drawer attached.

Wednesday, 29 February 2012

TL;DR: database version control does not fit well with source code version control practices because of the challenges associated with data size.

I could not help but think about posting a comment on this well written blog post, but realized it was a topic worth discussing at length in a separate entry. If you've not clicked through and read the rather interesting article already, here's the summary: there is a big impedance mismatch between current best practices for source code changes and change control inside databases. The solution proposed is quite simple: develop a change script, review it, test it, run it in production. If anything fails, fix the database by restoring a backup and start all over again.

For added convenience, the author proposes also to store the scripts in the database itself, so that everything is neatly documented and can be reproduced at will.
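The process above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module; the table name `schema_migrations` and the function `apply_migration` are made up for the example, not anything proposed in the original post, and a production version would need error handling and locking around concurrent deployments.

```python
import sqlite3

# Hypothetical names throughout: schema_migrations and apply_migration
# are illustrations of the process, not code from the original article.
def apply_migration(conn, version, script):
    """Apply a change script once, recording it in the database itself."""
    conn.execute("""CREATE TABLE IF NOT EXISTS schema_migrations (
                        version    INTEGER PRIMARY KEY,
                        script     TEXT NOT NULL,
                        applied_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
    already = conn.execute(
        "SELECT 1 FROM schema_migrations WHERE version = ?", (version,)
    ).fetchone()
    if already:
        return False  # this script ran before; do nothing
    conn.executescript(script)  # run the change itself
    conn.execute(
        "INSERT INTO schema_migrations (version, script) VALUES (?, ?)",
        (version, script))
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")
applied = apply_migration(
    conn, 1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);")
print(applied)  # True: the script ran and was recorded in the database
```

Because the script text is stored next to the data it changed, the database carries its own change history, which is exactly the self-documenting property the author is after.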

There are a number of very interesting comments proposing variations on this process, and all of them reflect some fundamental problems with databases that have no counterpart in the world of source code control. While I seriously think that the author is basing his proposal on real world experience, and that the process works well for the systems he's involved with, there are a few environmental factors he is ignoring that render the approach impractical in some cases. It is as if he were falling into the classic trap of believing that everyone's view of the world has to be the same as his own. Here are a few reasons why not.

Databases can be huge

This is the crux of the problem. Code is the definition of some data structures plus how to process them. Databases contain structure, data and perhaps also instructions on how to deal with that data. A compiler can process source code in a matter of minutes, but loading the data takes much longer. Whether by restoring files, running scripts or any other means, there is no way around the fact that the time to restore data is at least a couple of orders of magnitude above the time needed to compile something.

This makes all the "simple" strategies for dealing with databases fail above a certain size, and breaks agile development cycles. In the end, if you want continuous integration or something similar, you simply cannot afford to start from an empty database in each build cycle.

Big databases have big storage costs

In an ideal world, you have your production database, plus a test environment, plus a copy of the production database on each development workstation so that you can make changes without disturbing anyone. This works as long as the data fits comfortably on a hard disk.

In the real world, with a big enough database, this is simply not possible. What ends up happening, in the best case, is that developers have a local database with some test data seeded in it. In the worst case, all developers share a database running on a server that likely cannot hold a full copy of the production environment.

Performance testing for big databases is usually done on what is sometimes called a pre-production environment: a full copy of the production database restored on separate storage. That already doubles the total cost of storage: production environment plus pre-production environment. For each additional environment you want to have, say, end user testing, you're adding another full copy to the bill.

Before you blame management greed for this practice, think again. Each full copy of the production database increases storage costs linearly. For $100 hard disks, this is perfectly acceptable. For $100,000 storage arrays, it is not.

We've had Moore's law on our side for decades for RAM capacity and CPU power. But the amount of data that we can capture and process is increasing at a much faster rate. Of course, an infinite infrastructure budget would help, but even the advocates of setting up the best development environment agree that there are limits on what you can afford to spend on storage.

One promising evolution of storage technology is snapshot-based, copy-on-write systems. They provide nearly instantaneous copies -only metadata is copied, not the actual data- and only store what changes across copies. This looks ideal for big databases with small change rates, but is unlikely to work well for databases with big change rates, as you're going to pay the "price" -in terms of amount of storage- of a full copy at the time you make the changes. And don't forget that the copied databases will impact the performance of the source database whenever they access unchanged parts. To prevent that from happening, you need a standalone copy for production, another for pre-production, and another for development. So at a minimum, you need three different copies.

Restores mean downtime

So do application code upgrades, one could say. And in fact they do. However, the business impact of asking someone to close an application or web page and reopen it later can be measured in minutes. Restoring a huge database can mean hours of downtime. So it's not as easy as saying "if anything goes wrong, restore the database". Even in a development environment, this means developers waiting hours for the database to be available. With a big enough database, you want to avoid restores at all costs, and if you do them, you schedule them off hours or on weekends.

Data changes often mean downtime too

While in the past adding a column to a table required an exclusive lock on the table, or worse, on the whole database, relational database technology has evolved to the point of allowing some data definition changes without exclusive access to a table. However, there are still changes that require that nobody else is touching the object being changed. While not technically bringing down the application, in practice this means there is a time frame when your application is not available, which in business terms means downtime.

It's even worse: changes that don't need exclusive locks usually run inside a transaction, which can represent a significant resource drag on the system. Again, for a small database this is not a problem. For a big enough database, it is unlikely that there are enough resources to update 100 million records and at the same time allow the rest of the users to use the application without taking a huge performance hit.
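One common way to soften that resource drag -not mentioned in the original post, so consider it my own addition- is to split a mass update into small batches, committing after each one so that no single transaction holds locks on, or undo space for, the whole table. A minimal sketch with SQLite (the `orders` table and batch size are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders (status) VALUES (?)",
                 [("pending",)] * 1000)
conn.commit()

# Update in small batches, committing after each one, so other sessions
# can get in between batches instead of waiting for one huge transaction.
BATCH = 100
total = 0
while True:
    cur = conn.execute(
        """UPDATE orders SET status = 'archived'
           WHERE id IN (SELECT id FROM orders
                        WHERE status = 'pending' LIMIT ?)""", (BATCH,))
    conn.commit()  # release locks between batches
    if cur.rowcount == 0:
        break      # nothing left to update
    total += cur.rowcount

print(total)  # 1000
```

The trade-off is that the change is no longer atomic: readers can observe a mix of old and new rows while the batches run, so this only works for changes where that intermediate state is acceptable.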

Is there a way of doing it right?

Simple answer: yes. Complex answer: as you increase the database size, the cost of doing it right increases, and not linearly. So it is going to become more and more difficult, and it's up to you to decide where the right balance of safety and cost is.

However, a few of the comments in the post suggested improvements that are worth considering. In particular, having an undo script for each change script seems to me the most reasonable option. Bear in mind that some kinds of data changes do not have an inverse function: for example, UPDATE A SET A.B=A.B*A.B always yields a positive B regardless of the sign of the original value. In those cases, the change script has to save a copy of the data before updating it. With that addition, at least you have some way of avoiding restores. This does not completely remove the problem of downtime, but it at least mitigates it by making it shorter.
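The squaring example above can be made concrete. This is a sketch with SQLite showing the change script saving the old values into a side table before the non-invertible update, and the undo script restoring from it; the table names `a` and `a_undo_1` are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, b INTEGER)")
conn.executemany("INSERT INTO a (id, b) VALUES (?, ?)", [(1, -3), (2, 4)])

# Change script: B = B*B has no inverse (the sign is lost),
# so save a copy of the old values before updating.
conn.execute("CREATE TABLE a_undo_1 AS SELECT id, b FROM a")
conn.execute("UPDATE a SET b = b * b")
conn.commit()

# Undo script: restore the saved values instead of restoring a full backup.
conn.execute("UPDATE a SET b = (SELECT u.b FROM a_undo_1 u WHERE u.id = a.id)")
conn.execute("DROP TABLE a_undo_1")
conn.commit()

print(conn.execute("SELECT b FROM a ORDER BY id").fetchall())  # [(-3,), (4,)]
```

Note the storage cost reappears in miniature here: the undo table holds a copy of every row touched, which for a 100 million row update is far from free, but still much cheaper than a full database restore.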

This, plus the practice of keeping the scripts inside the database, also has the added benefit of keeping the database changes self contained. That means less complexity should you need to restore, which is something DBAs appreciate.

According to the different scales, these are then the different scenarios:

Small database: ideal world, one full production copy for each developer. Use the original process. When moving to production there is a low chance of problems, but if they appear, restore and go back to step 1.

Not pretty or perfect, but likely the world we live in. And note that NoSQL does not change anything here, except in the simplistic views of those new to NoSQL or to development at large. In some NoSQL implementations, you don't have to do anything special because there is no real database schema. You only change the code, deploy, and everything is done. Really? Well, yes, if you don't count all the places where you, or anyone else, made assumptions about the implicit structure of your documents. Which could be everywhere any of that data is being used. The relational world has some degree of enforced structure (keys, cardinality, nullability, relationships, etc.) that makes certain categories of errors surface immediately instead of three weeks after the code is in production.

Tuesday, 7 February 2012

Today, Canonical announced that they are relegating Kubuntu, one of their "official" variants of their flagship Ubuntu Linux distribution, to the same status as the other distribution derivatives.

Canonical is the brainchild of Mark Shuttleworth, a dot com boomer who wanted to give back to the same community that provided some of the wonderful FOSS software that helped him become a millionaire. When it started, Canonical did not have any clear financial constraints, or objectives for that matter. Bug #1 in Ubuntu's bug database simply reads "Microsoft has a majority market share". Was Canonical's objective to take away market share from Windows? At the time, that seemed a bold statement, but the first Ubuntu releases were making giant strides towards that objective, to the point of being considered a credible alternative by many established players, including Microsoft itself.

Kubuntu, one of the Canonical projects, is an attempt to merge the friendliest Linux distribution -Ubuntu- with the desktop environment that I find is the closest fit for a Mac or Windows user: KDE. An ideal combination for someone who switches between operating systems, or a seasoned user who wants to move away from proprietary environments.

What the latest announcement essentially means is that the single individual paid by Canonical to develop and maintain Kubuntu will no longer be assigned to that role, and any further developments in Kubuntu will have to come from the community instead. This does not necessarily imply that there will be no more Kubuntu releases after Pangolin: the Edubuntu community, for example, has managed to keep up with releases.

It's not that Kubuntu users are left out in the cold, however. The next release (Precise Pangolin) of the Ubuntu family will still include Kubuntu, and being a Long Term Support (LTS) one, existing Kubuntu users will get patches and support for the next three years.

So what's so important about the announcement, then? By itself, very little, except for the small minority of Kubuntu users. Kubuntu did not have enough of a user base, reputation or visibility to be worth keeping, hence Canonical has reassigned the single individual dedicated to Kubuntu because it does not make economic sense to keep paying him to do that.

What is important is not the announcement, it is the trend: Canonical is increasingly taking decisions based on economic, not idealistic, considerations.

Now, those idealistic goals are set aside more and more in search of a more mundane objective: profitability. The turning point was reached last year: they released Unity, a new desktop environment, targeted at non computer users, with an eye on using it as the interface for Ubuntu based touch devices and other non-PC environments. That was a big change, and it was received by the existing user community with a lot of backlash, yet Canonical is firmly resolved to develop Unity in spite of it, and is not willing to devote time or resources to keeping an alternative to Unity alive.

What Canonical wanted to be at the beginning was not clear, but now it is: to be profitable.

And it is hard to blame Canonical for not trying. After all, they have an extensive -read, expensive- staff dedicated to the many projects they sponsor. They have clearly invested a lot of effort -read, money- into many initiatives targeting everything from the office productivity desktop to the set-top TV box, with incursions into the music store business, cloud storage, cloud OS and management, alternative packagings (Kubuntu still appears there at the time of writing, by the way), education, home multimedia hubs, corporate desktop management and who knows what else.

All these projects have generated a considerable user base, at least in comparison with previous attempts, and helped Canonical accomplish Shuttleworth's original intention of giving back to the community, even if there are differing opinions on how much actual contribution has been made.

Yet none of these projects seems to have generated a respectable enough line of business. Maybe some of these projects are self sustaining; maybe some of them generate some profit. But are they taking over the world? No. Are they going to be a repeat of the first Shuttleworth success? No. Are they making headlines? No. All that investment is certainly producing something that is valued and appreciated by the open source community, but profitable it is not.

At least not yet. How do I know? Honestly, I don't know for sure. What I do know for sure is that any degree of significant success would be heralded and flagged as an example by the always enthusiastic open source community. And that is not happening.

So Canonical is looking for profitability, big time. And if it means losing all their existing user base, so be it. Which will not be a big loss, because their existing user base is demonstrably not very profitable to begin with.

Which makes complete sense from a business perspective. If I were a millionaire and had put a lot of my own personal fortune into something, I'd expect to see something back. Another, completely different issue is how they can become profitable. That would mean looking into Unity, their biggest bet so far, and... well, you already know my opinion. Is a keyboard search the most effective way of finding things on a device without a physical keyboard? You judge.

Anyway, it was about time to change distros. With all due respect for the huge contribution Canonical and Ubuntu have made to building a robust, flexible and fast desktop environment. Until they stopped wanting to do that, of course.

Thursday, 29 December 2011

Unlike other years, where I wrote my "predictions" entry at the end of the year, this time I'm going to try to predict what is going to happen in a few technology areas in 2012. Oh yes, another not so database centric post. Well, it does have some database content. Read on.

First, I don't believe in the "the year of..." idea. At least applied to technology, it does not make much sense. There has not been a "year of Facebook", "year of Windows Server", "year of Lotus", "year of Office", "year of Novell" or "year of Google". We have never had a "year of Oracle", "year of SQL Server" or even a "year of iPad". True, some of these products have been very successful at launch, quickly reaching a lot of popularity. Some people take these launch dates as inflection points in tendencies, but they tend to forget how strongly these products have kept growing over time. No successful product I can think of -and correct me if I'm wrong- has been launched, ignored for a long while, and then boomed.

So don't expect these predictions to say "2012 will be the year of..." because 2012 will not be the year of anything. But in my opinion, 2012 will be the year when we will see some technologies emerging and others starting to ride off into the sunset.

MySQL share will go down. Oracle has failed to keep the hearts and minds of database developers. As a product, MySQL has a nearly infinite life span ahead, given its huge momentum. But don't expect it to be at the forefront of innovation. Unless Oracle as a company becomes something completely different from what it is today, MySQL is going to remain the cheap SQL Server alternative, because everything else implies a threat to their other profit lines.

Java will finally start to lose momentum. Again, Oracle has to change a lot from what it is today for this not to happen. From what I read about the evolution of the language, and the attempts to revive the ill fated JavaFX, Java is stagnating and becoming a legacy language. Notice I say momentum, not market share. During the next year, fewer and fewer projects starting from scratch will use Java, but at the moment that is a small blip on the radar.

Windows Phone will get an aggressive marketing push in 2012. It will fail anyway, crushed by the brand superiority of Apple and the massive spread of Android to... everywhere else.

Windows will become legacy. Yes, Windows 7 is not bad. Windows Server 2008 is not bad. But both are sandwiched in their respective niches. New client technologies (tablets, phones) are challenging the old king of the desktop. And on the server front, the combination of Cloud/SaaS growth and commoditization of basic enterprise services is challenging its dominance. Expect to see more and more integration with Active Directory trying to compensate for the lack of flexibility and higher costs of running your on premise Windows farms. Whereas today Windows shops do not even question whether they should deploy new Windows servers or services, by the end of 2012 it will be customary to do so.

Speaking of Apple, 2012 will be the year when tablet manufacturers finally realize that they cannot compete by offering something that is not quite as good as the competition at the same price. So we'll hopefully see new products that offer innovative features while at the same time being -gasp- cheaper than the Apple equivalents. By the way, Apple will continue to be the stellar example of a technology company, money making machine, marketing brilliance and stock market darling all at the same time.

PostgreSQL will increase market share. Both as a consequence of its own improvements, which make it more and more competitive with high end offerings, and because of Oracle not managing its MySQL property well, PostgreSQL will become more and more a mainstream choice. Many think it already is.

JVM based languages will flourish. While Java as a language is stalling, alternative languages that generate JVM bytecode will accelerate their growth in 2012. The JVM is mature, runs on everything relevant from Windows to mainframes, and is a stable enough spec that nobody, not even Oracle, dares to touch it. This, together with the tons of legacy code you can interface with, makes the JVM an ideal vehicle for developing new programming languages. Seriously, who wants to implement file streams, threads or memory mapped files all over again?

Javascript will become the Flash of 2012. Mmmm... maybe this has already happened, since Flash has already retreated from the mobile front. Yes, Javascript is not the perfect programming language. But it is universally available, performs decently, and together with the latest HTML specs allows for much of what Flash was being used for in the past.

NoSQL will finish its hype cycle and start to enter the mainstream stage. Instead of a small army of enthusiasts trying to use it for everything, the different NoSQL technologies will be viewed with a balanced approach.

The computer security industry will be in the spotlight in 2012. Not because there is going to be a higher or lower number of security related incidents next year, but because as an industry, computer security has expanded too far with too few supporting reasons beyond fear and panic. Forgive my simplification, but currently computer security amounts to a lot of checklists blindly applied, without rhyme or reason. Much as in real life, security needs to go beyond the one size fits all mentality and start considering risks in terms of their impact, likelihood and opportunity costs. Otherwise, be prepared to remember a 20 character password to access your corporate network.

Oh, and finally, and in spite of all the fear mongering, the world will not end in 2012. You will be reading these predictions a year from now and wonder how wrong this guy was.

Wednesday, 16 November 2011

During my first years of Linux, I switched between KDE and GNOME at the same time as I switched distributions, or more exactly, as each distro had a different default desktop environment. Later on, I began switching whenever one desktop environment leapfrogged the others with fancy new functionality.

Then, some time ago, I settled on Kubuntu, and keep using it for my day to day desktop work. I'm perhaps not a typical KDE user, because I use many non KDE alternatives as standard applications. I don't use KMail or any of the semantic desktop functionality. My default browser is Chrome/Firefox, my mail client is web based, and I use GIMP to retouch photos. This is not to say that the KDE Software Compilation apps are bad -try them and you'll see that they are in fact quite good- just that I'm more used to the alternatives.

However, when I got a netbook, I tried KDE and found it too demanding on screen real estate to be comfortable to use, so I installed Ubuntu with the default GNOME 2 desktop on it. The machine ran 10.10 perfectly, and I did not feel the need to upgrade or change anything.

We KDE users had to endure, a couple of years ago, the difficult transition from KDE 3.5 to KDE 4. The KDE 4 team had a very hard time explaining to its users the reasons for the change. As I understood it, they were rewriting the KDE internals in order to clean up the code base, implement existing features better and allow the desktop environment to evolve without carrying over difficult to maintain legacy from the 3.5 code base. For users this was difficult to understand, since the changes in the desktop environment also required changes in applications. Which mostly meant that existing applications were either not available, or not on par feature wise with their 3.5 equivalents at the time version 4 was released.

Two years have passed since that traumatic 3.5 to 4 transition, and the pain is over. KDE 4.7 is at feature parity with 3.5, and is regarded as one of the most elegant and configurable desktops. It is certainly not the lightest, or the least intrusive. But you have to agree that you can change almost anything you don't like using the KDE control panel to suit your tastes.

This is to say that I've been mostly a spectator in the Unity/GNOME 3 debate. That is, until I decided to upgrade Ubuntu on the netbook.

I read a lot about Unity, and was prepared for a different desktop interface. I read a lot of angry comments targeted at Unity, but honestly I did not give them much credibility. In the land of Open Source, everyone is entitled to their own opinion, and there is always a segment of users that rejects change. It happens with any kind of change. For those whose work environment is perfect after years of tweaking and getting used to it, anything that tries to change that, even for the better, is received with anger and noise.

I was not prepared for the shock. Unity is a radical departure from the previous GNOME 2 desktop. It's not only radical, it is also trying to go into many completely different and conflicting directions. Let me explain.

Most desktop environments, not only KDE and GNOME but also Windows and even the Mac, have been disrupted by the appearance of touch based devices. Using your fingers on a screen is completely different from using a mouse, either stand alone or via a touch pad. Fingers are less precise, if only because a mouse pointer targets an area of a few square pixels. Fingers are also much faster to move over the input area than the mouse, and you can use more than one at the same time, instead of being limited to one to three mouse buttons.

Touch devices need a different user interface metaphor, one based on... touching, instead of one based on pointing. This has become evident with the success of iPhones, iPads and Android based devices. Note that touch interfaces can, or perhaps should, be markedly different depending on screen size, because of the different ratio of screen size to human hand.

What does not work well is trying to mix the two metaphors. Touch and point based devices have different usage patterns, and different constraints. Trying to have a user interface that is efficient and ergonomic with both kinds of devices at the same time is simply impossible. It is like trying to have the same interface for switching gears in an automobile and on a motorbike: yes, you can build something that can be used in both contexts. But no, it will not be optimal in both at the same time.

Unity is such an attempt, and one that fails to be efficient with either kind of input device.

The Launcher
In the past, you pressed the "Start" button at the bottom (or top) of the screen and were presented with a set of logically organized categories to choose from. Or you could type a few letters of what you were searching for and find the program you wanted to execute. No more. You now have a bar on the left side of the screen with a row of icons, whose size cannot be changed, that in theory represent the programs you use most. This bar is on the side in order not to take space out of the precious screen height, which on a netbook is usually scarce. Well, at least something good can be said about the launcher.
Ah yes, we can always use some keyboard shortcuts to switch between applications. Another usability triumph, I guess.

Now, try to tell a novice how to find what he wants there. You cannot. Instead, you explain that if those icons have a tiny ball on the left side, it means they are applications that are currently running. The distinction between launching a new instance and switching to a running one is very small, in fact a few pixels small. It's a lost battle to try to explain the difference between creating a new document in LibreOffice using the File->New command and launching another LibreOffice instance.

Given that there is no simple way of finding what you want to execute unless it is among the first six or seven icons, you tell the novice to press the home button at the top left of the screen.

And good luck there, because something called the "Dash" appears, which is a window that lists programs. The Dash shows oversized icons of the most frequently used applications, with four small icons at the bottom that represent application categories. It's up to the novice to figure out what those categories mean, and to find anything there. Of course, the novice can type a few letters to search the Dash. Depending on how well localized Ubuntu is, he or she may be lucky and find a mail client, or a web browser, or a photo viewer. Or not.

The window title bar

One of the most important aspects of any kind of interface design, not only user interface design, is consistency. The Unity window title bar breaks all records for inconsistency in that area. When a window is maximized, the title bar shows the application name, except when you hover the mouse over it, when it magically changes to show you the window menu. Of course, if the application window is not maximized, this is different. If our novice has not yet given up on Unity, he's going to be asking "where is the application menu?" in a matter of seconds. That is, assuming that he or she discovers how to maximize or minimize windows, which will probably send you back to the explanation about the tiny little balls on the launcher, because when you minimize a window, it literally disappears except for the little ball that tells you it is still running.

Overlay scrollbars

These are, to put it bluntly, a horrible idea. To reach for that tiny line on the right side of the window, click, and then have a widget appear where you page up or down is clumsy, counterintuitive and downright uncomfortable. Whoever decided to go for them forgot a few key differences between touch and pointing devices: first, on touch based devices the tiny bars on the right are an indicator of where you are in the page, not a scrolling device themselves. Scrolling is done with a finger gesture that does not require you to move your hand to the right side of the document. Second, aiming at such a tiny line is difficult with touchpads. Third, whereas with the traditional scrollbar you could click anywhere above or below the slider to page up or down, now you are required to move the pointer to the part of the scrollbar that is being displayed, and thus you have to move much farther on average.

I cannot believe that someone who uses a computer to handle documents routinely longer than a single page can find these scrollbars convenient. Certainly I cannot find anyone who does. The scrollbar as we knew it (before Unity, that is) was an incredibly simple metaphor that novice users could understand without explanation. Now try to explain these overlay scrollbars to a novice.

All this inconvenience is introduced in order to save something like 5% of a maximized window's width. Here is a message to the Unity user interface designers: it is not worth doing, because for documents that fit in the window height you can simply hide the scrollbar, and for those that are longer it's better to have an effective navigation device than to deal with such an oddity.

And please, do not remind me that we can always use the keyboard. Because it is true, but we are talking about usability, right?

Lack of customization

All this is the default behavior. Being a Linux user, you may think it is just a matter of finding the configuration dialog and changing those odd default settings. Here is the good news: there are no options to change most of that, lest the novice become confused by too many options. In the end, I find this the most sensible choice for Unity: your users are going to be so confused by the user interface that it's best to hide anything else, to prevent them from becoming distracted from learning the new Unity ways of doing things, which is going to consume most of their mental energy.

Maybe there is something good in upcoming Unity releases. I'll have to find out on a VM, because there is no way that I'm going to use it on my desktop, laptop or netbook. Which is now running Mint, by the way. Of course, with the overlay scrollbar package removed.

P.S.: this post comes with tremendous respect for the Unity and Ubuntu developers. I know that creating a good user interface is incredibly hard. And I know that lots of time and resources have been invested in Unity in a well meaning attempt to create something different and better. I just can't understand what mental processes were in place to allow Unity to see the light in its current state.