
If you are setting up a continuous integration environment for anything serious and Jenkins is your choice of continuous integration tool, then there can be a learning curve. By default, Jenkins comes bundled with very few features; for anything more, you need to install plugins. Here is a list of plugins that can make the learning curve easier (even for the future me, when I have forgotten everything that I have learnt).

1. The green balls plugin: The default colour for a successful build in Jenkins is blue. If you call a successful build ‘a green build’, this plugin changes the blue balls to green, and your brain doesn’t have to map blue to green ever again.

2. The copy artifact plugin: This helps you copy artefacts from other successful projects. A classic example is an acceptance test job that copies a successful build, deploys it and tests it.

3. The promoted builds plugin: This helps you set up a deployment pipeline. You can either manually promote a good build or, even better, promote a build when a downstream job like acceptance tests goes green. The UI to set it up is not very intuitive, but once you get your head around it, it sort of works. I have used it in conjunction with the copy artifact plugin when I had to deploy multiple promoted builds into one environment.

4. The SCM sync configuration plugin: Setting up a deployment pipeline is no trivial job. It can take weeks to get everything working. The last thing you want is a hard disk crash making you do it all over again. That is where the SCM sync configuration plugin comes into the picture. It syncs all your job configuration to an SCM (I used git), so that if something goes wrong you can recover all your job configuration. It can also work to your advantage when you want a like-for-like Jenkins environment on another machine.

Recently at a client, we had to build RESTful services that talk to each other. We had a choreographed set of services, which meant that any service could talk to another service. The problem we had was that we were constantly refactoring the messages that the services used to talk to one another. We started off by duplicating the messages on each service. It kinda worked initially, when we had just two services; we just had to change a message in two places. Then we added a third service that used the same message, so now we had to change stuff in three places. We also had a fourth place: the acceptance tests, which tested the services independently. Changing things in four places was cumbersome; the approach clearly didn’t scale. One approach we considered was making the messages a library (a jar in our case), which meant we would have one place to change a message. It also meant that we had to independently change the message, check it in, wait for it to build and publish it to an artefact repository. That would mean at least five minutes even for a trivial change. We wanted something faster.

We then thought about giving git submodules a try. We made the messages a separate repository, which was then checked out as a regular source folder into every service project. Now we could finally use the IDE’s magic refactoring on the messages. All looked good initially; what we didn’t realize was:

1. It requires a lot of discipline and a good understanding of how git works, plus the idiosyncrasies of submodules. You have to commit the submodule separately and push it before you push the parent repository changes. Checking in a bad reference to a submodule is the most common mistake people make.

2. Git checks out a submodule on ‘no branch’ (a detached HEAD) when you use the ‘git submodule update’ command. This is a bit irritating because when you have changed stuff and want to commit, you can’t. You have to check out master or some other branch and commit there. This means that I stash my changes first, check out master, pop my stash and then commit.

3. Even if I update one repo to point to a new version of a submodule, the other repos will not automatically point to the new version. Somebody has to go to each dependent repository, check the new reference in and push it. Doing this manually doesn’t scale, even with three repos. So we created a job on our CI server that monitors check-ins to the submodule, checks out all dependent repos and updates them (another manual task to update the job whenever a new dependent repo comes along). This still doesn’t prevent another developer from happily working on an older version of the submodule, completely unaware that a huge breaking change is coming her way (maybe 3 hours later). This defeats the goal of fast feedback inherent to continuous integration. It wouldn’t be a problem if we had used an artefact repository and every build (even on a local workstation) picked up the latest changes; then the developer would have known about the breaking change earlier.
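To make the commit dance in points 1 and 2 concrete, here is a self-contained sketch you can run in a scratch directory. The ‘messages-src’ and ‘service’ repo names are made up for illustration; they are not our actual repos.

```shell
# Scratch demo of the submodule commit dance; repo names are made up.
set -e
work=$(mktemp -d) && cd "$work"

# A shared 'messages' repo that will become the submodule.
git init -q -b master messages-src && cd messages-src
git config user.email demo@example.com && git config user.name demo
echo "OrderPlaced v1" > order.txt && git add . && git commit -qm "v1"
cd ..

# A service repo that consumes the messages as a submodule.
git init -q -b master service && cd service
git config user.email demo@example.com && git config user.name demo
git -c protocol.file.allow=always submodule add -q "$work/messages-src" messages
git commit -qm "add messages submodule"

# Change a message inside the submodule. After a 'git submodule update'
# you would be on a detached HEAD, hence the stash/checkout/pop dance.
cd messages
git config user.email demo@example.com && git config user.name demo
echo "OrderPlaced v2" > order.txt
git stash push -q && git checkout -q master && git stash pop -q
git commit -qam "v2"
# (push the submodule here, BEFORE pushing the service repo)

# Record the new submodule commit in the parent and commit that too.
cd .. && git add messages && git commit -qm "bump messages to v2"
```

Forget any one of those steps (most commonly the submodule push) and you have checked in a bad or invisible reference.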

Git submodules may look powerful or cool up front, but for all the reasons above it is a bad idea to share code using submodules, especially when the code changes frequently. It only gets worse as more and more developers work on the same repos.

Before the ‘agile revolution’, the biggest bottleneck for organizations was building, deploying and maintaining reliable software (it still is for a lot of organizations). Then came the agile way of building teams and software. Suddenly, software was delivered faster and more reliably. Practices like test driven development, continuous integration and automated regression tests helped fix that bottleneck. The actual deployment and day-to-day operations were the next bottleneck; the devops movement and continuous delivery were the answer to that. Now we can reliably build, test, deploy and maintain software. Time to move on to the next bottleneck. For me, it is the organization being bureaucratic and inflexible. It can manifest in many ways for a development team. A few examples that I have observed are:

1. At one client I worked with, we requested two extra developers at the beginning of a six month project. It took them four months to find the developers. Instead of increasing the team’s productivity, it actually brought productivity down: for a team already constrained on capacity, we now had to get the new developers up to speed as well as build the application. By the time the developers were fully ramped up, the project was over (with the scope reduced).

2. At another client, we had a ‘technical architecture council’ and a ‘security council’. Each of them had their own opinions and they disagreed on a lot of things. More importantly, they took weeks and months to disagree. As a development team focussed on delivering business features, the biggest bottleneck for us was reaching a consensus on the technical architecture. When they finally did agree, we had to change a lot of stuff that we had already built. Of course, this had to happen before the first release, which meant we didn’t go live for a long time.

3. Then there is the classic procurement bottleneck. In most traditional organizations, there is a decent lead time for any procurement, be it hardware or software; it can be anywhere from weeks to months. For a team looking to release early and get feedback, it can be a fatal bottleneck, whether it is getting a developer box, a testing environment or production infrastructure. The traditional answer most often is, ‘That is how the process works’. To be fair to the IT team, the bottleneck is sometimes somewhere else, like the hardware vendor. The cloud is a great way to alleviate a lot of the hardware procurement issues, but again a lot of traditional organizations are sceptical about it. That is changing rapidly though.

4. Another bottleneck I have seen is getting money from the business. If it takes you three months to decide whether you want to pay for something, then you have already wasted three months of development time, and your competitors may have built it in that time. Business agility has its own Wikipedia article, so I’ll stop talking about it.

I am sure there are good and bad reasons why certain processes exist in an organization. The point is that these things are never scrutinized and changed. There is no feedback loop. As a result, you spend more money and make customers and employees unhappy. This applies to any organization in general, not just software ones. Quoting from the theory of constraints: a chain is no stronger than its weakest link.

Recently I worked at a client where the developers did not have laptops and the internet was restricted. You could not even look at social networking websites or blogs. The reason most companies give is information security and preventing people from slacking. If you have worked in technology long enough, you will know that locking down stuff is a very naive way of enforcing information security. A talented enough hacker will get stuff in or out of your systems if required. Of course you need protection like firewalls and antivirus software (I am not an information security specialist, so I can’t comment more). Locking down sites just frustrates the hell out of developers. If I am trying to fix a technical problem at work, the last thing I need is a proxy telling me that I can’t access a blog post which might have the answer. If I want to try and find a solution at home, where I have unrestricted internet access, it is still frustrating; now I don’t have the code (because I don’t get to take the code home).

Most developers can do wonderful things when they are in the right frame of mind. For me, it is when I am alone and can concentrate on solving the problem without any distractions. Unfortunately, the most peaceful time I get is on a lazy weekend, and not having access to the code then is very frustrating. Some problems are much easier to fix on a weekend; for example, changing the directory structure of the code or a major refactoring of the object graph. Because nobody else is checking stuff in and out, I can do such things without merge conflicts and pain. But now, the most I can do is recreate the problem on my personal laptop, fix it, get to work early the next morning and repeat it there. Management must understand that there can be no innovation without some amount of slack. Robots working 8 hours a day continuously (and with restrictions) will not do wonderful innovative things and solve difficult problems. Sometimes I need to code on a weekend (not every weekend, I have a personal life too) and if you don’t let me do that, you lose.

After getting my bearings around JavaScript as a first class language, I wanted to try using it in a non-browser environment. Enter Node.js. The install was pretty simple on Ubuntu. Googling for a tutorial gives you this awesome ebook. It starts off with an example of creating your own HTTP server. However, my new-found JavaScript powers made me impatient to drive on my own rather than follow exactly what the book says. In the space of one hour I churned out a rudimentary web application framework which could handle multiple URLs. Here’s some code to create an HTTP server which listens on a port.

Notice that the first few lines look like a Java import or a C# using statement. Since there is no built-in syntax in the language for imports, the guys at Node have improvised by assigning an object to a module. A module is like a package/namespace in Java/C#. A module can export functions or objects, and there is a one-to-one mapping between a module and a file. Modules were something I missed in the browser, and they work well.
Creating an HTTP server is very simple; you just register a callback for every request. I created my own simple router function which takes in the request, the response and a list of routes. I didn’t couple the routes to the router, because that way I can unit test the router independently of the routes. The last line says which functions/objects get exported from the module. It is analogous to public methods/interfaces in conventional languages.

The router checks if a route exists for the URL in the routes and calls the handler registered for that route. If there is no route, it 404s with an error message. The routes object is an object/associative array with a handler function registered for each route. It is quite similar to how a hash would be used in Ruby.
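A router along those lines might look like this (again a sketch with made-up handler names, not the original code):

```javascript
// router.js - dispatches a request to the handler registered for its URL.
function route(pathname, request, response, routes) {
  if (typeof routes[pathname] === "function") {
    // A handler exists for this URL: delegate to it.
    routes[pathname](request, response);
  } else {
    // No route registered: 404 with an error message.
    response.writeHead(404, { "Content-Type": "text/plain" });
    response.end("404 Not found: " + pathname);
  }
}
exports.route = route;

// The routes object maps URLs to handlers, much like a Ruby hash.
var routes = {
  "/hello": function (request, response) {
    response.writeHead(200, { "Content-Type": "text/plain" });
    response.end("Hello from node");
  }
};
```

Because the router never touches the routes object’s contents, you can unit test it by passing in fake routes and a fake response.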

The biggest advantage of using JavaScript, for me, is that you don’t have to create an object if you don’t need one. The router function would have been a class in a language like C# or Java. My first tendency was to go for that pattern, but then it occurred to me that the router is pure behaviour and doesn’t contain any data. Making it an object would couple the actual routes with the behaviour and would make it less reusable and testable. The routes, however, is an object, since it contains data on which URL points to which request handler. Had it been a conventional language, I would have used the good old object hammer to hit every nail. Since functions are truly first class citizens in JavaScript, it makes you think outside the object oriented box.

All in all, I am pretty convinced that node.js will slowly take over the world. 🙂

I have been a developer writing mostly web applications for the last 6 years. Though I can say I know most common concepts of the 3-tier architecture comprising JavaScript, HTML, DOM, CSS, Java/C#, (N)Hibernate and SQL, I still can’t say that I know everything in depth (except Java and C#, to some extent). I can write a regular web app with all of these things plugged together, but my JavaScript will look far from awesome. For CSS and DOM manipulation I have to depend mostly on Google. For complicated SQL queries, again I have to turn to Google. If I use an ORM framework like Hibernate, the chances of me creating an n+1 selects problem are really high. However, as far as general purpose languages are concerned, I am much more confident: I can write much better Java/C# than I can write JavaScript. After some introspection, I figured that there is a good reason for this. I have studied the Java and C# languages from scratch. For Java, I read the SCJP book when I took my Sun Certified Java Programmer exam (today I don’t believe in certifications, but that was another time, 3 years ago). For C#, I have read (at least more than half of) CLR via C#. The CLR book is more of an advanced guide, but since I had prior Java and C# experience, it helped. Recently I have been playing around with Ruby and have read the free e-book on Ruby. It is hardly surprising that I am pretty much at home with a few general purpose programming languages. For the record, I haven’t read a single book on how JavaScript, SQL, HTML/DOM or Hibernate work; all that knowledge has accumulated from reading blogs and StackOverflow. I have read three books on one slice of the 3-tier architecture and zero on the other two. Therefore I have decided to bridge that gap by reading one book on each technology that I have ignored for so many years. I will start with JavaScript (which is swiftly on its way up as a mainstream language). I have bought the book Object Oriented JavaScript on Amazon. It has got good reviews from most people on the internet. Game on!

Recently I switched to Linux after 3 years on Windows. On Windows I wrote code on the Java and .NET platforms, with some amount of build scripts in Ant, NAnt and PowerShell. Now I am doing a DevOps role on Ubuntu Linux. It has been a humbling experience, to say the least. The last time I worked on something close to Linux was Sun Solaris (it wasn’t Oracle then), with Apache and Tomcat. The deployment wasn’t automated (read: primitive), so all I needed to know was PuTTY, vi, cp and chmod. Now, after waking up (like Neo in The Matrix), I find that the Linux world has moved on drastically. We use Puppet for configuration management; Ganglia and Nagios for monitoring; SSH as a remote shell; Apache is not cool any more, enter Nginx; Upstart for managing services; Amazon EC2 and VMware vCloud for infrastructure on the cloud (not to mention the RESTful APIs for provisioning machines); Graylog and Logstash for log management. The list doesn’t end there. All the above tools use other Linux packages, for which we use Debian package management. Puppet runs on Ruby, which has its own package management system called RubyGems. On Windows I just used IIS, some PowerShell for scripting and the good old Event Viewer for looking at logs. Automation is a luxury at best in the Windows world; in the Linux world, it has become the norm. I have now experienced the two contrasting philosophies of the Windows and Linux/OSS worlds first hand. While Windows is monolithic (there is only one way, and that is the Redmond way), slow and of course point-and-click (a legacy of its desktop genes), the Linux world is heterogeneous (every problem is divided into smaller parts and there is more than one way of solving it), fast moving (there is a new version of a library every month) and completely automated (a legacy of the command line shell and shell scripting). It’s been more than a month on Linux for me and every day is a catching-up game. Switching platforms and ways of thinking is a bit like going cold turkey, but I guess I will eventually get used to it. Change gives you a different perspective on the world.