Archive for the ‘Build Process’ category

Back in May of 2007, Linus Torvalds gave a talk at Google about git. He kept returning to one point: your workflow changes dramatically when you can perform a merge or pull new code from other people in under a second. Merging multiple branches into your own without remembering revision numbers, knowing that your history is intact, viewing the entire history of a project without an internet connection, and generally working with your version control system in sub-second time changes how you work with your code. You become willing to make branches because they're so cheap and easy. You can try new things quickly, toss them if they don't work out, or merge back the parts that do. I think the effects of really fast version control on your productivity are quite real and very important. I also think the same mentality can and should be applied to your build process.

An example of how development might work with a revision control system such as git. I just needed a picture. Image provided by Wikimedia Commons.

Of course the first thing you might think is "There is no way I'm going to get my build down to under a second." And for anything non-trivial, you'd be right. But I think a build plus unit tests in under 20 seconds is not an unreasonable goal; under 10 seconds is even better. Projects can get really large, though, involving thousands of files and a similarly large set of unit tests. How can you build all of that in under 20 seconds? The answer is: you don't. Do I sound contradictory? I'm not, but you might have to change how you think about your project.

The point of getting a build under 20 seconds is that you can compile what you're working on and run all of your unit tests, so you can check your changes and see if anything breaks. If your build takes several minutes, you're not going to want to build and run your unit tests until you're done. That makes Test Driven Development a lot more difficult, because if you don't run your tests constantly, you're probably not designing around making them pass. You might write fewer tests, end up with a less elegant design, and ultimately produce poorer quality code. But still, a large project with hundreds of thousands of lines of code isn't going to compile, much less run all of its unit tests, in under 20 seconds. So what do you do?

As with any development problem, you break it down into smaller, more manageable chunks. Instead of writing one giant application with thousands of files and unit tests, you write small libraries and services of maybe a few dozen or a few hundred files and tests, each containing decoupled, independent pieces. These smaller libraries and services can each have their own build process, since each is a small project in its own right. That way you run just the unit tests relating to the project at hand and cut your build time down to under 20 seconds.

Granted, that's only the build time for one module; all of them together can still take several minutes. But that's OK. Most bugs and features you work on will only touch one or maybe two modules at a time, so you only run the build process for those modules. There's no need to compile and run unit tests for systems that are completely unrelated to what you're working on. As for whether your change will break things in other packages, that's what integration tests are for.

Besides, separating your code into smaller libraries and services gives you a natural way to decouple systems and keep them from depending on each other's internals. If you write strong unit tests that mock any external entity your service or library calls, or that calls it, those tests enforce the contract that your other systems require, which helps ensure that the changes you make won't break them. If a change would break that contract, a test fails, telling you exactly what your system is expected to output and whom it is expected to call.
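As a minimal sketch of that idea, here is a contract-enforcing test in plain Java with a hand-rolled mock (no mocking framework). All of the names here (`OrderService`, `PaymentGateway`) are hypothetical illustrations, not from any actual product:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical external dependency that another system provides.
interface PaymentGateway {
    boolean charge(String account, int cents);
}

// Hypothetical library code under test.
class OrderService {
    private final PaymentGateway gateway;
    OrderService(PaymentGateway gateway) { this.gateway = gateway; }

    // The contract: place() must charge exactly once, in cents.
    boolean place(String account, int dollars) {
        return gateway.charge(account, dollars * 100);
    }
}

public class OrderServiceTest {
    public static void main(String[] args) {
        // The mock records every call so we can verify the contract.
        final List<String> calls = new ArrayList<>();
        PaymentGateway mock = (account, cents) -> {
            calls.add(account + ":" + cents);
            return true;
        };

        boolean ok = new OrderService(mock).place("acct-1", 25);

        // If a refactor changes the call (say, passing dollars instead
        // of cents), this check fails and points at the broken contract.
        if (!ok || calls.size() != 1 || !calls.get(0).equals("acct-1:2500")) {
            throw new AssertionError("contract broken: " + calls);
        }
        System.out.println("contract holds: " + calls.get(0));
    }
}
```

Because the mock stands in for the other system, this test runs in milliseconds with no network or database, which is exactly what keeps a module's build under the 20-second mark.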

By driving down your build times through separate projects for each section, you get faster turnaround and feedback on the work you're doing right now. You can run a build right after changing a single line of code just to make sure, and 20 seconds isn't too long a wait for a response. It's almost like a small break, but not quite long enough that you start trolling Slashdot and forget you're building something.

I feel this post needs another picture, so here's my cat Mischief playing with some rope.

Building a service-based, distributed application is a good way to meet your scalability goals. If you can compartmentalize the different aspects of your application into separate processes that host services over either web services or plain sockets, you gain the ability to make asynchronous and parallel calls between services.
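To show what those parallel calls buy you, here is a small sketch that fans out two independent "service" calls with an `ExecutorService` and combines the results. The services are stand-in lambdas; in a real deployment they would be HTTP or socket calls to other hosts:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCalls {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Two hypothetical services that can run concurrently because
        // neither depends on the other's answer.
        Callable<String> userService = () -> "user:alice";
        Callable<String> cartService = () -> "cart:3 items";

        Future<String> user = pool.submit(userService);
        Future<String> cart = pool.submit(cartService);

        // get() blocks, but both calls were already in flight, so the
        // total wait is roughly max(call1, call2) rather than the sum.
        System.out.println(user.get() + " | " + cart.get());
        pool.shutdown();
    }
}
```

The win over a monolith is that the two remote calls overlap in time; when each takes hundreds of milliseconds over the network, that overlap is where the scalability comes from.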

When you're developing an application, you generally use a single development machine. That can cause problems when you try to run integration tests between multiple services, due to port conflicts and technology stacks that behave differently on one machine than they do across several. While your application might be deployed onto many servers, your development environment probably doesn't have the same number of servers to work with, often due to budget constraints. This is especially true if you're hacking something up in your spare time at home or are a self-employed programmer, but even in many companies, machine resources are limited by the cost of those servers.

The good news is that you probably don't need additional physical servers just for your own testing. Most computers today are so powerful that you rarely use all of their resources. You can harness that spare capacity by running virtual machines and using them as your additional servers. A good place to get a high-quality, free virtualization package is VirtualBox, an open source virtual machine product (acquired by Sun, now owned by Oracle) that is fast, multiplatform, and of course free.

The main idea is to run multiple virtual machines, each one standing in for a real server your application would run on. You can then distribute your services onto these VMs and test how the interactions work. Each VM will have fewer resources than a real machine, but still enough for most testing you'll need to do. In fact, you can run VMs on your real servers too, if utilization on those servers is low but your application design needs more machines.

This is also useful if you're building a fault-tolerant distributed application where a single service is replicated onto many machines. Any distributed application needs to deal with node failure, which is difficult to simulate on a single box. With your service running on multiple virtual machines, you can shut one down during execution to see whether your clients handle the outage and redirect to a running service. You can also test the reintegration of a previously downed server back into a distributed service pool.
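The client-side behavior you're testing in that scenario can be sketched as a simple failover loop: try each replica in turn and return the first successful reply. The node names and the "downed" first replica below are invented for the demo; a real call would go over the network to your VMs:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class FailoverClient {
    // A "call" either returns a reply or throws, as a downed VM would.
    static String callWithFailover(List<String> nodes,
                                   Function<String, String> call) {
        RuntimeException last = null;
        for (String node : nodes) {
            try {
                return call.apply(node);
            } catch (RuntimeException e) {
                last = e;   // this node is down; try the next replica
            }
        }
        throw new RuntimeException("all replicas down", last);
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("vm-1", "vm-2", "vm-3");

        // Simulate vm-1 being shut down mid-test.
        Function<String, String> call = node -> {
            if (node.equals("vm-1")) throw new RuntimeException("down");
            return "reply from " + node;
        };

        System.out.println(callWithFailover(nodes, call));
    }
}
```

Shutting down a VM mid-run is exactly the experiment that tells you whether this loop (or your real equivalent, with timeouts and retries) actually redirects clients to a live replica.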

Of course, none of this is new. People have been using VMs for years for all sorts of purposes. But if you've avoided VMs because you don't see how they apply to your work, or you think your work is too small for them, you should reconsider. Even a non-developer can benefit from virtual machines. Sometimes you want the services of a piece of software that doesn't work in your current OS; you can fire up a VM of the exact OS that software supports and run it there. Some VMs can be fairly lightweight, taking up only a small amount of RAM, disk space, and CPU.

Another developer use for VMs is a completely clean build environment. If you want to make sure your package builds, runs, and passes its tests on many Linux distributions, you can automate creating a new virtual machine, installing the OS, importing your code, compiling, testing, running your application, exporting the results, and then removing the virtual machine. Used as part of an extended systems test, this tells you whether your application relies on some odd configuration that might cause trouble for your users, and it helps you refine your build process and configurations for multiple distributions or operating systems.

In the current development climate, processors have nearly reached their speed limits. The next big push is said to be parallel processing across multiple cores; beyond that is distributed computing. When needs grow beyond the processors of a single machine, we'll look to the resources of multiple processes on multiple machines. This work is already being done at Yahoo!, Google, Microsoft, Amazon, eBay, Facebook, and the other large tech corporations, but I suspect it will become fairly common to utilize the resources of many devices for a single application. Virtual machines give you a cheap way to explore the exciting realm of distributed applications from the comfort of your single (hopefully multi-core, multi-processor, or both) machine.

I work on a product that manages test cases, test plans, and results of software testing. QA engineers create test cases in our system and submit results, usually through some sort of automation, and we keep track of those test results and generate useful reports.

When I first started this job, the product already existed and we were mostly in maintenance mode, adding features and fixing bugs. Ironically, while we provide a product to help out QA, our team has no QA team of its own. There weren't even any unit tests on the main product, and all testing was done sporadically and manually.

This has been improving with a recent push to put all products under Continuous Integration with full unit, smoke, and regression tests. But writing tests and executing them takes time. Currently, many of the developers are working part-time as QA before we launch a new release. Unfortunately, many of them forget to use their development tools for the QA process. Namely, automation.

I can manually run through a test case in maybe 5 to 10 minutes, which isn't too bad. But with 70-80 test cases to run through before a release, that adds up to about 2 days of work. If we release on a monthly cycle, that's 2 days out of every 30 lost to manual testing. And every release adds more tests, which have to be performed along with all the tests from previous releases.

Since we write a web app, a lot of our testing is done by clicking on things in the web UI. Front-end testing can be painful, but with the help of Selenium we can automate the process. Creating the Selenium script takes a while, but once it's functional, later runs are fairly quick and can happen on an automation server. So while each test takes longer the first time, because you combine the manual testing process with script creation, the end result is a slowly growing suite of tests that runs automatically whenever you wish.

To tie our scripts together, we use TestNG inside a Maven package. Selenium provides Java packages that let us launch Selenium straight from our TestNG tests. While TestNG is designed for running unit tests, it works quite well for our regression tests. I was even able to build a TestNG listener that automatically inserts the result of each test into our product, so our product keeps track of the results of its own tests.
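The listener idea can be sketched in a few lines of self-contained Java. This is a simplified analogue, not the real TestNG `ITestListener` interface (which has a different shape), and the "submission" here just collects results into a map instead of posting them to a product:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for a TestNG-style listener interface.
interface ResultListener {
    void onTestFinished(String testName, boolean passed);
}

// Tiny runner: executes a test, then notifies every listener.
class TinyRunner {
    private final List<ResultListener> listeners = new ArrayList<>();
    void addListener(ResultListener l) { listeners.add(l); }

    void run(String name, Runnable test) {
        boolean passed = true;
        try {
            test.run();
        } catch (AssertionError | RuntimeException e) {
            passed = false;
        }
        for (ResultListener l : listeners) l.onTestFinished(name, passed);
    }
}

public class ListenerDemo {
    public static void main(String[] args) {
        Map<String, Boolean> submitted = new LinkedHashMap<>();
        TinyRunner runner = new TinyRunner();
        // Stand-in for the listener that would post results to the
        // test-management product over its API.
        runner.addListener((name, passed) -> submitted.put(name, passed));

        runner.run("login works", () -> { /* passes */ });
        runner.run("report renders", () -> { throw new AssertionError(); });

        System.out.println(submitted);  // results captured automatically
    }
}
```

The useful property is that the tests themselves never mention result reporting; bolting the listener onto the runner is what lets the product track its own test results for free.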

By spending the time to create these automated tests and set them up in an easy-to-execute automation package (we run the Maven tests from Hudson), we've greatly extended the test coverage of our product. This helps us make sure that when we add new features or fix bugs in one place, we're less likely to break something in another. It also saves us time each month by allowing us to spend that time creating automations for new tests rather than manually running all of the old ones along with the new.

Later we plan to deploy the product automatically on each checkin via Hudson. Once that's set up, we can run our automated front-end tests continuously against every checkin, finding problems faster.

I have an idea for continuous integration that, as far as I know, goes a bit beyond Hudson's current capabilities.

We work as a team on a project. The project is stored in SVN (like all good projects should be) and is continuously checked out by Hudson to be built on a CI server. However, Hudson is set up to check out only the latest version of trunk, polling for commits every 5 minutes. The problem is that most developers do their work on development branches, which take no part in the CI cycle. Those changes gain none of the benefits of continuous integration until they are merged into trunk, which wastes a lot of developer potential.

In our development system, people create branches for each bug or feature set that they are working on. If a bug is trivially small or easy to see and fix, it might not warrant its own branch and will just be fixed on trunk. But for larger changes, a branch is cut so that one person's development won't adversely affect other developers. It also lets us postpone features past our monthly release cycle: if a feature is only halfway complete by the time our release branch is cut and code freeze begins, you simply don't merge it into trunk, and you won't dirty up the release branch with half-finished code. Once you finish your change, you merge into trunk and resolve any conflicts that have come up. It's not a perfect process, but I don't know of one that is (git excluded).

Because of these separate branches, developers can't harness the utility of Hudson and our CI environment. Hudson is set up with a specific SVN location in mind when you create a project, usually trunk. If you want to build your branch, you have to set up a new project and repeat all of the configuration of your trunk project with the new location. There's no real way to just duplicate a project and change one setting, and even if there were, the manual steps involved are enough that most developers won't bother. There should be a better way.

I think Hudson should support a different style of project: one that defines a standard build, as we did with trunk, but also monitors all of the folders under branches and treats each one as its own project. Whenever a change is made to one of those branches, a build is performed; if a new branch is created, a new project is dynamically created for it. Each build would be deployed according to the staging instructions of the original project, but to a dynamically generated VM (or one from a pool of VMs) with modifiable settings for things like ports and server names. This would let us do continuous deployment on each development branch, so that when a QA tester finds a problem with a feature in a branch, the developer can quickly provide a fix, have it automatically go through the full set of continuous integration tests, and have it deployed to the staging environment for the tester to try again.
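The core of the idea is a sync pass that scans the branches folder and creates a job (cloned from the trunk template) for any branch that doesn't have one yet. The sketch below simulates it with local directories and in-memory job names; the layout and the `myapp-` naming are invented, and a real implementation would query SVN and the CI server instead:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

public class BranchJobSync {
    // Returns the job names that had to be created on this pass.
    static Set<String> sync(Path branchesDir, Set<String> existingJobs)
            throws IOException {
        Set<String> created = new TreeSet<>();
        try (DirectoryStream<Path> dirs =
                 Files.newDirectoryStream(branchesDir)) {
            for (Path branch : dirs) {
                if (!Files.isDirectory(branch)) continue;
                String job = "myapp-" + branch.getFileName();
                // add() is false if the job already exists, so reruns
                // are idempotent: only new branches get new projects.
                if (existingJobs.add(job)) created.add(job);
            }
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for svn ls ^/branches.
        Path branches = Files.createTempDirectory("branches");
        Files.createDirectory(branches.resolve("feature-reports"));
        Files.createDirectory(branches.resolve("bugfix-1234"));

        Set<String> jobs = new TreeSet<>();
        jobs.add("myapp-trunk");  // the existing trunk project

        System.out.println(sync(branches, jobs));
        // A second pass creates nothing: the jobs already exist.
        System.out.println(sync(branches, jobs));
    }
}
```

Run on the CI server's polling schedule, a pass like this is what turns "set up a project per branch by hand" into something nobody can forget to do.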

This fast turnaround between bug reporting, bug fixing, and fix deployment could greatly enhance productivity and let us try out different solutions faster. With a fast feedback loop, we can work directly with the QA department or our customers to solve the issue at hand and let them try a fix right away, without affecting anyone else's work in progress.

In essence, dynamic continuous integration would broaden our continuous integration servers to cover all development, even on separate branches, without the need to manually create (or forget to create) projects for each one. If we built this functionality into Hudson, along with some helper tools for managing large pools of servers as fast staging environments, we could forge stronger bonds between developers and testers, and between developers and their customers, through a shorter feedback loop and faster turnaround on problems.

Subscribe

Aspirations of a Software Developer syndicates its weblog posts and comments using RSS (Really Simple Syndication). You can use a service like Bloglines to get notified when there are new posts to this weblog.