
Category Archives: Software Development

For a recent project I was working on, I was required to set permissions on a remote Windows share. All roads seemed to point to JCIFS as the library to do this. Unfortunately, JCIFS did not have support for the operations I required, so I went about seeing what it would take to add them. This is the story of my JCIFS journey.

JCIFS is not what you might expect from a typical modern open source project. What source control system do they use? Git? SVN? Surely not CVS? I was surprised to find that the answer was none. No source control system manages the official JCIFS releases, which stems from the fact that the codebase has a single developer/maintainer. The next thing I looked for was a bug tracking system. Same story: JCIFS has no bug tracker either. The one thing JCIFS did have going for it was an active mailing list. Michael B. Allen, the developer/maintainer of the project, was very helpful in answering my questions to get me going.

What I Needed

What I was looking for was the ability to set access control on file shares of a Windows server. I found a promising patch on the JCIFS mailing list (http://comments.gmane.org/gmane.network.samba.java/9045) that I thought was my answer. It turned out not to be exactly what I needed: that patch can be used to set file permissions (the ones returned from JCIFS SmbFile.getSecurity()), whereas I needed to set permissions on the share itself (returned from SmbFile.getShareSecurity()). The patch was a starting point, but it would need some work.

If you have done any coding in Java that requires interoperability with Windows systems, you have probably come across JCIFS. JCIFS is an “Open Source client library that implements the CIFS/SMB networking protocol in 100% Java.” Many other Java projects, such as J-Interop, use JCIFS internally. The reason is that JCIFS has implemented a Java version of Microsoft’s flavor of DCE/RPC. Leveraging this protocol, you can invoke pretty much any remote procedure call Microsoft has implemented. A great resource on what Microsoft has in this area is the MSDN documentation on Microsoft Communication Protocols (MCPP).

From the SRVS (Server Service) interface, I needed to implement the NetrShareSetInfo call to set the permissions I was looking for. After working through this, I realized I also needed a way to look up a user SID by name, so I implemented the SAMR call SamrLookupNamesInDomain as well.

Implementing My Changes

Implementing changes to the DCE/RPC calls in JCIFS was not trivial to figure out. The DCE/RPC stubs (srvsvc.java and samr.java) are generated from srvsvc.idl and samr.idl. I figured it was CORBA at first, but quickly realized this was not regular IDL. It was not even the Microsoft IDL described in the MSDN documentation; it had been massaged into a format JCIFS could work with. I spent a long time trying to find out how this IDL was compiled until I got a reply on the mailing list pointing to a blog post by Christofer Dutz. He pointed out a tool I had missed called midlc, which is part of JCIFS. Unfortunately, it is not referenced on the main JCIFS website at all, other than appearing in the download listing. Following his instructions, I was able to get midlc compiled and running.

I’ve been doing a lot of work with Git lately and have done a lot of thinking about version control systems. I think our analogy of a ‘tree’ to represent the life-cycle of software versions is no longer relevant. Today, trees and branches do not adequately represent what version control systems are supposed to do.

Branching is Easy

All version control systems can branch fairly well. Creating a branch does not give you much on its own; a branch is just a copy, so of course it works well.

What good is branching if you cannot merge?

Merging is Hard

The thing I love the most about Git is that it gets merging right. Other version control systems I’ve used can do merging but it always feels like a pain to do so.

Image Source: http://very-bored.com/pics/weirdtrees/weird-trees-8.jpg

The tree metaphor does not really fit with the concept of merging. So why do we still use it? Most of the time I see people drawing Git graphs in lanes.

A New Metaphor

Source control is more like lanes on a highway: commits (cars) are free to move from lane to lane over time, and branching and merging carry equal weight.

I often see posts on Hacker News or proggit exclaiming that Subversion is dead and should not be used by anyone anymore. I do not agree with this, but not for the reasons you might think.

Before I get to the reasons why Subversion will still be around for many years to come, let me rant about the anti-subversion movement…

Anti-Subversion Rant

I have a hard time when people lump CVS and Subversion into the same group. Subversion is vastly superior to CVS in almost every way. After working with CVS for a few years, Subversion was a breath of fresh air. People seem to forget that these days. True, distributed version control systems may be just as big a leap forward over Subversion, but that does not discount what Subversion did right.

When using Subversion with a small team, the merging and branching issues are not usually a problem, and for most commercial end-user software, a centralized model for source control is beneficial. I also love how the Subversion revision number works. At my company we use it to identify the build, which makes it easy to tell exactly what version a customer is running. Git’s 40-character commit IDs simply cannot be used in this way.

Subversion does have some things going for it, especially when you intend to impose a centralized model anyway. Distinct product versions (1.0, 1.1, 2.0, etc.) lend themselves to this model; if you think about it, this is how commercial software always used to be built. Web-based products and continuous deployment changed that, which I think is why we have changed the way we do version control.

I still think that DVCS is the way to go and will be the future of version control. I am currently looking into how to easily transition from Subversion to Git for our core product source control (using git-svn). I think the benefits outweigh the few nice things we will lose. That is a topic for another post, though.

Why Subversion will never Die

Now for the real reason Subversion will not die in the foreseeable future, particularly in the enterprise world: licensing. Subversion is licensed under the Apache License, which is very commercially friendly, as I’ve written here previously.

Alternatively consider the licensing of the various distributed version control systems:

Git – GPLv2

Mercurial – GPLv2+

Bazaar – GPLv2

Darcs – GPL

Notice a trend?

I totally understand why this is the case. The communities around these tools want to protect them from commercial companies that would fork the products, enhance them, and not provide source for the changes. In a typical client-server model, it might be acceptable to license the core server under the GPL and a client-access library under the LGPL, which would allow commercial offerings to freely build clients for these tools.

That simply cannot work here. Because of the nature of distributed version control, each user has their own repository, which means all of the magic that goes into providing a version control system lives as part of the client the user runs. Licensing that client under a commercially friendly license would effectively extend that license to the whole product. You cannot separate the ‘client’ from the ‘server’ in a distributed version control system; they are one and the same.

Perhaps it is easier to consider a case of this in action.

Embedded Version Control Support

Consider the popular text editor for Mac OS, TextMate. First, a disclaimer: I am in no way involved with TextMate and am just using it as a plausible example of the problem.

TextMate provides source control access to Subversion within the editor. On the Java side, the SVNKit library (from TMate Software) serves a similar purpose; I’ve actually used it in a product to provide Subversion access from Java.

If you look at TextMate’s feature list, you will notice that it does not offer bundled support for Git, Mercurial, or any of the other GPL-licensed DVCSs. Extensions exist for these, but support for them cannot be bundled the way Subversion support can.

The products I work on are in a similar state: licensing prevents us from providing Git or Mercurial support in our product. It may be possible to provide this as a plugin or extension that only drives the command line, but though technically possible, many of our enterprise customers would have trouble accepting that, especially when they have to download and install the GPL-licensed version control software themselves.

Conclusions

As long as the distributed version control systems stick with the GPL, they will be in exile from many enterprise environments. Perhaps some day we will have an alternative distributed version control system under the Apache License. Until then, Subversion will continue to exist in the enterprise, and especially now with the community at Apache, it will continue to grow and evolve for many years to come.

Humans have a hard time understanding the concept of ‘random’. A great example I love to use is to ask someone to quickly pick the first ‘random’ number they think of between 1 and 100 (you can do this right now). If the choice were truly random, a pick of 2 would be just as likely as 97. In reality, humans are really bad random number generators. This becomes even more evident if you ask someone to pick 2 or 3 numbers: most people will not pick numbers close together, but will instead pick a few nicely spaced ones.

When you tell someone to pick a random number, their brain automatically tries to produce an evenly spread set of numbers. Computers are also bad at picking random numbers, but for completely different reasons.

When it comes to software development, you may need to build a feature with random elements to it. The classic example is ‘shuffle’ in a music player such as iTunes. If, every time it needed a new song, it picked a truly random one, you might hear the same song a few times in a row or songs from the same album back to back. The typical user reaction is ‘this shuffle is not very random’. We know this to be absurd: the song selection is perfectly random, and it is the human who misunderstands what random means. What the user actually wants is a more evenly distributed selection of songs, not a truly random pick each time.

Many music players solve this problem by randomizing the order of the whole playlist instead of picking a random song each time. This produces a playlist where each song is played exactly once. To a human this feels more ‘random’, when it is actually just more even. There are other tricks too, like ensuring that “like” songs do not occur back to back, such as keeping the same artist from playing twice in a row. There are lots of ways to give a better user experience by making the “random” feature less random.
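The shuffle-the-playlist approach described above can be sketched in a few lines of Java (the song titles are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShuffleDemo {
    /** Returns a play order in which every song occurs exactly once. */
    static List<String> shuffledPlaylist(List<String> songs) {
        List<String> order = new ArrayList<>(songs);
        Collections.shuffle(order); // uniform random permutation (Fisher-Yates)
        return order;
    }

    public static void main(String[] args) {
        List<String> playlist = List.of("Song A", "Song B", "Song C", "Song D");
        // Every song appears once; a naive random.nextInt(songs.size()) pick
        // each time could repeat a song or an album back to back.
        System.out.println(shuffledPlaylist(playlist));
    }
}
```

The key property is that shuffling guarantees no repeats within one pass through the playlist, which is the ‘more even’ behaviour users actually want.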

Never take what a customer says at face value; you often need to dig for what they really want. Though the customer claims they want ‘random’, they probably do not. Software development is all about figuring out what the customer is really looking for when they ask for something completely different.

Performance Tuning is one of those black arts in programming. It takes skill to do it properly. Often people end up attempting to optimize the wrong things for performance. As the great computer science wizard, Donald Knuth put it: “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil”.

I think of it in these terms: readability comes first and foremost, because readability leads to maintainability. If you have a performance issue, then worry about tuning performance. I am by no means saying you should completely ignore performance and brute-force everything; you need to be aware of performance and do things in a sensible way. You should simply not go out of your way to make something faster at the cost of readability.

Occasionally you will be tasked with the job of performance tuning. On the last three major projects I have worked on, each of them required performance tuning at some point. For each of these there were some basic tools I used to go about looking for areas to optimize.

The Hunt

The first thing you must do when looking to boost performance is to go on the hunt. It is important to know what is slow before you can make it fast. You will often be surprised: the thing you think is slow may not be, and something that seemed trivial may be causing most of the performance issues.

Before going on the hunt, you first must have the proper tools. Here are some essential tools for tracking down performance issues:

Profiler – A code profiler shows how long your application spends on various tasks. At my company we use Eclipse for our Java development, which includes a profiler as part of its testing and performance toolkit; there are also plenty of commercial profilers out there that are likely much better. Pay attention to the ‘hot spots’ in your code that execute more often than the rest: even if each iteration does not seem to take long, a small boost there can add up to a lot.

Poor Man’s Profiler – Sometimes you do not have a profiler, or you only want to examine a small section of code. In these cases, a few System.currentTimeMillis() calls will get you some timings. The project I most recently optimized already made extensive use of JAMon, the Java Application Monitor (http://jamonapi.sourceforge.net/), which has the same effect as currentTimeMillis() but offers a more refined API. It also helps in seeing how fast certain calls are.
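The poor man’s profiler amounts to bracketing the suspect section with timestamps. A minimal sketch (the loop stands in for whatever code path you are investigating):

```java
public class PoorMansProfiler {
    /** Runs a block of work and returns the elapsed wall-clock milliseconds. */
    static long timeMillis(Runnable work) {
        long start = System.currentTimeMillis();
        work.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            // stand-in for the suspect code path
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
        });
        System.out.println("took " + elapsed + " ms");
    }
}
```

For very short sections, System.nanoTime() is the better choice, since currentTimeMillis() can have coarse resolution on some platforms.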

Performance Test Suites – I like to write unit tests for the specific functionality I am trying to optimize. This makes it easy to profile and measure one part of the code in isolation, without starting the whole application.

Process Viewer – Task Manager on Windows and top on Unix are invaluable tools as well. They let you watch CPU usage while running performance tests. A sure sign of a synchronization bottleneck in a multi-threaded application is a single CPU maxed out while the rest sit idle. Always develop multi-threaded applications on a multi-core machine so you can spot these issues.

Approaching your Target

After you have found a performance issue, it is time to attack it. You know where the issue lies, but not yet what causes it. There are a few things to look for.

Synchronization – A big performance issue I alluded to earlier is synchronization. In multi-threaded development you sometimes need to work with shared objects, and the easy way to do that in Java is the synchronized keyword. Be careful about the scope in which it is used and keep it as narrow as possible. CPU usage is a good indicator of this problem. If this is your issue, look at a modern concurrency library such as java.util.concurrent: ConcurrentHashMap solves many of the issues around synchronized maps and is much better than Collections.synchronizedMap(). Many synchronization issues are difficult to track down because a debugger cannot show them to you.
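As a sketch of what narrowing the synchronization scope looks like, here is a shared counter built on java.util.concurrent instead of a global synchronized block (the counting workload is illustrative, not from any particular product):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CounterDemo {
    static final ConcurrentMap<String, AtomicInteger> counts = new ConcurrentHashMap<>();

    /** Per-key increment with no coarse-grained synchronized block. */
    static void increment(String key) {
        counts.computeIfAbsent(key, k -> new AtomicInteger()).incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) increment("hits");
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counts.get("hits").get()); // 4000
    }
}
```

Contention here is limited to the single map entry being updated, so four threads can increment different keys without blocking one another at all.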

Serialization – Serialization is another big performance hit. Anywhere you convert data objects to XML, JSON, or a binary form, on disk or in memory, you pay a cost. These operations are notoriously slow, though often necessary. Make sure they are not done more often than they need to be; a cache in front of deserialization can greatly improve performance here.

Nickels and Dimes – Often there is no single issue causing all of the problems; more likely, a few small things add up. If you shave 1 millisecond off a call that runs 100,000 times, you have saved 100 seconds of processing time, which can be far better than shaving 50 milliseconds off a call that runs only once. This is where your profiler and performance tests help you know where the problem is.

Databases and Performance – If you are using a database and notice performance issues, check a few things. Push work into database queries where you can: most of the time the database can manipulate data faster than your code can. Make sure you have proper indexes on your tables so the queries run fast. Occasionally the reverse holds, and something is faster done manually in code. Either way, run performance tests before and after to compare any change.

The Cleanup

After you finish your performance tuning, it is critically important to re-run your performance tests. You need to prove that the improvements you made had a positive effect on performance; if they did not, they were not needed and are more likely to introduce bugs than anything else. If the performance did not improve, throw out the change and return to the hunt.

On that note, it is important to hunt for only one issue at a time. If you make two changes at once, you cannot tell which one gave the performance gain. Each change must be made in isolation so you can be sure it is required.

The Maritime Dev Con was a huge success, with about 95 people attending. I had a great time at the event and met a bunch of really cool people.

The presentations I gave were well attended and went well. I’m putting the slides up here in case you want to review them.

I’m also including the sample code from the presentations. The samples were the ‘modern hello world’ example in Java and Groovy: they used the Twitter API to search for MaritimeDevCon and find my ‘Hello World’ tweet.

There is a Maritime developers conference coming up on June 18th in Moncton. It will be a great opportunity for developers from Moncton and other areas of the Maritimes to get together and learn about languages and technologies they might not have been exposed to. All of the presentations are limited to 45 minutes and will mostly be introductions to a language or technology.

Modern Java Development – In this presentation I’m going to give an introduction to Java for non-Java developers, covering the basics of where and how to get started.

Groovy Primer – This is essentially going to be a rehash of the Groovy talk I gave at the Maritime Java User’s Group a month or so ago. This will focus on showing what Groovy has to offer (particularly to Java developers) and how to get started with Groovy.

Problem. Bugs happen. The common solution to this problem is to fix the bug and release a patch. Version 1.0 has bugs, version 1.0.1 fixes those bugs.

Inevitably you will need to put together a list of all of the changes in a release. For me, this needs to go into a format we can post on our wiki, and the process is tedious when done manually. There are a few ways to approach it. You can go to the bug tracking repository and look for the bugs fixed in the release; this tells you everything that should have changed. I say ‘should have’ because you cannot know for sure that the information is 100% accurate.

The other option is to go to the version control repository for information on what has changed. This is the authoritative source, but it often contains more information than you would want in a change-log.

In my previous post on version control I mentioned that we have best practices around the format of commit messages: every bug fix starts with the words “Bug <number>:” followed by a description. That makes the Subversion log a good source to generate a change-log from, using a few options of the svn log command.

The ‘--xml’ option formats the output as XML, which allows Groovy to break it down easily.

The ‘-g’ option includes log messages from other revisions that were merged onto this branch. Say you merge 50 bug fixes onto the bug-fix branch all at once: that creates a single revision on the branch, but with this option the log includes all 50 messages from their original commits on the trunk. That is detail we want in the change-log. It does produce nested entries, though, so the code has to handle that case.

The ‘-r’ option specifies the revision range to use. For a branch, we want everything from the previous release’s revision number to the current one (HEAD). For this example, let’s assume the 1.0 branch was at revision 1528.

The command to run then becomes:

svn log -r HEAD:1528 -g --xml

The next step is to turn this XML into a change-log. I plan to post it as a comment on a wiki, so each line is prefixed with ‘*’ to render as a bulleted list in Trac, with the revision number in brackets at the end of the line.

To generate this change-log, I wrote a Groovy script. It runs the svn command and uses Groovy’s XML parsing to break the output down and format it. The path to the working directory and the revision number change from release to release, but the rest of the code is reusable.
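I can’t reproduce the original script here, but the parsing-and-formatting step can be sketched as follows. This Java version (the Groovy one used its XML parsing to the same effect) assumes the `svn log -g --xml` output has already been captured as a string; the sample message and revision are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ChangelogFormatter {
    /** Turns `svn log --xml` output into wiki bullet lines like "* message [1530]". */
    static String format(String svnLogXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(svnLogXml.getBytes(StandardCharsets.UTF_8)));
        // getElementsByTagName searches the whole tree, so the nested
        // logentry elements produced by -g are picked up as well.
        NodeList entries = doc.getElementsByTagName("logentry");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            String revision = entry.getAttribute("revision");
            String msg = entry.getElementsByTagName("msg").item(0).getTextContent().trim();
            out.append("* ").append(msg).append(" [").append(revision).append("]\n");
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<log><logentry revision=\"1530\">"
                + "<msg>Bug 1234: Fixed this bug</msg></logentry></log>";
        System.out.print(format(xml));
    }
}
```

In practice you would feed this the output of the svn command from above rather than an inline string.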

Two of the most useful tools to a developer outside of their development environment are version control and bug tracking systems. Version control allows tracking of changes to the product and allows for branching and merging. Bug tracking systems allow for tracking issues with the product whether they be bugs or enhancements.

Even though these tools are often separate products, they share a major commonality: the code you are working with. Often you want to see, for any given bug number, what code was changed for that bug. Likewise, for a change in version control, you want to see whether it was associated with a particular issue in the bug tracking software.

At the company I work for we use Subversion for version control and Bugzilla for bug tracking. We have some best practices around these tools to make things easier.

Version Control and Bug Tracking Best Practices

When resolving issues in the bug tracking database, our team always records the number of the build that contains the fix, so anyone looking at the bug can tell whether the build they have contains it. Any time we fix a bug, we add a comment that looks like this:

Build Fixed: 1.0.1.12354

The last number is the revision number in Subversion.

When we commit code changes to Subversion, we also include the bug number for the bug being fixed. Our commit messages always appear in this format:

Bug 1234: Fixed this bug

Subversion Tooling

Recently I came across a neat feature in Subversion that links it to a bug tracking system: clicking a bug number in the Subversion history view takes you directly to that bug in the bug tracking software.

Enabling this feature is fairly simple and involves setting two properties in the Subversion repository. The properties go on the root folder you check your project out from; they then apply to everything in that tree, but you must check out from that root for them to work. These are the two properties that need to be set.

bugtraq:logregex – This defines a regular expression to ‘match’ bug numbers in subversion comments. For the pattern I listed above, we are using: [Bb][Uu][Gg] (\d+)

bugtraq:url – This defines a URL to go to when the user clicks on a bug number. The browser is launched when the number is clicked on and takes you to this URL replacing the BUGID parameter. For our bugzilla repository we are using: https://some.server.somewhere.localhost/show_bug.cgi?id=%BUGID%
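The same regular expression can be exercised outside Subversion. This Java sketch shows how a commit message in our format maps to a bug URL; the server name is the placeholder from above, not a real host:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BugLinkDemo {
    // The bugtraq:logregex pattern from above; group 1 captures the bug number.
    static final Pattern BUG_PATTERN = Pattern.compile("[Bb][Uu][Gg] (\\d+)");

    /** Returns the bug-tracker URL for a commit message, or null if no bug id is found. */
    static String bugUrl(String commitMessage) {
        Matcher m = BUG_PATTERN.matcher(commitMessage);
        if (!m.find()) return null;
        // %BUGID% is substituted with the captured number, as in bugtraq:url
        return "https://some.server.somewhere.localhost/show_bug.cgi?id=%BUGID%"
                .replace("%BUGID%", m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(bugUrl("Bug 1234: Fixed this bug"));
        // → https://some.server.somewhere.localhost/show_bug.cgi?id=1234
    }
}
```

This is effectively what the Subversion client does for you when it renders the history view: match the regex against each comment and turn the captured number into a link.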

The following steps walk through setting this up using TortoiseSVN:

On the root folder of your subversion working copy, right click on the folder and click TortoiseSVN -> Properties.

I have been reading a lot lately about people hating on SOAP-based web services. The Web as a whole is moving toward REST-based APIs. This post makes a case for WSDL and SOAP-based web services.

Don’t get me wrong. I’m a huge fan of RESTful web services, and I use them in many places where they make sense in the software I develop. I am not writing this post to say that SOAP/WSDL-based web services are better than the REST style, but I do intend to point out some things a WSDL does better.

It all comes down to perspective. Who is going to be consuming the Web Service? Is the consumer going to be a human or a machine? Let’s look at each case.

Humans Consuming Web Services. By humans consuming web services, I mean a programmer sits down and writes code to use a web service. The common example is a website developer using the Flickr, Twitter, Google Maps, or Facebook API to integrate with their site. Even in the business world, when someone needs to write a piece of code to connect two things together, REST is the clear winner: REST-style web services are easier to work with and usually result in much cleaner code.

Machines Consuming Web Services. By machines consuming web services, I’m referring to software that interprets and leverages the web services. The primary examples are Business Process Management and Runbook Automation software, which happens to be the area in which I develop. In this space the goal is for the machine to interpret the web services (or other technologies) so the user can simply map from service to service. The user needs to know nothing about the transport or how the services themselves function; the machine is responsible for all of that. This type of software is typically used in the business world, not on the Web.

For this type of software, a strictly defined specification (a WSDL) is a valuable tool for the software that must interpret it. A REST web service may be easier for a human who can read the documentation, but we have yet to make a computer read documentation and produce results. Also, many REST-style web services use XML formats that cannot be expressed in a schema. That may be fine for a human, but an XML Schema gives a computer an easy way to consume and understand the XML format. Yes, XML Schemas are wordy, difficult for a human to read, and a pain to write properly. Even with those limitations, they do a good job of defining XML in a way that a machine can interpret.

I see a lot of newer programmers who have only ever worked on the Web claim that REST is the best choice in all cases, that we have graduated beyond SOAP to the superior technology. Even Joel Spolsky mentioned it on episode 64 of the StackOverflow podcast. But REST as the universally superior technology is simply not true. REST may be the best choice for the Web, but there are many uses for web services besides the Web.

REST is getting closer to what WSDL has to offer: with WSDL 2.0 or WADL you can describe REST-style web services. Maybe in a few years things will be different, and we will reach the point where REST really is better than traditional web services in every case. But we are not there yet.