21 October 2008

With Obba, you can easily build Excel GUIs to Java code. Its main features are:

* Loading of arbitrary jar or class files at runtime through an Excel worksheet function.
* Instantiation of Java objects, storing the object reference under a given object label.
* Invocation of methods on objects referenced by their object handle, storing the handle to the result under a given object label.
* Asynchronous method invocation and tools for synchronization, turning your spreadsheet into a multi-threaded calculation tool.
* Serialization and de-serialization (save Serializable objects to a file, restore them any time later).
* All this through spreadsheet functions, without any additional line of code (no VBA needed, no additional Java code needed).

02 October 2008

Controlling Subversion from Java via an API like SVNKit could be quite useful. I'd never really thought about using it from a typical application, but I have written applications that keep versioned files, so why reinvent the wheel? Just use Subversion via SVNKit. An example might be managing changes to configuration files at runtime.

SVNKit is a pure Java toolkit - it implements all Subversion features and provides APIs to work with Subversion working copies, access and manipulate Subversion repositories - everything within your Java application.

SVNKit is written in Java and does not require any additional binaries or native applications. It is portable and there is no need for OS specific code. SVNKit is compatible with the latest version of Subversion.

So you want to make your application super scalable... don't just stick a database in the way.

Databases touch disks, and disks inherently involve sequential, serial processing. And there is no way you can change your database vendor’s code. Systems that involve all but the simplest, most infrequent database use would be facing a massive bottleneck thanks to this serialization. Databases are just one common example of process serialization, but there are others as well. Serialization is the real enemy here, as it undoes any of the throughput gains parallelism has to offer. It allows only a single core in a multi-core system to work at a time, limiting usable computing power, and this is made even worse in a cluster or grid. Using a bigger and beefier database server reduces the pain somewhat, but it does not overcome the underlying problem: as you add more servers to your grid, you still have process serialization going on in your database server.
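This is just Amdahl's law in action: if some fraction of each request is serialized (behind the database, say), throughput stops scaling long before you run out of servers. A quick sketch; the 10% serial fraction is an assumption for illustration, not a measured number:

```java
// Amdahl's law: speedup on n processors when a fraction s of the work is serial.
public class Amdahl {
    public static double speedup(double serialFraction, int n) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / n);
    }

    public static void main(String[] args) {
        double s = 0.10; // assume 10% of each request is serialized in the database
        for (int n : new int[] {1, 2, 8, 32, 1024}) {
            System.out.printf("%4d cores -> %.2fx speedup%n", n, speedup(s, n));
        }
        // With s = 0.10 the speedup can never exceed 1/0.10 = 10x,
        // no matter how many servers you add to the grid.
    }
}
```

Even at 1024 cores the speedup stays below 10x, which is why shrinking the serialized fraction beats buying a beefier database box.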

Distributed caches are a solution to this problem. They are simple to implement where data is immutable (read-only); things become a bit more complex when the data being distributed is mutable. Ideally the cache dynamically distributes the data across nodes based on runtime usage.
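One common way a cache distributes data across nodes is consistent hashing: each node owns a slice of a hash ring, so adding or removing a node only remaps a small fraction of the keys. A minimal sketch; the node names, virtual-node count, and hash mixing are illustrative assumptions, not taken from any particular cache product:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring: each node is placed at several "virtual" points
// on a ring of hash values; a key belongs to the first node at or after its hash.
public class ConsistentHash {
    private final SortedMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHash(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    private int hash(String s) {
        // spread the (weak) String hash a little; a real cache would use MD5/murmur
        int h = s.hashCode();
        h ^= (h >>> 16);
        return h & 0x7fffffff;
    }

    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    public String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no nodes");
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }
}
```

Note this only solves placement; keeping mutable data coherent across nodes is the genuinely hard part the paragraph above alludes to.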

I often avoid using System.currentTimeMillis() directly in code; instead I inject an interface into objects that need to know the current time. This interface, let's call it Clock, provides a single method, getCurrentTime(). Reading Alex Miller - Controlling time, I see I'm not alone in this.

This is awesome for unit testing, as you can create your own Clock, fix the time, control the rate at which it advances, test rolling over daylight savings time changes, time zones, etc. All of your code relies only on Clock, which is a standard facade.

... it's also very useful in event-processing systems: you can factor out time-based behaviour, so that the same code can be used for processing both historical and real-time events.
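A minimal sketch of that facade. The names Clock and getCurrentTime() follow the post; SystemClock and FixedClock are my own illustrative implementations:

```java
// The facade: production code asks a Clock for the time instead of
// calling System.currentTimeMillis() directly.
public interface Clock {
    long getCurrentTime();
}

// Production implementation: delegates to the system clock.
class SystemClock implements Clock {
    public long getCurrentTime() {
        return System.currentTimeMillis();
    }
}

// Test implementation: time is fixed and advances only when told to, which
// makes time-dependent logic (DST rollovers, timeouts, schedules) deterministic.
class FixedClock implements Clock {
    private long now;

    public FixedClock(long startMillis) {
        this.now = startMillis;
    }

    public long getCurrentTime() {
        return now;
    }

    public void advance(long millis) {
        now += millis;
    }
}
```

An event processor then takes a Clock in its constructor: in production it gets a SystemClock, while tests or historical replays drive a FixedClock forward from the timestamps in the event stream.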

What I have seen is the value of visible public feedback during development. One way to do that is with iterations that provide working features on a regular basis. Another way is to do periodic milestone demos or usability tests or performance testing or whatever makes sense for your project. The most important demo is the first one as you’re probably building something completely wrong and when the user or product manager sees it, they’re going to freak out. That’s ok. In fact, it’s great because you will get plenty of feedback to build the next version.

I think you’ll find that many of our best practices happen to correlate to cycle feedback. Releases get feedback through acceptance testing and actual users. Iterations (or milestones) get feedback by showing work earlier to users and re-planning. Daily builds give feedback by running a suite of tests and reporting on quality. Daily standup meetings give feedback on who’s doing what. Commit sets give feedback by exposing your changes to others and possibly by running commit-time hooks for test suites. Code/test/refactor cycles give you feedback by green bar unit test runs. Post-mortems (or the trendier term “retrospectives”) can give the team feedback about itself at whatever frequency you want.

I’ve come to believe that most process improvements can be tied to a feedback cycle.

And on project management:

From a project management point of view, one of the most important things you need to figure out is how you will track the project while you’re in the middle of it.

The most commonly used tool when planning and tracking is the dreaded Gantt chart. I’ve loathed the Gantt chart for a long time but I’ve finally figured out that I hate it because my early exposure to it was using it as a tracking tool. And for tracking the progress of software development, there are few tools worse suited than the Gantt chart. The Gantt displays all tasks as independent and of fixed length. But tasks in the development world tend to be far more malleable, interconnected, and incremental than it’s possible to represent in a Gantt.

Faced with this mismatch, you have two choices: 1) ignore what’s actually happening in the project and approximate it in your rigid schedule or 2) modify the Gantt on a daily basis to reflect the actual state of the project down to the minute. The first will cause developers to ignore the schedule because the schedule seems completely disconnected from reality. The second is madness because the amount of daily churn means the project manager will do nothing else and he will be pestering developers constantly as he does it. These both suck.

For every feature, you should:

* have a list of tasks to be completed (this evolves constantly)
* know the order they need to be done in (but I find this to be intuitive and not necessary to spell out)
* know whether tasks are completed, in progress, or not started
* know who is doing them
* know estimates for each remaining task

There are a bunch of ways to do this kind of tracking during the life of a project. I’ve most commonly used a spreadsheet or plain text file, but have had some success with agile-oriented PM tools like Rally or VersionOne. The point is that for many, many projects you can track this stuff with little work by relaxing your death grip on the initial schedule.

You do need to know how much work remains so that you can either adjust end dates or de-scope by dropping or shaving features. You can determine this with: (tasks remaining * estimates) - people time available. If that’s > 0, you need to take corrective action. I find doing this adjustment on a weekly basis works pretty well; it can take less than an hour if you stay on top of it. A burn-down chart is a really nice way to represent this info.

After 2 years of excuses, laziness, constant turnover (a complete waste of training time when the guy/girl buggers off and leaves you with a new muppet), terrible or copied-from-Google code, never-ending bugs, headaches, baffling phone calls where no-one understood each other, emails that promised to "do the needful" but went ignored, applications that just didn't work, MILLIONS of dollars, and much, much more... we'd had enough: we told the Indian coding behemoth so and brought our dev team back in house. To say that things now go more smoothly is a massive understatement. Don't know why we bothered. Oh yes, some spreadsheet said it would be cheaper.

This informal communication is completely lost when parts of a project are outsourced. Sending the same spec to another country to be evaluated by a developer who has never met the author and who must route all queries through an account manager just does not work.

Nice to see "pen and paper" winning best GTD application. I'm still not happy with the choice of web-based contact management apps; I wish Google would pull their finger out and make their contacts app as good as their email and calendar apps.

* Perform a wide range of data transforms - XML to XML, CSV to XML, EDI to XML, XML to EDI, XML to CSV, Java to XML, Java to EDI, Java to CSV, Java to Java, XML to Java, EDI to Java, etc.
* Populate a Java object model from a data source (CSV, EDI, XML, Java, etc.). Populated object models can be used as a transformation result in themselves, or can be used by (e.g.) templating resources for generating XML or other character-based results. Also supports virtual object models (Maps and Lists of typed data), which can be used by EL and templating functionality.
* Process huge messages (GBs) - split, transform, and route message fragments to JMS, file, database, and other destinations.
* Enrich a message with data from a database or other data sources.
* Perform Extract Transform Load (ETL) operations by leveraging Smooks' transformation, routing, and persistence functionality.

Smooks supports both DOM and SAX processing models, but adds a more "code friendly" layer on top of them. It allows you to plug in your own "ContentHandler" implementations (written in Java or Groovy), or reuse the many existing handlers.