Productivity

Dec 07, 2009

Here's a common scenario: you have some CSVs containing data and you need to perform some analysis. Sort, filter, create summaries, perhaps even cross two CSVs together in a more complex query. Your first choice would be a spreadsheet. It will be very easy to import your data and do some sorting and filtering. However, when you need some more in depth analysis, this becomes much more difficult.

In this article I will present a simple alternative: import the data to H2 database and manipulate it. You need some basic knowledge of SQL and not much more (you don't need to be a programmer). This article starts from scratch and gets you there quickly. There's no complex installation or heavy-duty server required. You can be up and crunching your data in about 10 minutes.

H2 is an awesome little database engine written in Java. I use it for my Eclipse plugin, nWire, and I wrote about it in the past. It is small, light, simple to install, fully functional and surprisingly quick.

Step 1: Download H2

Go to the H2 Download page. You can select the latest beta, it is fairly stable and offers better performance. Go for the "platform independent ZIP" (works on Windows, Mac OS X, Linux/Unix), I prefer it over the installer which is Windows only (well, I'm on a Mac, maybe that's the reason). However, the installer is simpler to use, so if you're on Windows, you can get it.

Step 2: Install H2

If you downloaded the installer, run it. Otherwise, just pick a folder and unzip the downloaded archive. Installed. You don't even have to restart your machine.

Step 3: Fire up the H2 Console

On Windows: in the H2 folder, locate the bin folder. Inside, you will find "h2.bat". Double click it.

On Mac/Unix: open a terminal, cd to your H2 directory, cd to bin and execute "h2.sh" by typing "./h2.sh".

In both cases, this will open a link in your browser which brings up the H2 console. The console is by default at http://localhost:8082/.

Step 4: Open a fresh database

Th only setting you should modify is the URL. If should look like this: jdbc:h2:file:<path><databaseName>. The database uses several files, so it is best to pick an empty directory (it will be created if it does not exist).

On windows: jdbc:h2:file:C:/data/mydatabase (creates it in C:/data)

On Mac/Unix: jdbc:h2:file:/data/mydatabase (creates it in /data, use ~ to denote your home folder)

Click on "Connect". Upon the first connection, the database will be created and the console will open. It looks like this:

Step 5: Import your CSV

So, dust off your SQL skills, time to do some light hacking. H2 has a useful function called CSVREAD. Start with a simple command: (Specify full path for your CSV)

CALL CSVREAD('c:/data/test.csv')

This assumes that your CSV is comma separated (isn't that what CSV means?) and that the first line contains the column titles. It uses the default system encoding. You can customize it to fit your case:

CALL CSVREAD('c:/data/test.csv', 'ID|NAME', 'UTF-8', '|')

In this case, we define two columns, ID and Name, UTF-8 encoding and | as value separator. There are many customization options which you might need e.g. for importing dates at a specific format. See this page for more details.

In order to manipulate your data, you want to read it into a table. First, create the table:

CREATE TABLE DATA(ID INT, NAME VARCHAR);

Next, import the data to your table:

INSERT INTO DATA (SELECT * FROM CSVREAD('c:/data/test.csv'));

If you have several CSV files, you can import those into different tables.

Step 6: Crunch your data

Now, it's time for you to get creative. H2 supports standard SQL. The H2 Console has some nice features like completion and a useful display of the database elements. You'll get the hang of it in no time. Inside the H2 folder you will find a doc folder which contains a PDF with full reference to H2 functions.

Step 7 (Optional): Export your data to CSV

You can copy the data straight from the query results in the console and paste it into a spreadsheet. Make sure you copy all the results: you may need to set the "Max rows" to all (in a drop-down above the command pane in the console).

Alternatively, you can use the CSVWRITE function to dump the results to a CSV.

Conclusion

Yes, you could do this with most databases out there. However, this is just simpler, which makes it a viable companion to spreadsheets. The meaning of "I need a database" doesn't have to be days and weeks of hard labor. H2 can do much more, this is just a taste. Whatever you do, H2 always keeps it simple and light.

One final tip: the console can connect to any JDBC compliant database, provided the JDBC drivers are in the classpath. Of course it can work as a remote server which makes it a simple alternative to tools like phpMyAdmin (which is MySQL specific). That's a good topic for a separate post.

Jun 27, 2008

I'm developing software and most of the time I'm working by myself. This does not mean that I don't need to organize my work. As we all know, setting goals and staying focused is crucial. That is a challenge for me - I'm usually bursting with ideas for features and enhancements. I need to prioritize and keep tabs on what I'm doing. Still, I don't want to lose any good ideas for the future.

This is where OmniFocus comes to the rescue. OmniFocus, is, IMHO, the best task management software I've used. It is Mac only and it is based on the well known Getting Things Done method, aka GTD. I'm not a strict GTD follower, yet, I still get everything I need from OmniFocus. In this post, I'll show you how to turn OmniFocus into a development management system for tracking development tasks and bugs. For teams of developers, it cannot replace a full blown system like Bugzilla. However, if you're like me, working in a one man team (no sharing option in OmniFocus) this could be a great solution. Before we start, I recommend reading a bit about GTD if you're not familiar with its' basic concepts.

Overview
Here's an example of my "Planning View"
My system is based on the following:

Under 'Dev' I have a hierarchy of modules like Front End and Back End, under Front End I'll have all the flows, etc.

Tasks usually have tags: when entering a task, I'll usually start with a tag, like [bug], [check], [idea], etc. This helps categorizing tasks later, but not a necessity.

Entering Tasks
I usually use the Quick Entry window to enter new tasks. The great thing about it is that I can invoke at any given time, jot down an idea, bug or just something that I need to check later, and immediately get it out of my head. I usually add a task note which contains as many details as I think I'll need to understand my original idea. I also enter a context (see hereunder), but not a project - so it will remain in my inbox for review.

If you want to include screen shots, you can just add it as a note to the task. I find that Skitch is the perfect companion for that, since it provides simple tools for easy annotations. You don't need to save the file, either, you can just drag it from Skitch to your OmniFocus item.

WorkingI use either the Planning view or the Context View for that. I find the Planning View more natural for that. At the early days of the version you can decide to focus on Enhancements and proceed to bug fixes. That's why I find the separation to projects helpful. If I have a of lower priority, which I'm not interested in working on, I just push it to one of the Waiting projects according to its' priority.

When fixing bugs, the separation to Enhancements and Bugs becomes even more helpful as I usually have the tendency to get carried away and work on enhancements instead of fixing bugs. I have one super-task inside the Bugs project which I call "Can't Reproduce". Here I drop bug that I can't reproduce for some reason. You can also have a separate task list for that.

Contexts
The organization for contexts is critical here. When entering the context for a task I try to find the one that best fits. However, if I have a task for fixing something across the system, it will have a more general context, maybe even the top 'Dev' context.

One of the major benefits of this system is the ability to work by context. Clearly, if you have a large system, you cannot keep it all "in your head". This means that, when you're starting to work on some module, weather it is on an enhancement or a bug fix, you're going to spend some time to reacquaint yourself with the code.

When working by context, you can group tasks dealing with the same module and perform them together. Conceptually, it's the basic reason for having contexts like "@home" or "@supermarket" in GTD. You want to gather all the tasks which are relevant to your current context. That's exactly why module is the same: since I'm here, let's be productive and do everything which is in the vicinity, which is in the same context.

Review
From this point forward, it's basic GTD stuff. I have a daily review of my current tasks and I set up goals in terms of due dates. I have weekly reviews for reviewing all (hopefully) my tasks, promoting, demoting and eliminating the tasks no longer relevant.

Conclusion
I presented a way for turning OmniFocus into an effective development management system. What I like the most about this system:

The ability to quickly get things off my mind, knowing they won't be lost.

Gathering work by modules, i.e. contexts.

The outlining ability - breaking tasks into smaller and smaller sub-tasks. I think this is the main selling point of OmniFocus specially.

Apr 02, 2008

Yesterday's post, obviously, was an April fools' joke. I'm not switching back to Windows and there is no software which sends you electrical shocks through the keyboard.

Some after thoughts:

It's an interesting marketing technique: it may be easier to get bloggers to mention your site free as part of a joke. If you're smart enough (and have a good sense of humor), you can even generate some hype out of it.

There is a connection between managing my tasks and managing my machine time. So, why am I using two separate tools for that, that has no integration between them? There's a correlation between the RescueTime Tags and the GTD contexts. Just a thought.

Apr 01, 2008

I didn't think that this day will come. Well, I was wrong. I came across this amazingly innovative piece of software which completely changed the way I use my machine. It's called FruitfulTime TaskManager Electro. Essentially, it's a task management software which also keeps track of what you do on your machine accordingly. I had two separate systems for that on my Mac: OmniFocus for task management and RescueTime for logging my machine usage.

However, there's a big difference. FTE can help you prevent procrastination by giving you a small "reminder" whenever you start doing unrelated stuff on your machine. Now, you're probably thinking: "I can ignore those remainders, there's nothing new here". Trust me, you won't ignore this one, since it comes in the form of an electric shock.

I took part in a private beta over the past 3 weeks and I can just tell you this: amazing. My productivity went up by roughly 70%. As I said, it only works on Windows. A Mac OS X version is not planned at the moment, so I had to switch to Vista on my Mac. Well, my MacBook Pro was declared in 2007 as the fastest laptop for running Vista by PC World, so no worries there. My only concern was that it drained the batteries of my Wireless Mighty Mouse. Nevertheless, the FruitfulTime team assured me this will be solved until the beta is complete.

Mar 31, 2008

If you're a bit like me, you now have about 20 tabs open in your browser. Let's not go into the psychological reasons for that, but it probably have something to do with a very short attention span.

I found this cool Firefox extension which changed my browsing experience. It's called "Tab Tree Style" and it shows the tab headers on the left side rather than on top. The result: I can have more tabs and easily see all of them at a glance. Navigating between the tabs is easier than ever and I get full size tab headers, so if a site shows information in the header I can see it. For example, GMail shows the name of the current folder. No more cramped headers on the top of the page. The cost is screen real-estate on the left. This is not a problem for me since I use a wide screen and more web sites are designed for regular resolutions. Highly recommended.

I was so pleased with this transition, that I immediately thought about my IDE. I usually have at least 20 files open. Even on my wide 20" monitor, I see about 7 headers. In most cases, I work with two files, side by side, which means I see even less headers. Navigating the rest of the files is a pain. So, I looked for a solution.

I found an interesting plug-in called "EditorViewer". This plug-in provides a list of open editors. Since this is a separate view you can place it where you please. I placed it to the left of my editor.

It's close to what I was looking for, but not exactly. The plug-in seems to be deserted. Nobody touched it since it 2005. You can download it here (download link is missing from the site). There's no "Eclipse update site", but it does work on Eclipse 3.3.1 (and on a Mac). I think it should take less than a day to write something which fits my needs, so maybe I'll write myself, if I can find some spare time :-)