Tips

Dec 07, 2009

Here's a common scenario: you have some CSVs containing data and you need to perform some analysis. Sort, filter, create summaries, perhaps even cross two CSVs together in a more complex query. Your first choice would be a spreadsheet. It will be very easy to import your data and do some sorting and filtering. However, when you need some more in depth analysis, this becomes much more difficult.

In this article I will present a simple alternative: import the data to H2 database and manipulate it. You need some basic knowledge of SQL and not much more (you don't need to be a programmer). This article starts from scratch and gets you there quickly. There's no complex installation or heavy-duty server required. You can be up and crunching your data in about 10 minutes.

H2 is an awesome little database engine written in Java. I use it for my Eclipse plugin, nWire, and I wrote about it in the past. It is small, light, simple to install, fully functional and surprisingly quick.

Step 1: Download H2

Go to the H2 Download page. You can select the latest beta, it is fairly stable and offers better performance. Go for the "platform independent ZIP" (works on Windows, Mac OS X, Linux/Unix), I prefer it over the installer which is Windows only (well, I'm on a Mac, maybe that's the reason). However, the installer is simpler to use, so if you're on Windows, you can get it.

Step 2: Install H2

If you downloaded the installer, run it. Otherwise, just pick a folder and unzip the downloaded archive. Installed. You don't even have to restart your machine.

Step 3: Fire up the H2 Console

On Windows: in the H2 folder, locate the bin folder. Inside, you will find "h2.bat". Double click it.

On Mac/Unix: open a terminal, cd to your H2 directory, cd to bin and execute "h2.sh" by typing "./h2.sh".

In both cases, this will open a link in your browser which brings up the H2 console. The console is by default at http://localhost:8082/.

Step 4: Open a fresh database

Th only setting you should modify is the URL. If should look like this: jdbc:h2:file:<path><databaseName>. The database uses several files, so it is best to pick an empty directory (it will be created if it does not exist).

On windows: jdbc:h2:file:C:/data/mydatabase (creates it in C:/data)

On Mac/Unix: jdbc:h2:file:/data/mydatabase (creates it in /data, use ~ to denote your home folder)

Click on "Connect". Upon the first connection, the database will be created and the console will open. It looks like this:

Step 5: Import your CSV

So, dust off your SQL skills, time to do some light hacking. H2 has a useful function called CSVREAD. Start with a simple command: (Specify full path for your CSV)

CALL CSVREAD('c:/data/test.csv')

This assumes that your CSV is comma separated (isn't that what CSV means?) and that the first line contains the column titles. It uses the default system encoding. You can customize it to fit your case:

CALL CSVREAD('c:/data/test.csv', 'ID|NAME', 'UTF-8', '|')

In this case, we define two columns, ID and Name, UTF-8 encoding and | as value separator. There are many customization options which you might need e.g. for importing dates at a specific format. See this page for more details.

In order to manipulate your data, you want to read it into a table. First, create the table:

CREATE TABLE DATA(ID INT, NAME VARCHAR);

Next, import the data to your table:

INSERT INTO DATA (SELECT * FROM CSVREAD('c:/data/test.csv'));

If you have several CSV files, you can import those into different tables.

Step 6: Crunch your data

Now, it's time for you to get creative. H2 supports standard SQL. The H2 Console has some nice features like completion and a useful display of the database elements. You'll get the hang of it in no time. Inside the H2 folder you will find a doc folder which contains a PDF with full reference to H2 functions.

Step 7 (Optional): Export your data to CSV

You can copy the data straight from the query results in the console and paste it into a spreadsheet. Make sure you copy all the results: you may need to set the "Max rows" to all (in a drop-down above the command pane in the console).

Alternatively, you can use the CSVWRITE function to dump the results to a CSV.

Conclusion

Yes, you could do this with most databases out there. However, this is just simpler, which makes it a viable companion to spreadsheets. The meaning of "I need a database" doesn't have to be days and weeks of hard labor. H2 can do much more, this is just a taste. Whatever you do, H2 always keeps it simple and light.

One final tip: the console can connect to any JDBC compliant database, provided the JDBC drivers are in the classpath. Of course it can work as a remote server which makes it a simple alternative to tools like phpMyAdmin (which is MySQL specific). That's a good topic for a separate post.

Jan 04, 2009

During the development of my Eclipse plug-in (nWire) I had to test and fine-tune the behavior of the plug-ins upon startup. It's an annoying situation: you only have one shot for testing your code. If you want to retest, you have to stop and start over. This is especially annoying when loading Eclipse takes more than 10 seconds, which is usually the case.

I found a simple solution which allows me to have a fully functional Eclipse instance in about 2-3 seconds. It is very simple to implement and can save a lot of time (and annoyance).

The solution is simple: disable all the plugins which are not necessary for your Eclipse instance to execute properly. This task is very simple if you know how... so here goes:

Create a new Launch Configuration for your plug-ins or start from an existing configuration and duplicate it. Naturally, it should be of type Eclipse Application. The fastest way to create such a configuration is to open the plugin.xml and click on "Launch as Eclipse Application".

Go to the third tab titles "Plug-ins".

In the top drop down "Launch With" select "plug-ins selected below only".

Click on "Deselect All".

Select your plug-ins from the list. They will appear in the top section, under "Workspace".

Click "Add Required Plug-ins".

Done. Click apply and you are ready to launch your configuration.

A word of caution: this only works if you correctly define the required plug-ins for each of your newly developed plugins. If not, you will immediately notice the problem. This makes it a great way to test the correctness of your dependencies.

Mar 31, 2008

If you're a bit like me, you now have about 20 tabs open in your browser. Let's not go into the psychological reasons for that, but it probably have something to do with a very short attention span.

I found this cool Firefox extension which changed my browsing experience. It's called "Tab Tree Style" and it shows the tab headers on the left side rather than on top. The result: I can have more tabs and easily see all of them at a glance. Navigating between the tabs is easier than ever and I get full size tab headers, so if a site shows information in the header I can see it. For example, GMail shows the name of the current folder. No more cramped headers on the top of the page. The cost is screen real-estate on the left. This is not a problem for me since I use a wide screen and more web sites are designed for regular resolutions. Highly recommended.

I was so pleased with this transition, that I immediately thought about my IDE. I usually have at least 20 files open. Even on my wide 20" monitor, I see about 7 headers. In most cases, I work with two files, side by side, which means I see even less headers. Navigating the rest of the files is a pain. So, I looked for a solution.

I found an interesting plug-in called "EditorViewer". This plug-in provides a list of open editors. Since this is a separate view you can place it where you please. I placed it to the left of my editor.

It's close to what I was looking for, but not exactly. The plug-in seems to be deserted. Nobody touched it since it 2005. You can download it here (download link is missing from the site). There's no "Eclipse update site", but it does work on Eclipse 3.3.1 (and on a Mac). I think it should take less than a day to write something which fits my needs, so maybe I'll write myself, if I can find some spare time :-)

Oct 29, 2007

When I tell people that I'm now working for myself at home, the first reaction is usually "wow, that takes a lot of self discipline". True, it does. There's no replacement for healthy self discipline. However, as with most things in life, there are levels of gray here. I always seek ways to improve.

I found this great tool that helps me keep going. It's called RescueTime. It's a small little gizmo that sits on my machine and monitors what I'm doing. I can then log into the RescueTime web site, see a graph that shows how much I've worked and when. It logs tasks according to the process being executed or the web site being visited. Monitoring URLs is important since many applications are online today. I can then tag the activities with tags like "work", "fun", etc., and get a summary of my activities according to my tags.

So, if you tend to lose track of time reading your RSS feeds or in IM chats, this will give you the required feedback to get back on track. It's like a mirror, providing silent, yet deadly honest, feedback.

Sep 13, 2007

I came across this great post by Neil Patel. I can relate to these tips and it's worth reading. Keeping up all these principals across days of training is not an easy task, but it's the only way if you want to really educate people and leave a residue.

I want to focus on talking with the audience rather than to them. Some people may find it hard at the beginning. So here's my tip: Plan discussions in advance. Find the spots in the presentation where you can turn it from talking about your material to discussing a topic.

Here's some pointers:

Look for points (decisions, opinions) that may be controversial and ask the audience what they think. If the decision looks trivial, play the devil's advocate and show them the other side, eventually reaching the correct conclusion.

Ask instead of tell. Before presenting your solution, ask for their solutions. The best thing is to lead the audience to your solution, but be prepared to accept other solutions as well.

Turn audience questions back to the audience. Unless the question is about pure facts, the person asking probably already has an opinion. Hear it. Let others react. Even if the answer is plain facts, maybe the audience can deduct the answer. This only shows you're broadcasting on the same wave length.