The JDK already provides a decent collation library with java.text.Collator, and an even better one is available with icu4j, so why bother? Well, sometimes you simply want your application and your database (a MySQL database, that is) to have a consistent sort order and consistent equality. If you write heavily database-centric applications like our very own molindo-dbcopy, it’s mandatory.

As there is no Java collation that is 100% consistent with MySQL, we’ve decided to go for JNI. No, really. My C programming is pretty rusty, but somehow I’ve got it done. Basically, we use libmysqlclient.so and the function strnncollsp(..) defined in m_ctype.h (documentation can be found in string/CHARSET_INFO.txt of MySQL’s source distribution). It’s the trailing-space-ignoring equivalent of strnncoll(..), which “compares two strings according to the given collation”. As simple as that. Why no trailing spaces? Well, “all MySQL collations are of type PADSPACE. This means that all CHAR, VARCHAR, and TEXT values in MySQL are compared without regard to any trailing spaces” (see docs).
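To illustrate what PADSPACE means in practice, here is a minimal sketch in plain Java. It is not the JNI library described above and it does not reproduce MySQL’s sort order, since java.text.Collator is exactly the thing that isn’t consistent with MySQL. The class name PadSpaceCompare and its methods are made up for this example; all it shows is the idea of comparing without regard to trailing spaces.

```java
import java.text.Collator;
import java.util.Locale;

// Sketch only: illustrates PADSPACE-style comparison, NOT MySQL's collation rules.
final class PadSpaceCompare {

    // Removes trailing U+0020 characters; leading and inner spaces are kept.
    static String stripTrailingSpaces(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    // Compares two strings with the given collator, ignoring trailing spaces.
    static int compare(Collator collator, String a, String b) {
        return collator.compare(stripTrailingSpaces(a), stripTrailingSpaces(b));
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        // Both sides are reduced to "abc" before comparing, so this prints 0.
        System.out.println(compare(collator, "abc", "abc   "));
    }
}
```

The real library delegates the comparison to strnncollsp(..), so the weights come from MySQL itself rather than from a Java collator.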

The library performs fairly well but requires some more testing. It’s available from Maven Central and comes with a pre-compiled library for Ubuntu amd64 that requires libmysqlclient. You can however tweak build.sh to your needs if you are on another operating system.

This is the first post after a while, and we’re trying to have more than one post every two years. But let’s be honest, you’re not interested in this yadda yadda anyway and just came here because Google Webmaster Tools sent you a kind message like

Increase in not found errors

Then you went to GWT and saw an unexpected spike in 404s (or 410s, which Google treats the same way) like this one:

The total number of course varies from case to case, as Google sends the warning whenever an unusual increase happens. However, the numbers can be as high as hundreds of thousands of not-found pages.

And then you immediately thought “What’s wrong? Will this harm my rankings?”

So you’re kind of safe now, but you want to know what caused the spike and how to get rid of the errors? Well I can’t tell you what your problem was, but I’ll show you mine and maybe, just maybe, it’s the same 😉

The reason for the spike was that Google crawled our JavaScript and discovered something that looked like a URL but was in fact a cometd channel id. This can of course also happen with links generated in JavaScript from snippets (or the infamous /a generated by jQuery). Most of the time you don’t want Google to crawl those links, but how can you keep Google from crawling JavaScript?

To put it simply: you shouldn’t.

Here’s a short explanation why: returning a 404 in this case is by all means the right thing to do, but a rising number of 404s in your webmaster tools console and a weekly warning message just don’t feel right. The immediate thought usually is to block the URLs via robots.txt. But that’s the wrong thing to do for two reasons:

You tell Google that a page exists (which it doesn’t) and that it’s just not allowed to crawl it

You move the 404 errors to the “blocked URLs” section of your webmaster tools

So if you want to get rid of the 404 errors in Google Webmaster Tools that were caused by Google crawling your JavaScript, perform a 301 redirect to the best matching page or, if there isn’t one, to the homepage. This way you’ll get the errors out of GWT without moving them to the blocked section.
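A minimal sketch of such a redirect, using the HTTP server that ships with the JDK so it runs without a servlet container. Everything specific here is an assumption for illustration: the /channel/ prefix stands in for whatever bogus paths Google picked up from your JavaScript, the homepage is used as the target, and the class name BogusUrlRedirect is made up. In a real webapp you would do the same in a servlet filter or in your web server’s rewrite rules.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;

final class BogusUrlRedirect {

    // Hypothetical prefix of the paths that were never meant to be URLs.
    static final String BOGUS_PREFIX = "/channel/";

    // Returns the redirect target for a bogus path, or null for any other path.
    static String redirectTarget(String path) {
        return path.startsWith(BOGUS_PREFIX) ? "/" : null;
    }

    static void handle(HttpExchange exchange) throws IOException {
        String target = redirectTarget(exchange.getRequestURI().getPath());
        if (target != null) {
            // Permanent redirect to the homepage (no better matching page assumed).
            exchange.getResponseHeaders().set("Location", target);
            exchange.sendResponseHeaders(301, -1); // -1 means no response body
        } else {
            // This demo serves nothing else, so every other path is a plain 404.
            exchange.sendResponseHeaders(404, -1);
        }
        exchange.close();
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", BogusUrlRedirect::handle);
        server.start();
    }
}
```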

While implementing a forum with wicket, spring, hibernate and compass for search, I recently ran into a problem: there are topics and posts that should only be visible to some users. Say there’s a moderator forum where all content should only be visible to … well, moderators :-).

In this post, I’m looking for active collaboration from my readers (as I really hope that I have some). I’ve thought about a simpler way to handle Java system properties, as I tend to forget them all the time. Additionally, I don’t like to see them as string constants, neither within the code nor somewhere else. I’ve come up with a single enum class that aims to simplify handling of system properties. Actually, you won’t ever have to think about possible best practices again, hence “The Final Take on Java System Properties” 🙂
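To give a rough idea of what an enum for system properties can look like, here is a small sketch. It is not the class from the full post: the enum name SystemProperty, the selection of constants and the method names are all assumptions made for this example. The property keys themselves are standard JVM system properties.

```java
// Sketch only: a hypothetical enum wrapping a few standard JVM system properties.
enum SystemProperty {

    JAVA_VERSION("java.version"),
    USER_HOME("user.home"),
    FILE_SEPARATOR("file.separator"),
    LINE_SEPARATOR("line.separator");

    private final String key;

    SystemProperty(String key) {
        this.key = key;
    }

    // The raw property key, e.g. "java.version".
    String key() {
        return key;
    }

    // The current value, or null if the property is not set.
    String get() {
        return System.getProperty(key);
    }

    // The current value, or the given fallback if the property is not set.
    String get(String fallback) {
        return System.getProperty(key, fallback);
    }
}
```

With something like this, SystemProperty.USER_HOME.get() replaces System.getProperty("user.home"), and the IDE’s auto-completion does the remembering.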

As we’ve recently started feeling that response times of one of our webapps got worse, we decided to spend some time tweaking the app’s performance. As a first step, we wanted to get a thorough understanding of current response times. For performance evaluations, using minimum, maximum or average response times is a bad idea: “The ‘average’ is the evil of performance optimization and often as helpful as ‘average patient temperature in the hospital'” (MySQL Performance Blog). Instead, performance tuners should be looking at the percentile: “A percentile is the value of a variable below which a certain percent of observations fall” (Wikipedia). In other words: the 95th percentile is the time in which 95% of requests finished. Therefore, a performance goal related to the percentile could be similar to “The 95th percentile should be lower than 800 ms”. Setting such performance goals is one thing, but efficiently tracking them for a live system is another.
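To make the definition concrete, here is a minimal sketch that computes a percentile over a batch of recorded response times with the nearest-rank method. The class and method names are made up for this example, and sorting a full array like this is only meant to show the arithmetic; it says nothing about how to track percentiles efficiently on a live system.

```java
import java.util.Arrays;

final class Percentiles {

    // Nearest-rank percentile: the smallest recorded value such that at least
    // p percent of all values are less than or equal to it (0 < p <= 100).
    static long percentile(long[] values, double p) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values recorded");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical response times in milliseconds.
        long[] millis = {120, 95, 110, 3000, 105, 98, 101, 130, 99, 102};
        System.out.println("95th percentile: " + percentile(millis, 95) + " ms");
        System.out.println("average: " + Arrays.stream(millis).average().getAsDouble() + " ms");
    }
}
```

A goal like “The 95th percentile should be lower than 800 ms” then becomes a simple check: percentile(millis, 95) < 800.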

Today, I’m happy to announce the availability of annotation-based mounting and merging of resources in wicketstuff-merged-resources (version 3.0-SNAPSHOT for Wicket 1.4, version 2.1-SNAPSHOT for Wicket 1.3). In order to mount resources, all that’s needed is adding annotations to component classes:

I’ve recently discovered Stack Overflow as a nice pastime on the one hand and as a valuable source for answers on the other hand. Normally it takes only a few minutes to get answers to most questions. However, I managed to ask a question that nobody has been able to answer yet. The question was about collations. As I suspect that collations are a hardly used Java feature, I kept working on the problem myself rather than just waiting for an answer on Stack Overflow.

I’ve managed to get something working right now. It’s not completely tested but it should work quite well. What I’m doing is the following: I parse the charset files of MySQL (on an Ubuntu system, you can find them in /usr/share/mysql/charsets/) and do the collation based on those files myself rather than using Java’s built-in collations.
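The comparison step can be sketched roughly as follows, assuming a single-byte charset whose collation boils down to one weight per byte value. Parsing the charset files is left out here, so the sketch takes a ready-made table of 256 weights; the class name WeightTableCollation and the ASCII-only demo table are made up for this example and are not taken from any MySQL file.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: compares byte strings via a per-byte weight table (single-byte charsets).
final class WeightTableCollation {

    private final int[] weights; // weights[b] is the sort weight of byte value b

    WeightTableCollation(int[] weights) {
        if (weights.length != 256) {
            throw new IllegalArgumentException("expected 256 weights");
        }
        this.weights = weights.clone();
    }

    // Compares weight by weight; if one input is a prefix of the other, the shorter sorts first.
    int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int wa = weights[a[i] & 0xFF];
            int wb = weights[b[i] & 0xFF];
            if (wa != wb) {
                return Integer.compare(wa, wb);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Hypothetical demo table: identity weights, with a-z folded onto A-Z.
    static WeightTableCollation asciiCaseInsensitive() {
        int[] w = new int[256];
        for (int i = 0; i < 256; i++) {
            w[i] = (i >= 'a' && i <= 'z') ? i - ('a' - 'A') : i;
        }
        return new WeightTableCollation(w);
    }

    public static void main(String[] args) {
        WeightTableCollation c = asciiCaseInsensitive();
        byte[] lower = "mysql".getBytes(StandardCharsets.ISO_8859_1);
        byte[] upper = "MYSQL".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(c.compare(lower, upper)); // 0: equal under the demo table
    }
}
```

This sketch ignores trailing-space handling and multi-byte charsets entirely.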