Some time ago I introduced declarative creation of Zend Form instances. But what should you do if you have two or more forms with common parts? Yes, you can copy-paste form files, but adding an extension mechanism is much easier and far more elegant :)
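One way to get such an extension mechanism almost for free (a sketch, assuming the forms are described in an INI file loaded through Zend_Config_Ini — the file, section and element names here are my own illustration, not from the original post): Zend_Config_Ini supports section inheritance out of the box, so a concrete form can extend a shared base section:

```ini
; forms.ini — hypothetical declarative form definitions

[base]
elements.email.type = "text"
elements.email.options.label = "E-mail"
elements.email.options.required = true

; [feedback] inherits everything from [base]
; and only adds what is specific to it
[feedback : base]
elements.message.type = "textarea"
elements.message.options.label = "Your message"
```

Loading the `feedback` section with `new Zend_Config_Ini('forms.ini', 'feedback')` and passing the result to `Zend_Form`'s constructor would then produce a form with both the shared email element and the extra message field.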

To improve search quality I needed a stemming algorithm. Porter2 seemed to be the best choice. However, I realized that the only existing reference implementation is written in Snowball.
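To give a flavour of what the algorithm looks like outside Snowball, here is a sketch of just step 1a of Porter2 in plain PHP (the full stemmer has several more steps and preconditions; the function name is mine):

```php
<?php
// A sketch of Porter2 step 1a only: handling of the suffixes
// -sses, -ies/-ied and -s. The full algorithm has many more steps;
// this just shows the general shape of the rules.
function porter2Step1a($word)
{
    if (substr($word, -4) === 'sses') {
        return substr($word, 0, -4) . 'ss';      // caresses -> caress
    }
    if (substr($word, -3) === 'ies' || substr($word, -3) === 'ied') {
        // replace by "i" if preceded by more than one letter, else by "ie"
        return strlen($word) > 4
            ? substr($word, 0, -3) . 'i'         // cries -> cri
            : substr($word, 0, -3) . 'ie';       // ties  -> tie
    }
    if (substr($word, -2) === 'ss' || substr($word, -2) === 'us') {
        return $word;                            // leave as is
    }
    if (substr($word, -1) === 's') {
        // delete "s" only if the preceding part contains a vowel
        // that is not immediately before the s
        $stem = substr($word, 0, -1);
        if (preg_match('/[aeiouy].+$/', $stem)) {
            return $stem;                        // gaps -> gap
        }
    }
    return $word;                                // gas -> gas
}
```

Even this tiny fragment shows why a general-purpose language is more pleasant to read than the Snowball original.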

Now I’ll be throwing stones at Snowball. I really cannot understand the people who handcrafted this language. Its unreadability rivals Perl’s, yet its syntax and expressive power are really limited.

It’s been a while since my last Zend-related post. Now I’ll try to cover an approach to Zend_Form usage. This component has really extensive functionality. You can use Zend_Form in a straightforward manner by creating form elements in your controller’s code:
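For instance, a minimal sketch of such controller code (the controller, action and field names are my own illustration, assuming Zend Framework 1.x):

```php
// Sketch: building a login form directly inside a controller action.
// The controller/action/field names are illustrative, not from the post.
class AuthController extends Zend_Controller_Action
{
    public function loginAction()
    {
        $form = new Zend_Form();
        $form->setAction('/auth/login')->setMethod('post');

        $username = new Zend_Form_Element_Text('username');
        $username->setLabel('Username')->setRequired(true);

        $password = new Zend_Form_Element_Password('password');
        $password->setLabel('Password')->setRequired(true);

        $form->addElements(array($username, $password));
        $form->addElement('submit', 'login', array('label' => 'Log in'));

        $this->view->form = $form;
    }
}
```

This works, but it quickly clutters controllers, which is exactly why a declarative approach is attractive.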

Expanding your vocabulary is considered one of the most important and, I would say, most difficult tasks in language acquisition, especially if you live somewhere far from Japan. So, finally you come across a product that offers you an easy and fun way to extend your vocabulary, offering approximately 7500 nouns, adjectives, adverbs, expressions and verbs over the three levels.

Well, if you decide to buy it, please be well aware of what you’re buying: essentially, a list of chaotically arranged nouns, verbs and phrases. These words are pronounced in English and Japanese. That’s it.

No context, no usage examples, no topics, nothing… How am I supposed to remember 300 words in 10 minutes if all it does is read them out one by one at five-second intervals? On top of that, there is annoying music playing in the background. I really cannot grasp what the creators of that piece of … media were thinking.

BTW, this site answered a question of mine: why I could never find Japanese movies or anime with Japanese subs, no matter how hard I tried:

… the thinking in Japan’s movie industry has typically followed two distinct lines:

Hearing-impaired people can go in the general direction of heck.

Subtitles on foreign movies are not merely intended to repeat dialogue, but to convey, clarify and expound on dialogue — in other words, to pick up perceived slack in the audio translation

There are several hot discussions going on around his method: many people admire his way of learning the language, while others are quite skeptical. But IMHO you should read it yourself, analyze it and then…

In this post I will reveal the secret of really smart integration between Zend and Smarty :) Since I’m too lazy to write 10 similar functions inside the Smarty plugin, I decided to modify the Smarty compiler. And it worked well.

Making Smarty Zend-aware

This step is simple – you just add a function that lets you call a Zend View helper through call_user_func_array.
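A minimal self-contained sketch of the idea (the View class here is a stand-in for Zend_View, and the function name is mine; the real integration would resolve the helper through Zend_View’s helper mechanism instead):

```php
<?php
// Sketch: dispatching an arbitrary "helper" call by name through
// call_user_func_array — the same trick used to reach Zend_View
// helpers from inside Smarty. View is a stand-in, not Zend_View.
class View
{
    // a stand-in helper; in Zend_View this would be a helper class
    public function escape($value)
    {
        return htmlspecialchars($value, ENT_QUOTES);
    }
}

// generic bridge: forward a named helper call with its arguments
function callViewHelper($view, $name, array $args)
{
    return call_user_func_array(array($view, $name), $args);
}
```

With this in place, a Smarty-side function only needs the helper’s name and its arguments; no per-helper wrapper code is required.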

It’s been a while since I updated the blog, but things have been pretty busy lately… So, the application’s skeleton is finally in place. It uses Zend + Smarty. I’m also done with parsing the Tanaka corpus. I can only say that APR is really easy to use.

After you nail this, you’ll most likely start thinking about something more elaborate, capable of supporting Zend_View helpers such as headScript, doctype, etc. And most probably you will end up with a plugin class that maps Zend_View helpers to Smarty custom functions.
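A sketch of such a plugin class (all names here are my own illustration; a real version would proxy to Zend_View’s helper broker and register the mapped functions with Smarty):

```php
<?php
// Sketch of a plugin-style mapper: every unknown method call is
// forwarded to an underlying "view" object, which is how Zend_View
// helpers can be exposed to Smarty as custom functions.
// HelperBroker is a stand-in for the real Zend_View helper machinery.
class HelperBroker
{
    public function doctype()
    {
        return '<!DOCTYPE html>';
    }
}

class SmartyZendBridge
{
    private $view;

    public function __construct($view)
    {
        $this->view = $view;
    }

    // forward any call (e.g. doctype(), headScript()) to the view object
    public function __call($name, $args)
    {
        return call_user_func_array(array($this->view, $name), $args);
    }
}
```

The magic __call method is what spares you from writing ten nearly identical wrapper functions by hand.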

Kanji… Lists of kanji… For many people these are synonyms. And it’s quite natural for many Japanese learners to think of kanji as a long list of characters to be indexed, graded and memorized. You will find lots of pre-cooked lists and will most likely fall into the trap.

Flash card programs, paper flash cards, books like Heisig’s ‘Remembering the Kanji’, JLPT-based lists, and the 常用漢字 on top of it all.

In my opinion, the worst problem with those pre-cooked lists is that beginners try to use them directly in their studies. One sees the 常用漢字 list and thinks, ‘This list has grades and is arranged by frequency of usage. So if I make flash cards and memorize 5 kanji a day, I will learn them all in 13 months.’ Others go a little farther – they take into account additional factors, like the time required to revisit already-learned characters or summer vacation. Even then, this way of thinking leads only to frustration once you actually start trying to nail these characters down.

So, what many people don’t understand is that you have to be a real genius to memorize all 1945 characters entirely without context. Even if you somehow managed to remember every kanji on the list, be aware that they are not real words: you have no idea how to turn these pictograms into meaningful language primitives (words, of course). You have no idea how to read them or how to use them.

Even if you set aside the fact that you cannot really use those ready-made kanji lists, what is their actual usefulness? Let’s take 常用漢字 as an example. Why am I supposed to learn 「亜」 but not the kanji for the word “who” （誰）? Have you ever seen, even once, 「アジア」 and 「アメリカ」 written as 「亜細亜」 and 「亜米利加」? Yet this kanji comes FIRST in the list. Why can’t I find 「枕」 (pillow) among these 1945 characters? Don’t you use a pillow every single day of your life? But you definitely should know that 「斤」 means 1.32 lb.

So what’s the bottom line of this post? Throw away all your flash cards? No, I’m not advocating throwing your stuff out; I’m simply saying that we should always think of a list not as a goal, but as an aid.

This Sunday I was busy trying to optimize the data-load process. In fact, I ended up completely rewriting the stylesheets. In the process I had a chance to compare the performance of the 2 XSLT processors I use: Altova XSLT and Saxon-B. The results are quite unexpected.

What you can find below is not a real benchmark. I simply took the average execution time over 3 test runs for some of the stylesheets I use.

[1] – A compiled stylesheet was used instead of raw XSLT. These results lead to an interesting conclusion: the popular assumption that a product compiling to bytecode will necessarily be faster than an interpreter is WRONG. I will try to cover this topic in my next posts.

[2] – Settings for all Saxon runs were as follows: -l:off -dtd:off -tree:tiny

[3] – All results are in seconds

Well, in some cases Saxon, which is pure Java, is up to 48 times slower than the pure C++ implementation. Moreover, Altova consumes enormous amounts of memory, failing to process relatively small files (approximately 45–70 MB) on a 32-bit machine, while Saxon uses around 300 MB regardless of input file size.

So right now the dictionary is processed in 77 seconds and loaded into the DB in less than 5 seconds. Not bad I think…