i just wanted to advertise my own implementation of this, which i did a few months ago for ICU. the generated file contains all character names as defined in Unicode 3.0.0, allows fast random access, and includes data for the algorithmic names of the CJK and Hangul blocks.
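
for readers unfamiliar with "algorithmic names": these are names that can be computed from the code point instead of being stored. the following is only a minimal sketch in python, not the ICU code. the function name `algorithmic_name` is made up for illustration, the hangul constants and jamo short names follow the unicode standard's syllable decomposition, and the CJK ranges are the ones i assume for unicode 3.0.

```python
# Constants from the Unicode Standard's Hangul syllable decomposition.
S_BASE = 0xAC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables

# Jamo short names: leading consonants, vowels, trailing consonants.
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
          "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA",
          "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG",
          "LM", "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S",
          "SS", "NG", "J", "C", "K", "T", "P", "H"]

# Assumption: CJK unified ideograph ranges as of Unicode 3.0.
CJK_RANGES = [(0x3400, 0x4DB5), (0x4E00, 0x9FA5)]


def algorithmic_name(cp):
    """Return the computed name for cp, or None if it has no algorithmic name."""
    s_index = cp - S_BASE
    if 0 <= s_index < S_COUNT:
        l = s_index // N_COUNT
        v = (s_index % N_COUNT) // T_COUNT
        t = s_index % T_COUNT
        return "HANGUL SYLLABLE " + JAMO_L[l] + JAMO_V[v] + JAMO_T[t]
    if any(lo <= cp <= hi for lo, hi in CJK_RANGES):
        return "CJK UNIFIED IDEOGRAPH-%04X" % cp
    return None
```

with this, only the ranges and the jamo tables need to be stored, not the thousands of individual names in those blocks.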

the data file is 83860 bytes long; it uses a word list and the tweaks discussed here, but no Huffman coding or similar.
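
to illustrate the basic word-list idea (and only that; this is not the ICU file format and leaves out all of the tweaks), here is a rough python sketch. `build_word_list` and `decode_name` are hypothetical helper names. each distinct word is stored once, and every name becomes a short list of indexes into that word list.

```python
def build_word_list(names):
    """Split names into words; return (word list, per-name index lists)."""
    words = []     # each distinct word, stored once
    index = {}     # word -> position in `words`
    encoded = []   # one list of word indexes per name
    for name in names:
        tokens = []
        for word in name.split(" "):
            if word not in index:
                index[word] = len(words)
                words.append(word)
            tokens.append(index[word])
        encoded.append(tokens)
    return words, encoded


def decode_name(words, tokens):
    """Rebuild a name from its word indexes."""
    return " ".join(words[t] for t in tokens)
```

since words like LATIN, LETTER, or SMALL repeat across very many names, storing them once and referring to them by index saves a lot of space without any bit-level coding, and a name can still be decoded on its own.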