Thinking Sphinx Case Folding Configuration

May 1, 2009

Case folding is what allows searches for pais or paìs to match país. By default, accented characters won’t even be indexed by Sphinx. They’re treated as word breaks, so you would have to search for pa s to match país. I would have thought a Latin case-folding config would be standard for just about everyone using Sphinx, but a cursory Googling didn’t turn up much. The best article I found was by James Healy.
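To make the word-break behavior concrete, here is a toy Ruby sketch of the idea. It is not how Sphinx is implemented, and the table names and the `fold` and `tokens` helpers are made up for illustration. It assumes only that any character missing from the table acts as a separator, and that a mapped character is indexed as its target.

```ruby
# Toy model of a charset table: keys are accepted characters,
# values are what they get indexed as. Hypothetical names throughout.
ASCII_TABLE = {}
('a'..'z').each { |c| ASCII_TABLE[c] = c }
('A'..'Z').each { |c| ASCII_TABLE[c] = c.downcase }
('0'..'9').each { |c| ASCII_TABLE[c] = c }

# The same table plus a few illustrative mappings for accented i.
FOLDED_TABLE = ASCII_TABLE.merge(
  'ì' => 'i', 'í' => 'i', 'Ì' => 'i', 'Í' => 'i'
)

# Replace each character with its mapping; anything not in the
# table becomes a space, i.e. a word break.
def fold(text, table)
  text.each_char.map { |c| table.fetch(c, ' ') }.join
end

# Split the folded text into the words that would be indexed.
def tokens(text, table)
  fold(text, table).split
end

p tokens('país', ASCII_TABLE)   # accented character acts as a word break
p tokens('país', FOLDED_TABLE)  # accented character folds to plain i
```

With the ASCII-only table, país splits into two fragments, which is why a search for pa s is what matches. With the extra mappings, país, paìs and pais all come out as the same token.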

His config works, but then I found a reference in the Sphinx wiki to a formatted list whose formatting and length imparted a certain air of authority.

I plugged it into the config given by Mr. Healy and ran into YAML-related problems. A double-quoted string in YAML will generally collapse everything onto one line. After removing the comments from the list, the generated config looked good, but then Sphinx choked on the incredibly long line length: