Before I forget: I have a bunch of files I mirror between Windows/NTFS and Linux/ext4 filesystems that include not only accented characters but also curly quotes in the filenames. (I know: the easiest solution would be to just get rid of the extended characters.) The curly quotes were created in Windows, so they don’t render properly in standard Linux character sets (UTF-8, iso8859-1, iso8859-15, etc.).

This all came up because iTunes under Windows couldn’t find curly-quote files when it was reading from the exported Samba share filesystem rather than an attached NTFS drive. The files showed up as missing because they had different filenames.
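To make the "different filenames" point concrete, here is a small Python sketch. It assumes the names on disk were written as Windows-1252 bytes while the reader expects UTF-8; the filename itself is made up for illustration.

```python
# Hypothetical filename containing a curly apostrophe (U+2019).
name = "Don\u2019t Stop.mp3"

# The same visible name, as bytes under two different charsets.
cp1252_bytes = name.encode("cp1252")  # curly apostrophe becomes the single byte 0x92
utf8_bytes = name.encode("utf-8")     # curly apostrophe becomes three bytes: E2 80 99

print(cp1252_bytes)
print(utf8_bytes)

# Compared byte for byte, these are two different filenames.
names_match = cp1252_bytes == utf8_bytes

# Reading the cp1252 bytes as UTF-8 does not even decode,
# because 0x92 on its own is not valid UTF-8.
try:
    cp1252_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(names_match, decodes_as_utf8)
```

If that assumption holds, a client asking for the UTF-8 form of the name will never match the bytes actually stored on disk, which would explain the "missing" files.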

The solution was not easily google-able, so for the record, in brief, add this to the [Global] section of /etc/samba/smb.conf:

unix charset = cp1252
display charset = cp1252

And reload Samba.

Also, to make the characters render properly from a terminal on the Linux box, first create the relevant character set:

sudo localedef -f CP1252 -i en_US en_US.CP1252

Now you can use this charset on your Linux box, and, like magic, the curly characters will be back:

I’m unaware of any free tool to perform OCR on a PDF and embed the resulting data in the PDF itself so it is text-searchable. If anyone knows of one, let me know. In the meantime, I use Acrobat Professional for this essential functionality.

High resolution PDFs produced by my scanner (HP Officejet Pro L7700) usually give the following error when I try to perform Acrobat OCR:

This page is larger than the maximum page size of 45 inches by 45 inches.

Surprisingly, there doesn’t seem to be any way to change the page size of a PDF within Acrobat. It’s possible to print to a new PDF of the correct size, but this operation cannot easily be batched. If I apply the “crop” tool to resize the page in Acrobat, I get this error:

Page size may not be reduced.

Many report these issues in Adobe’s forums. The most common responses suggest reconfiguring the scanner or buying a new one.

After some googling, I found no quick and easy ghostscript recipe for the batch pre-processing needed before Acrobat can do the OCR. It’s not hard to do, just a bit of a trial-and-error pain to get the right switches.

For posterity, then, here is a simple command line to make this happen (here under Windows, but it could easily be adapted for any other platform). First, download the latest ghostscript for your platform (at this time, 8.64 for Windows). Then:
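A rough sketch of this kind of batch step, written in Python around the Ghostscript command line. The switch selection (`-sPAPERSIZE`, `-dFIXEDMEDIA`, `-dPDFFitPage`), the `gswin32c` executable name, and the folder names are assumptions for illustration, not a recipe verified against 8.64:

```python
import subprocess
from pathlib import Path

# Assumption: Ghostscript's console executable on Windows; typically "gs" elsewhere.
GS_EXE = "gswin32c"


def build_gs_command(src, dst, paper="letter"):
    """Build a Ghostscript argument list that refits each page of src onto `paper`."""
    return [
        GS_EXE,
        "-dBATCH", "-dNOPAUSE", "-dSAFER",
        "-sDEVICE=pdfwrite",        # write a new PDF
        f"-sPAPERSIZE={paper}",     # target page size
        "-dFIXEDMEDIA",             # keep that size regardless of the input's page size
        "-dPDFFitPage",             # scale each input page to fit the target
        f"-sOutputFile={dst}",
        str(src),
    ]


def resize_all(folder, out_folder):
    """Run the command over every PDF in folder, writing results to out_folder."""
    out = Path(out_folder)
    out.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(folder).glob("*.pdf")):
        subprocess.run(build_gs_command(pdf, out / pdf.name), check=True)


# Hypothetical usage: resize_all("scans", "scans-resized")
```

Note that pdfwrite may recompress the scanned images with its default settings, so it is worth checking the output quality before running OCR on a whole batch.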

All the “pages” linked from my weblog — for example, my “about” page and my PGP key — are broken. I’ve posted in the WordPress Support Forums with no luck. I’m not sure when or why they stopped working, but if any readers have any suggestions of how to troubleshoot, I’d love to hear about it. Nothing relevant appears in server logs.

In the meantime, apologies if you came here trying to find out about me. I’m temporarily out of service.

I’ve been playing around with Gallery and wpg2. I’m still a bit puzzled attempting to integrate Gallery and WordPress. I’ve resolved most issues; the main remaining issue is to display images in the Ajaxian theme without running over the borders in the Ajax/slideshow views. Also, the embedded image apparently doesn’t render in the RSS feed. Update: I’ve given up on the G2 tinymce plugin and the WPG2 tag for now and just hardcoded the image and album URL. Update 2: now the embedded image is working again for no good reason. Suggestions on the entire configuration are welcome.

In any case, I took some pretty photos today in our back yard (use left and right arrow keys to scroll through images after clicking on the one below — I still can’t get the navigation icons to appear):

Practicing lawyers, like practicing programmers, are professional pragmatists. Both must make their cases (and case mods) out of the materials they have available; both starve or eat steak depending on whether their creations work. The day-to-day practice of law is unlikely ever to require much high theory. We can mourn that fact because it means that they look at us with suspicion, or celebrate it because it frees us to chase Truth and Beauty—and it will remain a fact either way.

More than 3,000 people, on average, were visiting his site every day, and his most popular songs were being downloaded as many as 500,000 times; he was making what he described as “a reasonable middle-class living” — between $3,000 and $5,000 a month — by selling CDs and digital downloads of his work on iTunes and on his own site…

Coulton realized he could simply poll his existing online audience members, find out where they lived and stage a tactical strike on any town with more than 100 fans, the point at which he’d be likely to make $1,000 for a concert. It is a flash-mob approach to touring: he parachutes into out-of-the-way towns like Ardmore, Pa., where he recently played to a sold-out club of 140….

In total, 41 percent of Coulton’s income is from digital-music sales, three-quarters of which are sold directly off his own Web site. Another 29 percent of his income is from CD sales; 18 percent is from ticket sales for his live shows. The final 11 percent comes from T-shirts, often bought online…

Today I released version 0.60 of randomplay, my command-line shuffle-recall-swiss-army-knife music player. It will never make Winamp users happy, but it’s a good substitute for complex combinations of find/grep/xargs/sort that people sometimes use to pick tracks to play. If you can’t see why you’d use it, you probably don’t need it.

The latest version adds two new command-line options, --older-than and --newer-than. These can be used to limit the songs included in the shuffle on the basis of the file modification date. The syntax is fairly flexible, and resembles that used by rdiff-backup for restoration commands. For example:

Randomly play music under the ~/music directory that was added in the past week:

randomplay --newer-than 1W ~/music

Play in order music that is from before this year:

randomplay --norandom --older-than '1/1/2006'

Give a list of filenames of music that was added in the past six months, but hasn’t been played in the last three months:

randomplay --names-only --newer-than 6M --days 3M

Play, but don’t record in the playing history, music added in the first three months of 2004:

Unfortunately, this new feature is pretty slow, because it stats each file individually on the initial spidering of the directories to be played. In fact, the startup is always fairly slow if you are searching a large directory hierarchy, since randomplay does not keep any file index but checks anew on each execution. If you are searching tens of thousands of tracks over NFS (as I do), this can take a minute or so. Suggestions for improving the performance of the file modification time detection or of the whole startup are welcome. At some point, I will probably implement an indexing feature, but for now I like the simplicity of having it work basically like the shell find command.
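For illustration only, here is roughly what a modification-time filter looks like as a Python sketch. This is not randomplay's actual code, and the function name and the one-week cutoff are made up for the example.

```python
import os
import time


def files_newer_than(root, cutoff):
    """Yield paths of regular files under root whose mtime is at or after cutoff.

    cutoff is a Unix timestamp in seconds.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    # On Linux this is where the per-file stat happens.
                    if entry.stat(follow_symlinks=False).st_mtime >= cutoff:
                        yield entry.path


one_week_ago = time.time() - 7 * 24 * 3600
# Hypothetical usage:
# recent = list(files_newer_than(os.path.expanduser("~/music"), one_week_ago))
```

On most systems `os.scandir` can tell files from directories using the directory listing alone, but reading the modification time on Linux still costs one stat per file, so a sketch like this does not remove the slowness described above. Avoiding it would take something like a saved index.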

Hi. This is the qmail-send program at yahoo.com. I'm afraid I wasn't able to deliver your message to the following addresses. This is a permanent error; I've given up. Sorry it didn't work out. : 72.1.169.10 does not like recipient. Remote host said: 550 : Recipient address rejected: undeliverable address: unknown user: "[list name]" Giving up on 72.1.169.10.

72.1.169.10 is, in fact, the IP address of my server. [list name] is (in the real version) the real live name of the list. The list seems to work for everyone else. And it’s certainly not true that I, or my server, dislike this recipient (or sender).

Aside from this anomalous behavior, it’s also funny that Yahoo! provides plain old unfiltered qmail bounce messages to its users. Wouldn’t you think a fully matured webmail service like Yahoo! would, by this point, have somewhat customized its mail error messages? In fact, wouldn’t you think they would want to hide the fact that they use qmail at all, if only for security purposes? Couldn’t they hire an intern to write a few replacement error messages? Maybe I’m missing something.

Update 5/30/06: Figured it out. Oddly, Yahoo! was looking up the CNAME DNS record for the domain name and replacing that in the mail header. While the original email went to e.g., listname@lists.mydomain.com, the message as delivered was addressed to listname@servername.mydomain.com. Because only lists.mydomain.com processed email for lists, the message bounced. The solution was to change lists.mydomain.com from being a CNAME entry to its own A entry with the IP address specified directly. That fixed the problem. I’ve never seen any other mail service work this way — gmail certainly doesn’t.
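The rewriting described in the update can be modeled in a few lines of Python. This is only a toy model of the behavior as I observed it, not Yahoo!'s or qmail's real code; the `cname_map` dictionary stands in for actual DNS lookups.

```python
def canonicalize_recipient(address, cname_map):
    """Replace the domain of address with the target of its CNAME, if it has one.

    cname_map is a hypothetical stand-in for DNS: alias name -> canonical name.
    """
    local, _, domain = address.rpartition("@")
    seen = set()
    # Follow CNAME chains, guarding against loops.
    while domain.lower() in cname_map and domain.lower() not in seen:
        seen.add(domain.lower())
        domain = cname_map[domain.lower()]
    return f"{local}@{domain}"


# While lists.mydomain.com was a CNAME, the recipient got rewritten:
cname_map = {"lists.mydomain.com": "servername.mydomain.com"}
rewritten = canonicalize_recipient("listname@lists.mydomain.com", cname_map)
print(rewritten)

# With an A record instead there is no CNAME to follow, so the address is left alone:
unchanged = canonicalize_recipient("listname@lists.mydomain.com", {})
print(unchanged)
```

In this model, switching lists.mydomain.com to an A record corresponds to the empty map: with no alias to chase, the address reaches my server as originally written.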