I have a bash script with a while loop that takes a long time to process. It restores file modification times for complicated reasons not worth discussing here. Removing some nonessential stuff, I have the following code (I know it could be rewritten to be elegant, or at least collapsed into a single line):

Before I forget: I have a bunch of files I mirror between Windows/NTFS and Linux/ext4 filesystems that include not only accented characters but curly quotes in the filenames. (I know: the easiest solution would be to just get rid of the extended characters). The curly quotes were created in Windows, so don’t render properly in standard Linux character sets (UTF-8, iso8859-1, iso8859-15, etc.).
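As a rough illustration of why the same filename can look different on each side, here is a minimal sketch that converts the CP1252 byte for a right curly quote into UTF-8 and prints the result in hex. The function name is made up for this example, and it assumes an iconv that knows CP1252 (glibc's does).

```shell
# Hypothetical helper, for illustration only.
# Byte 0x92 (octal 222) is the right single curly quote in CP1252.
# Convert that one byte to UTF-8 and print the resulting bytes as hex.
curly_quote_utf8_hex() {
    printf '\222' | iconv -f CP1252 -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

curly_quote_utf8_hex   # prints e28099
```

One byte in CP1252 becomes three bytes in UTF-8, so a name written with one encoding will not match the same name read with the other.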

This all came up because iTunes under Windows couldn’t find curly-quote files when it was reading from the exported Samba share filesystem rather than an attached NTFS drive. The files showed up as missing because they had different filenames.

The solution was not easily google-able, so for the record, in brief, add this to the [Global] section of /etc/samba/smb.conf:

unix charset = cp1252
display charset = cp1252

And reload Samba.

Also, to make the characters render properly from a terminal on the Linux box, first create the relevant character set:

sudo localedef -f CP1252 -i en_US en_US.CP1252

Now you can use this charset on your Linux box, and, like magic, the curly characters will be back:

I’m unaware of any free tool to perform OCR on a PDF and embed the resulting data in the PDF itself so it is text-searchable. If anyone knows of one, let me know. In the meantime, I use Acrobat Professional for this essential functionality.

High resolution PDFs produced by my scanner (HP Officejet Pro L7700) usually give the following error when I try to perform Acrobat OCR:

This page is larger than the maximum page size of 45 inches by 45 inches.

Surprisingly, there doesn’t seem to be any way to resize the page size of a PDF within Acrobat. It’s possible to print to a new PDF of the correct size, but this operation cannot easily be batched. If I apply the “crop” tool to resize the page in Acrobat, I get this error:

Page size may not be reduced.

Many report these issues in Adobe’s forums. The most common responses suggest reconfiguring the scanner or buying a new one.

Some googling for a simple ghostscript recipe to do the batch pre-processing that Acrobat needs before it can OCR turned up nothing quick and easy. It’s not hard to do; finding the right switches just takes some painful trial and error.

For posterity, then, here is a simple command line to make this happen (shown for Windows, but easily adapted to any other platform). First, download the latest ghostscript for your platform (at this time, 8.64 for Windows). Then:

I recently upgraded my home router box to Debian Lenny. Everything went fairly smoothly, with a few exceptions. My NFS mounts no longer worked because apparently wildcards are no longer allowed in IP addresses in /etc/exports; the export addresses needed to be translated to subnet format (e.g., 192.168.98.* becomes 192.168.98.0/24).
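If you have several such entries to convert, the translation can be sketched as a small shell function. The name wildcard_to_cidr is made up for this example, and it assumes the wildcards only replace whole trailing octets, as in the address above.

```shell
# Hypothetical helper, for illustration only.
# Turn an IPv4 pattern with trailing "*" octets into subnet (CIDR) form.
# Assumes wildcards replace whole octets at the end of the address.
wildcard_to_cidr() {
    case $1 in
        *.\*.\*.\*) printf '%s.0.0.0/8\n' "${1%.\*.\*.\*}" ;;
        *.\*.\*)    printf '%s.0.0/16\n' "${1%.\*.\*}" ;;
        *.\*)       printf '%s.0/24\n' "${1%.\*}" ;;
        *)          printf '%s/32\n' "$1" ;;   # no wildcard: single host
    esac
}

wildcard_to_cidr '192.168.98.*'   # prints 192.168.98.0/24
```

Each wildcard octet removes 8 bits from the prefix length, so one trailing * gives /24, two give /16, and three give /8.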

But after a power failure last night, the router box rebooted and I was no longer able to access the Internet from any clients on my LAN. Strangely, I could ping or traceroute external hosts and perform DNS lookups, but web surfing and ssh timed out after an initial handshake. I noticed that when I telnetted to port 80 of an external host and sent an invalid HTTP request (e.g. “oeunthioues”), I got an error back, but if I sent a standard valid request (GET /index.html HTTP/1.0), the connection just hung with no response.

I won’t recount all the false leads I had in diagnosing this problem. It turned out that the Internet-facing NIC on my router box had been reset to a low MTU. Either setting the MTU on the LAN clients to that low number or raising the MTU on the Internet-facing NIC back to 1500 solved the problem:

# ifconfig eth2 mtu 1500
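The symptoms are consistent with an MTU problem: small packets (pings, DNS lookups, a short HTTP error) get through, while full-size packets are dropped. A rough sketch for checking this from a shell follows. The function name max_ping_payload and the host example.com are placeholders, not part of my actual setup.

```shell
# Hypothetical helper, for illustration only.
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# the MTU minus 20 bytes of IPv4 header (no options) and 8 bytes of ICMP header.
max_ping_payload() {
    echo $(( $1 - 20 - 8 ))
}

max_ping_payload 1500   # prints 1472
max_ping_payload 576    # prints 548

# Read the current MTU of an interface on Linux:
#   cat /sys/class/net/eth2/mtu
# Probe with fragmentation prohibited (iputils ping; example.com is a placeholder):
#   ping -M do -s "$(max_ping_payload 1500)" -c 3 example.com
```

If the probe at 1472 bytes fails while a probe at 548 bytes succeeds, something on the path is limiting packets to less than 1500 bytes.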

After restarting networking on the router box, the MTU was again set back down to 576, which is apparently the default MTU for an X.25 network. I have no idea why the interface is getting that value by default (where it wasn’t before), so I just added a hack to /etc/network/interfaces to fix it:

iface eth2 inet dhcp
post-up /sbin/ifconfig eth2 mtu 1500

Interestingly, pre-up didn’t work.

Hopefully I’ve included enough relevant terms in this entry that others with this problem will find it. It was hard to diagnose because no errors appeared in any log file, and I had partial but not complete connectivity from internal clients to the Internet. My first guess was that it was due to the iptables upgrade, but in fact it was entirely unrelated.

All the “pages” linked from my weblog — for example, my “about” page and my PGP key — are broken. I’ve posted in the WordPress Support Forums with no luck. I’m not sure when or why they stopped working, but if any readers have any suggestions of how to troubleshoot, I’d love to hear about it. Nothing relevant appears in server logs.

In the meantime, apologies if you came here trying to find out about me. I’m temporarily out of service.

This did the trick for me where no other solution would work. Of course, link autodetection no longer occurs, but that’s a small price to pay for connectivity.

This is a Debian etch installation using a slightly more recent kernel (2.6.25-2-686).

As an interesting side note, on this new box, the interface appears as eth0 in the kernel logs, but is actually mapped as eth1. Similarly, a second Ethernet interface appears in the log as a different device number than that by which it is referenced. Any ideas why?

Update 6/22/08: Still not getting 1000BaseT (Gigabit), however. If I force 1000BaseT with ethtool -s eth1 speed 1000, the link goes down again (even with autoneg off). The same card in another box, however, detects the link and goes to 1000BaseT automatically. So I’m stuck at 100BaseT.

I’ve been on the road a lot lately with no time to blog (or sleep). Following are a couple of items to keep this space from going completely dead.

A wild turkey stopped by our backyard (in Boston):

I called city animal control, and they said these guys are everywhere. Apparently there is no more wild left for the wildlife.

In other news, Gears is finally available for Firefox 3! For me, at least, this is a big deal, since I frequently depend on offline mode for Google Reader and have otherwise fully migrated to FF 3 for the speed benefits. Since RC2, it’s mostly stopped crashing as well.

I don’t have much (any?) history of posting tips for the Windows platform, but I’m currently stuck with it for daily work use, so I figured I might as well share some tips that my readers who happen to be in the same predicament will find useful. (Planet Debian readers please have mercy.)

One of the worst things that Microsoft Outlook allows a user to do is select a “stationery” for email. Stationery goes beyond regular old HTML mail (e.g., fonts, colors, and bullet lists) to add a patterned background, invariably rendering the content much less readable than it would be with a white (or even any other color) background. What’s worse is that every reply to an email with stationery also adopts the original sender’s stationery!

I searched quite a bit for a solution that does not involve sending a nastygram to the original sender. Of course you can convert the email to plain text (or set Outlook to only display the plain text version) and then convert back to HTML or Rich Text, but you’ll also lose other formatting that you might want to retain. You could cut and paste the text into a new email, but what is really needed is a simple VBA macro that will strip the stationery but not other formatting.

Strangely, I don’t think that macro already exists. So I wrote one, to some extent cribbing from related code snippets (mostly from here). I now present to the world ClearStationery.bas, my best contribution to date to the Outlook ecosystem. Simply paste it into your Outlook Visual Basic Editor (ALT-F11) and then map the macro ClearStationeryFormatting() onto a toolbar with a hotkey, and you can instantly remove stationery from any email, whether it is in the “preview” pane or the full message view.

I’ve been playing around with Gallery and wpg2. I’m still a bit puzzled attempting to integrate Gallery and WordPress. I’ve resolved most issues; the main remaining issue is to display images in the Ajaxian theme without running over the borders in the Ajax/slideshow views. Also, the embedded image apparently doesn’t render in the RSS feed. Update: I’ve given up on the G2 tinymce plugin and the WPG2 tag for now and just hardcoded the image and album URL. Update 2: now the embedded image is working again for no good reason. Suggestions on the entire configuration are welcome.

In any case, I took some pretty photos today in our back yard (use left and right arrow keys to scroll through images after clicking on the one below — I still can’t get the navigation icons to appear):