I'm honored and pleased to be the person who gets to complete ls. This
project, begun around when I was born, was slow to turn into anything more
than a simple for loop over a dirent. It really took off in the mid and
late 80's, when Richard Stallman added numerous features, and the growth
has been steady ever since. But, a glance at the man page shows that
ls has never quite been complete. It fell to me to finish the job, and
I have produced several handy patches to this end:

The only obvious lack now is a -z option, which should make output
filenames be NUL-terminated for consumption by other programs.
I think this would be easy to write, but I've been extremely busy IRL
(moving lots of furniture) and didn't get to it. Any takers to write it?
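For what it's worth, the framing -z would produce already exists elsewhere in the toolchain; until someone writes it, find's -print0 demonstrates the same NUL-terminated protocol. (The scratch directory below is purely for illustration.)

```shell
# What NUL-terminated output buys you: filenames containing spaces
# (or even newlines) survive the pipe intact, as with find -print0.
mkdir -p /tmp/lsz-demo
touch '/tmp/lsz-demo/a file' '/tmp/lsz-demo/plain'
find /tmp/lsz-demo -mindepth 1 -print0 | xargs -0 -n1 basename | sort
# prints "a file" then "plain"
```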

Due to the nature of these patches, they conflict with each other.
Here's a combined patch suitable to be applied and tested.

It remains to be seen if multi-option enabled coreutils will be accepted
into Debian in time for the next release. Due to some disagreements with
the coreutils maintainer, the matter has been referred to the Technical
Committee.
(Flattr me)

Traditionally new ls contributors stop once enough options have been
added that they can spell their name, in the best traditions of yellow
snow. Once ls -richard -stallman worked, I'm sure RMS moved on to
other more pressing concerns. The current maintainer, David MacKenzie, was
clearly not done yet, since only ls -david -mack worked. But he was being
slow to add these last few features, and ls was very deficient in the realm
of spelling my name (ls -o -hss .. srsly?), so I took matters into my own
hands in the best tradition of free software.

My public radio station is engaged in a most obnoxious spring pledge drive.
Good time to listen to podcasts. Here are the ones I'm currently liking.

Free As In Freedom: The best informed podcast on
software licensing issues, and highly idealistic. What keeps me coming
back, though, is that Karen and Bradley never quite agree on things, and
always end up in some lawyerly minutiae cul-de-sac that is somehow interesting
to listen to. They once did a whole show about a particular IRS tax form,
and I listened to it all. (Granted, I often listen to this while cleaning
house, but as Bradley would say, at least I'm not listening to it while
driving.)

This Developer's Life:
At least the early episodes before it got popular are an unashamed
imitation of This American Life, and I have quite enjoyed them. Although
I often roll my eyes at the proprietary developer mindsets on
display in the show. For example, often they'll have a bug and not root
cause it, because well, they don't have the source code for the Windows
layers. Still, beneath that it's mostly about the parts of software
development that are common to all our lives. A particular episode I
can recommend is #10
"Disconnecting"
-- the first 20 minutes is a perfect story.

Off the Hook: This is actually a live
radio show, quite well done, with
call-ins and everything. So much more polished than your typical podcast.
It's hosted by Emmanuel Goldstein!
And it's been going on for over 20 years, so why did I never hear about
it before? Probably I'm not quite in the right hacker circles. Since it's
out of NYC and very anti-authoritarian, I've mostly been enjoying it
as a view into the Occupy protests.

StarShipSofa:
The best science fiction podcast around. Probably not
news to anyone who ever looked for such a podcast. Long, and tends to be
frontloaded with a lot of administrivia, which I fast-forward to get
to the stories.

Spider on the Web:
The best music and science fiction podcast
around. Mostly on hiatus since Jeanne died, but I hope Spider picks it
back up. A good exemplar is
"Bianca's Hands".

Long Now Seminars:
Consistently interesting. I visited their space last time
I was in SF only to learn they'd had a talk the night before, which would
have been a bummer, except they ran the bits of the Clock for us.

Linux Outlaws: After 18 years using
Linux, I find the level of discourse in most Linux podcasts
rather annoying. Including this one, but when Fab gets on a rant, it's
all worth it. Sometimes some interesting guests.

This Week In Debian:
Sadly no new episodes lately, and I've been too
lame to respond to repeated interview requests. Probably it needs to
move away from being an interview show if it is to continue; there
are only so many DD's who can give excellent interviews like liw did.

git-annex has
special remotes
that allow large files checked into git to be stored in arbitrary places,
that are not proper git remotes. One key use of the special remotes is to
store files in The Cloud.

Until now the flagship special remote used Amazon S3, although a few
other things like Archive.org, rsync.net, and Tahoe-LAFS can be made
to work too. One of my goals is to add as many cloud storage options to
git-annex as possible.

Box.com came to my attention because they currently
have a promotion that provides 50 gigabytes of free "lifetime" service.
Which is a nice amount of cloud storage to have for free. I decided that
I didn't want to spend more than 4 hours of my time to make git-annex
use it though. (I probably have spent a week on the S3 support by contrast.)

So, this is a case study in quickly adding support for one cloud storage
provider to git-annex.

First, I had to sign up to box.com. Their promotion requires an Android
phone be used to get the 50 gigabytes. This wasted about an hour
getting my unused phone dusted off, etc. That hour also includes time
spent researching ways to access box.com's storage, including reading
their API documentation. I found it has a WebDAV interface.

Sadly, there is not yet a native WebDAV library for haskell.
This is a shame, because it would make the implementation better.
But, I'm confident someone will eventually write one. My experience
with haskell libraries for other web APIs (S3, GitHub) is that
it's an excellent language to write them in; the code tends to
be very simple, concise, and clear. But I can't do it in 4 hours.
So for now, the workaround is to use a WebDAV mounting tool.
I picked davfs2
as it was the first one I got to work with box.com's slightly broken
WebDAV. 2 hours spent now.
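For concreteness, the davfs2 setup amounts to something like this. The WebDAV URL and mount point here are assumptions (box.com documents their current endpoint), and the credentials are hypothetical placeholders:

```
# /etc/fstab -- mount box.com's WebDAV interface via davfs2
# (the URL is an assumption; check box.com's WebDAV documentation)
https://www.box.com/dav /media/box.com davfs rw,user,noauto 0 0

# ~/.davfs2/secrets -- credentials for the mount (example values)
/media/box.com you@example.com yourpassword
```

After that, mount /media/box.com makes it available like any local directory.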

With box.com mounted, I was nearly done; git-annex's
directory
special remote can use the mount point. But there was a catch:
box.com only allows files up to 100 MB. I spent an hour or so
adding support to the directory special remote for chunking files
into a user-specified size.
This was a fairly complex problem --
the existing code had a ByteString that when accessed lazily read
the whole large file (from disk or from gpg, depending), and just called
writeFile on it.
I needed to still consume it lazily to avoid reading
the whole file into memory, but write out chunks. This gets a bit
into haskell's ByteString internals, but they're very well suited to this
kind of thing, and so after 15 minutes familiarizing myself with the
data structures, it was actually fairly easy to write the code.
patch
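This isn't the actual patch, but a minimal sketch of the technique, with made-up names: lazy splitAt peels off one chunk at a time, so only a chunk's worth of the file is ever resident in memory.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.Int (Int64)
import System.FilePath ((</>))

-- Write a lazily-read ByteString out as numbered files of at most
-- chunkSize bytes each. L.splitAt is lazy in the remainder, so the
-- whole input is never forced into memory at once.
writeChunks :: FilePath -> Int64 -> L.ByteString -> IO ()
writeChunks dir chunkSize = go (1 :: Integer)
  where
    go n b
        | L.null b = return ()
        | otherwise = do
            let (chunk, rest) = L.splitAt chunkSize b
            L.writeFile (dir </> ("chunk" ++ show n)) chunk
            go (n + 1) rest
```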

I spent my last hour testing and tuning the box.com special remote.
Using davfs2 as a quick fix incurred some technical debt that I had
to pay down. In particular, the chunked filename retrieval code
had to make sure not to open every chunk at once, because that
makes davfs2 try to cache them all, instead of streaming one at a time.
patch

Not counted toward my 4 hour limit is the ... er ... 4 hours I spent
last night adding a progress bar to the directory special remote.
A progress display while transferring the files makes using box.com
as a special remote much nicer, but also makes using my phone's
SD card as a special remote much nicer!
This is why I'm a poor consultant -- when faced with something generic
and generally useful like this, I have difficulty billing for it.

And it seems to work quite well now. I just set up my production box.com
special remote. All content written to it is gpg encrypted, and several
of my computers have access to it, each using their own gpg key to decrypt
the files uploaded by the others.
(git-annex's encryption
feature makes this work really well!)
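The setup, as best I can reconstruct it here, looks something like this; the remote name, chunk size, and gpg key are illustrative, and the parameter names are the directory special remote's from git-annex at the time:

```
# one-time setup of a chunked, encrypted directory special remote
# sitting on the davfs2 mount point (values are examples)
git annex initremote box.com type=directory directory=/media/box.com \
    chunksize=2mb encryption=you@example.com

# thereafter it behaves like any other remote
git annex copy somefile --to box.com
```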

So..
There is a Dropbox API for haskell.
But as I'm not a customer, the 2 GB free account hardly makes it worth
my while to make git-annex use it.
Would someone like to fund my time to add a dropbox special remote to git-annex?

This leap day saw me driving along the river on a rainy day, with 4
chickens in the car's trunk, and 3 terabytes of disk (and half a bale of
straw) in the back seat. I may not have been blogging much lately about
life, because these situations can be hard to explain. (Or because "joined
the Debian haskell team and spent two days working on rebuilds for the ghc
7.4 transition" is not thrilling reading.)

hens in a car

The Light Sussex chickens are my sister's spare flock, which are "too tame".
They're now cozily installed into a coop we
built last weekend.
In return I gave her a 6 foot long APC power strip, which had been mounted
on the wall of my office. I'm preparing my house in town to be rented,
and have little need for two dozen power outlets here in solar power land.

Indeed, today is a gift economy day all around -- when I
arrived at the cabin, there on the porch was an unexpected package from
Google. Particularly surprising since I never get deliveries here, since
the driveway is a mile long and often seems like it could dead-end into the
woods at any moment.

The combination of technological wackiness (I also debugged a laptop
whose USB hub hangs when a particular trackball is plugged in)
and in-your-face country texture (including coal trains, being stuck behind
a tractor, and miles of amazing tree-height mist) made this a memorable day.

One of the weird historical accidents of programming languages is that
so many of them use $ for important things. The reason is just that
out of the available punctuation, nearly all of it has a mathematical or
other predefined use that makes sense to retain in a programming
language context, while $ (and also @ and #) do not.
Still, $ annoys me; it's so asymmetric that we use it all
over our code, with never a £ or ฿ to be seen.

The one language that manages to use $ nicely, IMHO, is Haskell.
Recently I noticed that it has an actual visual mnemonic in its use of $.
And it's used for something I've not seen in other languages.

The visual mnemonic of $ is that it looks like an opening
parenthesis, with the related closing parenthesis on a line below it.

(something (that
(lisp folks
(are (very (familiar with)))
)
))

And this is also the problem that $ solves:

something $ that $
haskell folks $
are $ very $ familiar with

This is a trivial feature.. but oh so useful.
The implementation in Haskell of $ is simply:

f $ x = f x
infixr 0 $

Just function application, but at a different precedence than usual.
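A concrete pair makes the mnemonic visible; these two definitions are exactly equivalent:

```haskell
-- Each $ behaves like an open parenthesis whose matching close
-- is the end of the expression.
withParens, withDollars :: String
withParens  = show (sum (map (* 2) [1, 2, 3]))
withDollars = show $ sum $ map (* 2) [1, 2, 3]
-- both evaluate to "12"
```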

I am now very addicted to my $. Out of 15 thousand lines of code,
only 87 contain )), while 10% use $.

My last post missed an important thing about
GHC 7.4's handling of encodings for FilePath. It can in fact be safe to use
FilePath to write a command like rm. This is because GHC internally uses
a special encoding for FilePath data, that is documented to allow
"arbitrary undecodable bytes to be round-tripped through it". (It seems to
do this by encoding the undecodable bytes as very high unicode code
points.) So, when presented with a filename that cannot be decoded using
utf-8 (or whatever the system encoding is), it still handles it, and using
the resulting FilePath will in fact operate on the right file. Whew!

Moral of the story is that if you're going to be using GHC 7.4 to read or
write filenames from a pipe, or a file, you need to arrange for the Handle
you're reading or writing to use this special encoding too.
I use this to set up my Handles:
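The code in question was along these lines (a minimal reconstruction; getFileSystemEncoding is in base's GHC.IO.Encoding as of GHC 7.4):

```haskell
import System.IO (Handle, hSetEncoding)
import GHC.IO.Encoding (getFileSystemEncoding)

-- Switch a Handle to the same escaping encoding GHC uses internally
-- for FilePath, so undecodable bytes round-trip through IO as well.
fileEncoding :: Handle -> IO ()
fileEncoding h = hSetEncoding h =<< getFileSystemEncoding
```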

Even if you're only going to write a FilePath to stdout, you
need to do this. Otherwise, your program will crash on some filenames!
This doesn't seem quite right to me, but I hesitate to file a bug report.
(And this is not a new problem in GHC anyway.)
If I did, it would have this testcase:
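Reconstructing the testcase from context (the filename bytes are an example): in a UTF-8 locale, make a file whose name is not valid UTF-8, then run this. The putStrLn crashes with an invalid-character error, because stdout's encoding, unlike the FilePath one, cannot represent the escaped bytes.

```haskell
import System.Directory (getDirectoryContents)

-- With a file named e.g. "\xff\xfe" (not valid UTF-8) in the
-- current directory, this throws an encoding error on stdout.
main :: IO ()
main = mapM_ putStrLn =<< getDirectoryContents "."
```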

Since git-annex reads lots of filenames from git commands and other places,
I had to deal with this extensively. Unfortunately I have not found a way to
read Text from a Handle using the fileSystemEncoding. So I'm stuck with
slow Strings. But, it does seem to work now.

PS: I found a bug in GHC 7.4 today where one of those famous Haskell
immutable values seems to get, well, mutated. Specifically a [FilePath]
that is non-empty at the top of a function ends up empty at the bottom.
Unless IO is done involving it at the top. Really.
Hope to develop a test case soon. Happily, the code that triggered it
did so while working around a bug in GHC that is fixed in 7.4.
Language bugs.. gotta love em.

I've just spent several days trying to adapt git-annex to changes in ghc
7.4's handling of unicode in filenames. And by spent, I mean, time
withdrawn from the bank, and frittered away.

In kindergarten, the top of the classroom wall was encircled by the aA bB
cC of the alphabet. I'll bet they still put that up on the walls.
And all the kids who grow up to become involved with computers learn
that was a lie. The alphabet doesn't stop at zZ. It wouldn't all fit
on a wall anymore.

So we're in a transition period, where we've all learnt deeply the
alphabet, but the reality is much more complicated. And the collision
between that intuitive sense of the world and the real world makes things
more complicated still. And so, until we get much farther along in this
transition period, you have to be very lucky indeed to not have wasted
time dealing with that complexity, or at least having encountered
Mojibake.

Most of the pain centers around programming languages, and libraries,
which are all at different stages of the transition from ascii
and other legacy encodings to unicode.

If you're using C, you likely deal with all characters as raw bytes,
and rely on the backwards compatibility built into UTF-8, or you
go to great lengths to manually deal with wide characters, so you can
intelligently manipulate strings. The transition has barely begun,
and will, apparently, never end.

If you're using perl (at least like I do in ikiwiki), everything
is (probably) unicode internally, but every time you call a library
or do IO you have to manually deal with conversions that are generally
not even documented. You constantly find new encoding bugs.
(If you're lucky, you don't find outright language bugs... I have.)
You're at a very uncomfortable midpoint of the transition.

If you're using haskell, or probably lots of other languages like python
and ruby, everything is unicode all the time.. except for when it's not.

If you're using javascript, the transition is basically complete.

My most recent pain is because the haskell GHC compiler is moving along
in the transition, getting closer to the end. Or at least finishing
the second 80% and moving into the third 80%. (This is not a quick
transition..)

The change involves filename encodings, a situation that, at least on unix
systems, is a vast mess of its own. Any filename, anywhere, can be in any
encoding, and there's no way to know what's the right one, if you dislike
guessing.

Haskell folk like strongly typed stuff, so this ambiguity about what type
of data is contained in a FilePath type was surely anathema. So GHC is
changing to always use UTF-8 for operations on FilePath.
(Or whatever the system encoding is set to, but let's just assume it's
UTF-8.)

Which is great and all, unless you need to write a Haskell program
that can deal with arbitrary files. Let's say you want to delete
a file. Just a simple rm. Now there are two problems:

The input filename is assumed to be in the system encoding aka unicode.
What if it cannot be validly interpreted in that encoding?
Probably your rm throws an exception.

Once the FilePath is loaded, it's been decoded to unicode characters.
In order to call unlink, these have to be re-encoded to get a
filename. Will that be the same bytes as the input filename and the
filename on disk? Possibly not, and then the rm will delete the wrong
thing, or fail.

But haskell people are smart, so they thought of this problem, and provided
a separate type that can deal with it. RawFilePath harks back to
kindergarten; the filename is simply a series of bytes with no encoding.
Which means it cannot be converted to a FilePath without encountering the
above problems. But it does let you write a safe rm in ghc 7.4.
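For illustration, here is such a safe rm, using the ByteString-based modules the unix package grew alongside this change (a sketch, not git-annex code):

```haskell
-- Bytes in, bytes out: the filename is never decoded, so whatever
-- bytes are on disk are exactly the bytes unlink receives.
import System.Posix.Env.ByteString (getArgs)
import System.Posix.Files.ByteString (removeLink)

main :: IO ()
main = mapM_ removeLink =<< getArgs
```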

So I set out to make something more complicated than a rm, that still needs
to deal with arbitrary filename encodings. And I soon saw it would be
problematic, because the things ghc can do with RawFilePaths are limited.
It can't even split the directory from the filename. We often do need to
manipulate filenames in such ways, even if we don't know their encoding,
when we're doing something more complicated than rm.

If you use a library that does anything useful with FilePath, it's not
available for RawFilePath. If you used standard haskell stuff like
readFile and writeFile, it's not available for RawFilePath either.
Enjoy your low-level POSIX interface!

So, I went lowlevel, and wrote my own RawFilePath versions of pretty much
all of System.FilePath, and System.Directory, and parts of MissingH
and other libraries. (And noticed that I can understand all this Haskell
code.. yay!) And I got it close enough to working that, I'm sure,
if I wanted to chase type errors for a week, I could get git-annex, with
ghc 7.4, to fully work on any encoding of filenames.
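To give the flavor of that work, here's roughly what one of those reimplementations looks like; the name and details are illustrative, not the actual git-annex code:

```haskell
import qualified Data.ByteString.Char8 as B

-- A RawFilePath analogue of System.FilePath's splitFileName:
-- split on the last '/', keeping it with the directory part,
-- without ever decoding the bytes.
splitFileNameRaw :: B.ByteString -> (B.ByteString, B.ByteString)
splitFileNameRaw p = case B.elemIndexEnd '/' p of
    Nothing -> (B.pack "./", p)
    Just i  -> B.splitAt (i + 1) p
```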

But, now I'm left wondering what to do, because all this work is
regressive; it's swimming against the tide of the transition. GHC's
change is certainly the right change to make for most programs, that are
not like rm. And so most programs and libraries won't use RawFilePath.
This risks leaving a program that does use it a fish out of water.

At this point, I'm inclined to make git-annex support only unicode (or the
system encoding). That's easy. And maybe have a branch that uses
RawFilePath, in a hackish and type-unsafe way, with no guarantees
of correctness, for those who really need it.

Partly as a followup to a Github survey, and partly because
I had a free evening and the need to write more haskell code, any haskell
code, I present to you,
github-backup.

github-backup is a simple tool you run in a git repository you cloned from
Github. It backs up everything Github knows about the repository, including
other forks, issues, comments, milestones, pull requests, and watchers.

This is all stored in the repository, as regular files, on a "github"
branch.

Available in Cabal now, in Debian maybe if someone packages
haskell-github.

Hard to believe I've consumed all of 1981's Usenet posts now on
olduse.net, and it's been running for 7 months
already.

Last night, there was a "very long"
post, describing nearly
every node on usenet in 1982. There had been a warning about this post the
day before, since it would take many sites half an hour to download
at 300 baud. It was handily formatted as a shell script, which created
per-node files.

So, I ran this code nobody has run since 1982. It worked. I got files. I
tossed them on the olduse.net wiki, and used some ikiwiki
code TOVA contracted me to write just a few months ago, to make
clickable links on my usenet map.

The map data was contributed in another post a while back. By 1982, usenet is
getting nearly impossible to map with the 1982 technology of ascii art. I enjoyed
throwing graphviz, git, wikis, and the web at it.

So, we have a collaboration across time, me and "Mark" and a lot of
people who described their usenet nodes and piles of technology
that make creating a mashup easy. Awesome!

I blog about stuff I find on the olduse.net
blog. It's an open blog;
Koldfront also blogs there, and we welcome other
bloggers.

Some of the highlights for me have included:

As the space shuttle program is winding down, reading the excitement about
the first shuttle flights, and the play-by-play coverage of a launch,
posted to net.columbia by a high school student borrowing his dad's
account. (A newsgroup name that's hard to read without remembering
its fate.)

The announcements of the Motorola M68k, the IBM PC, and the CD-ROM.

Reading the TCP-IP digest, and Postel's plans for launching IPv4 soon,
while the world IPv6 launch is being
planned now. (The nay-sayers are especially fun to read. Including the
guy who was concerned about the address space size, in 1981!)

The general development of usenet. B-news being rolled out, groups
proliferating, many first inklings of what will be major problems
and developments in 5 or 10 years. A shift in tone is already apparent,
by now usenet is not only about announcements, there are already some flames.