Friday, August 31, 2012

I hit up the Coding Dojo again this week, and we switched it up a little bit.

This time, we decided to split into two groups: one using Emacs[1] and one using a simple OS X editor.

Yes, I already pointed out what a bad idea this was, but to no avail. We have at least one person there who really really wants to use Emacs, so that's that I guess. Anyway, given that the group has its share of vim users, and its share of various IDE users, we were bound to get into a ribbing match by the end.

I may or may not have referred to the OS X setup as "Lowest Common Denominator", and then had to explain that I didn't mean it in a bad way. That told me that there's another part of my internal state that I assume is common knowledge, but may not be. If this is already part of your experience, you'll know within the next two sentences or so, and at that point you can skip the rest of the article knowing you're not missing much.

Here's the trade-off you make when you choose your environment, or customize it, or (if you're really hardcore) build your own: ease-of-use vs. worth-learning. That's NOT a fancy way of saying "you fuckers are lazy for not learning #{my_editor}". There's an actual trade there.

Ease of Use

When you sit down at this environment, it will be easy to pick up. You may need to learn one or two new keystrokes, and you may need to toggle one or two options to make it a bit comfortable. You will never have it explode on you. You'll never have to make use of its built-in auto-debugger, which it probably doesn't have, because it doesn't ship with the source code.

You'll also never really bend it to your will, which means that you won't be coding as fast as you can possibly be coding. You'll need to make peace with the fact that it just plain won't let you do certain things, or force you to do certain repetitive things manually, and that you'll need to use external tools for certain pieces of your workflow. Assuming you choose to live with it.

Worth Learning

You cannot pick up this environment in a day or two. It will take you weeks or months. It has substantially different keybindings than general-purpose editors because it is or includes a special-purpose editor. You need to go through a lot of configuration before you get it feeling just right. You may need to change your keyboard layout slightly, and/or write up a few custom modules. You will see the debugger, and you will say "Thank fucking god that I have access to this", because you will need it.

You will likely be able to pull source code for it, and it will likely have its own modification language/framework[2].

In the long run, it will make you much more productive. Noticeably more productive. You will show someone how you work, and their reaction will be "How the hell did you do that?".

Where It Matters

If you're a professional programmer, and actually want to be effective at it, I'd argue that it's a mistake not to pick the second option for your solo programming time. Note that this doesn't mean "pick Emacs". I did, but that's mainly because of the languages I use. vi/vim, Eclipse, leksah, jEdit, or whatever might make as much sense for you. Gedit, or Notepad++ or similar doesn't cut it here. And if there are any mutants out there coding in Word or something, just stay away from me, because I will not be able to veil my contempt.

The reason it makes sense there is that you're the only one whose effectiveness you need to worry about maximizing. That means that you can optimize the hell out of it without regard for the learning curve, or the portability[3].

In a group setting, the calculation changes: you're not trying to maximize your own throughput, but the throughput of the group. It is a sub-optimal outcome if one of you can hit 10 lines per minute, and the rest can't even get to one. That means you have two options:

Set up a standardized environment that everyone in the group agrees to, send out a setup script for whatever platform, and have everyone practice outside of the event

Set up a minimal environment with a very short learning curve so that everyone can pick it up without practice, and go back to their own customized environments otherwise

This is why I'm against using Emacs for physical, social coding unless it's with a complete group of Emacs users. You'll be handicapping some people pretty severely for no relevant benefit. In fact, unless you set up the vanilla Emacs distro, you'll be handicapping everyone for no relevant benefit, because every Emacs setup tends to be set up in its own way.

So yeah. "Lowest Common Denominator" is what you want here.

Footnotes

1 - On GNU/Linux, but this is incidental; I put together the Emacs environment on my machine, and I happen to be a Debian user. We never used anything other than Emacs, so the fact that I use a Tiling WM never really came up.

2 - Good language and framework optional; Elisp seems to be at the upper end of the curve these days, and it's not a particularly stellar language. Lack of namespace management gets pretty annoying after a while.

3 - Though you probably should keep a setup script somewhere to make it easier for yourself to re-install if necessary.

We haven't solved it yet, but we're getting there. Half the point is getting to know the language, and the TDD technique, so it's not as though failing to get to the end is the worst possible thing, really. I'm warming to the language, but not the technique (more on that next time).

We were supposed to have a dojo github page, but there doesn't seem to be a link going out from the meetup, and I can't find it after ten minutes of determined googling, so I can't point you to it. I have, however taken first stabs at the problem in three languages and want to go over the problem a bit.

Fundamentally, it's a sorting problem. We have cards, whose relevant properties are a rank and a suit. We have an ordered set of hand types, each of which has its own tie-breaking method with other hands of the same type. The task, near as I can tell, is taking a pair of hands, figuring out their types, then sorting them to find out the winner[2].

The constructs we need to represent here are ranks, suits, cards (each of which is just a (rank suit) combo) and hands (which are just lists of cards). Here's my first stab in Common Lisp[3].

Not bad for about 20 minutes of work. I punt on the break-tie method at the bottom there, opting to just compare high cards until someone wins. Like I said, that really should be doing something else; for instance, if we have two three-of-a-kind hands, we'd want to compare the set of three as opposed to the high cards. Once we've got the hands read into an easier format, we can test flush-p, which takes a list of cards and checks if they've all got the same suit, and straight-p, which takes a list of cards and checks if they constitute a run.

read-card takes a two-character string and returns a new card based on it. A card is just a rank attached to a suit. read-hand takes the specified hand string format, and returns a list of cards from it. Finally, we've got hand-type-> and hand->, which compare hand types and hands respectively[4].
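For readers who want something concrete to look at, here is a rough sketch of the pieces described above, written in Python rather than Common Lisp. Every name in it (Card, RANKS, read_card, read_hand, is_flush, is_straight) is a stand-in chosen for illustration, and the space-separated hand format is an assumption about what the kata feeds in; none of this is the code from the dojo.

```python
from collections import namedtuple

# Hypothetical names throughout; this is an illustration, not the post's code.
Card = namedtuple("Card", ["rank", "suit"])

# Map rank characters to numbers: '2' -> 2 ... 'T' -> 10 ... 'A' -> 14.
RANKS = {ch: value for value, ch in enumerate("23456789TJQKA", start=2)}


def read_card(text):
    """Turn a two-character string such as 'TH' into Card(10, 'H')."""
    rank_char, suit_char = text
    return Card(RANKS[rank_char], suit_char)


def read_hand(text):
    """Turn a space-separated string of cards into a list of Cards."""
    return [read_card(chunk) for chunk in text.split()]


def is_flush(cards):
    """True when every card shares one suit."""
    return len({card.suit for card in cards}) == 1


def is_straight(cards):
    """True when the ranks form an unbroken run (ace counted high only)."""
    ranks = sorted(card.rank for card in cards)
    return ranks == list(range(ranks[0], ranks[0] + len(ranks)))
```

With these, is_flush(read_hand("2H 4H 6H 8H KH")) holds and is_straight(read_hand("2H 3D 5S 9C KD")) does not. The ace-low straight is deliberately left out to keep the sketch short.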

It's minimal, and it doesn't really solve the problem, but I'm already familiar with the CL way of doing things, so I didn't want to spend any more time on this one than I really had to.

The Clojure version took me a bit longer since I'm still at the stage of having to code with a reference open, and I don't even have clojure-slime set up to give me argument hints. As I assumed, though, there aren't really big conceptual differences between this one and the CL version. It's more compact by about 20 lines, but that's almost entirely due to the fact that Clojure has built-in range and group-by functions, which I had to define myself in the previous take.

The only other real difference is that there aren't any classes here, since Clojure encourages map and vector use instead. That's helped a bit by implicit indexing[5] and lambda shorthand[6]. Note that this already handles card names, rather than just ranks.

partial is Clojure's take on partial application (a close cousin of currying), and those three functions are there for readability in the hand-type body.
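The same idea can be sketched with Python's functools.partial, which fixes some leading arguments of a function ahead of time. The helper group_of and the three predicates built from it are hypothetical; they only show the shape of the trick, not the functions from the Clojure version.

```python
from functools import partial


def group_of(size, counts):
    """True when some rank appears exactly `size` times (hypothetical helper).

    `counts` is a list of how many times each distinct rank occurs in a hand.
    """
    return size in counts


# Fix the first argument up front to get readable, single-purpose predicates.
has_pair = partial(group_of, 2)
has_three = partial(group_of, 3)
has_four = partial(group_of, 4)
```

A hand with rank counts [2, 1, 1, 1] satisfies has_pair but not has_three, and the call site reads a lot better than group_of(2, counts) scattered through a long conditional.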

The part that I'm pointedly not showing here because it would be really boring, is the ~60 line set of test cases the group wrote up for this little program, as part of the construction process. Mostly, they were things like making sure that the read functions returned appropriate values from appropriate-looking strings, and specifying the basic functionality of how different hand types are coordinated and ranked.

Haskell is... odd. It's up there in the language bar because I poke at it rather vigorously with some frequency, but I've yet to do anything serious with it. I like it, but I always get the feeling that it doesn't like me very much.

This one took me a while. I'd bet it was between three and four hours. First, re-reading some of the documentation I'd already gone through as a refresher, then going through a bunch of reference docs to find particular function names[7], and finally writing the actual program.

It contains a few lines more than the Common Lisp solution, and about 20 more than the Clojure piece, but I'll cut it some slack for two reasons in this case. First, because those type signatures and declarations effectively replace between 90% and 95% of those boring test cases I mentioned. And second, because unlike the Lisp approaches, this one is complete apart from printing the output and one piece of input procedure.

That is, if you hand it a pair of hand strings and run compare, you'll get back the correct answer, down to the last tie breaker[8].
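One common way to get tie breakers like that, sketched here in Python as an assumption about the general technique rather than a transcription of the Haskell program: order a hand's ranks by how often they occur, then by value, and compare the resulting lists. The function name tie_break_key is made up for this example, and it only makes sense for two hands of the same type.

```python
from collections import Counter


def tie_break_key(ranks):
    """Order distinct ranks by occurrence count, then by value, both descending.

    Comparing two such lists compares the biggest group first (the set of
    three, the pair) and only then falls back to the leftover high cards.
    """
    counts = Counter(ranks)
    return sorted(counts, key=lambda rank: (counts[rank], rank), reverse=True)
```

Three nines with a king and a two give [9, 13, 2], three fours with an ace and a king give [4, 14, 13], and plain list comparison puts the nines on top even though the other hand holds the higher kickers.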

I use instance Read to declare readers for Rank, but just derive Read on Suit outright. Those two compose to let us read Cards and Hands as well. All of these types derive Ord, because the whole point is sorting them, and Rank also derives Bounded and Enum so that I have an easier time of expressing a range of cards.

Once all the types are declared, the rest of the program just kind of falls out. You can see more or less the same flush and straight detectors, and even the same structure in getHandRank (except that it's named differently).

What you don't see is any boilerplate surrounding hand comparisons. Or, in fact, any comparison functions at all. We sort cards twice[9], but that's it. Because those types are defined deriving among other things Ord, you can use all the standard comparison operators to do the rest.
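The closest Python analogue to deriving Ord is probably a dataclass with order=True, which generates the comparison operators from the fields in declaration order. The sketch below is only that, an analogue: the Suit and Card definitions are invented for illustration, the field order (rank before suit) is an assumption, and the suit order simply copies the arbitrary H | C | D | S choice.

```python
from dataclasses import dataclass
from enum import IntEnum


class Suit(IntEnum):
    # Arbitrary order, mirroring the admittedly arbitrary H | C | D | S.
    H = 1
    C = 2
    D = 3
    S = 4


@dataclass(frozen=True, order=True)
class Card:
    rank: int   # compared first
    suit: Suit  # compared only when the ranks are equal


cards = [Card(13, Suit.D), Card(2, Suit.H), Card(13, Suit.H)]
highest = max(cards)     # no comparison function written by hand
ordered = sorted(cards)  # same here
```

Nothing in there defines a comparison explicitly; sorted, max and the usual operators all lean on the generated ordering, which is the property the paragraph above is describing.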

I was going to say a few proper words comparing the approaches and languages here, but this piece is already quite a bit longer than I'd like it to be. It'll have to wait for next time[10].

Footnotes

1 - Heads up if you were planning on joining us, by the way, they're holding a poll on what day next week's meeting should be held. If you weren't there yet, and your reason was "I'm not free that day", you may want to give your opinion a voice.

2 - There's also a bit of incidental complexity around displaying the winners after that, that I'll ignore for now.

3 - It's what I'm comfortable with. Also, note that all these tries were written before I started writing this post, so they have less thought in them than they otherwise might.

4 - I only implemented one direction, since the problem at hand doesn't call for more.

6 - As seen in group-of?, count-sets-of and probably a couple of other places.

7 - Hoogle helps immensely once you get your head around the type system, but I'd really like to have access to it on my local machine, along with proper auto-completion and type signature hinting.

8 - Just as an aside though, I have no idea what order suits are actually supposed to go in, so I arbitrarily picked H | C | D | S, even though that's almost certainly wrong. Don't hold that against the program, or the tools, that's just me being a not poker player.

For the past two weeks, we've been (unsuccessfully so far, but no one is about to give up yet) trying to run through the poker hand kata in Clojure. Half the point here is trying out the language, and I've successfully procrastinated until they got a fantastic, standardized build system going so that I don't have to fuck around installing libraries by hand, which seems like it'll be very gratifying after the bunch of time spent in the Erlang world lately.

Installing Clojure

Clojure the Debian package is actually not in the free repos. You can apt-get install clojure, but only after adding contrib and non-free to your sources.list, which I don't particularly want to do. In case you haven't noticed yet, I'm the sort of person who occasionally runs vrms, just to make sure. It turns out though, that the Clojure build tool can handle the task of installing the language for you, and provide faux-quicklisp/quickproject functionality, and is in the free repos as of wheezy. So, one

apt-get install leiningen

later had me on my feet. Or part of the way, at least. That install gives you lein new and lein repl, but doesn't by itself set up a development environment. In order to do that, I also had to lein plugin install swank-clojure, and shove clojure-mode into my .emacs. At that point, I was ostensibly ready to start on a project, but SLIME and swank-clojure weren't playing nice for whatever reason. I still haven't puzzled it out, but the best idea any docs gave me was that Clojure really doesn't want you to have your own swank installed, thank you very much.

Given that I'm a professional Common Lisper these days, I had exactly zero chance of following that instruction. Instead, I wired up clojure-mode to use the inferior-lisp option by adding the following additional code to my .emacs

After all that, run-lisp in a Clojure buffer will start up a Clojure REPL, and the keyboard shortcuts I'm used to from common-lisp-mode will more or less work as before. clojure-run-test is mind-numbingly slow, and I don't get completions or arglist hints, but it's good enough for a start.

Trying Clojure

The first thought that struck me was "Wait a minute, this looks a hell of a lot like Scheme". And really, that turns out to be pretty on the money, from what I can see so far at least. Clojure is a JVM Scheme with curlies, brackets, an Arc-esque obsession with counting characters needed in the source code, and heavy emphasis on immutability. That was bolded because, if you're in a hurry, you can basically stop reading now. If I were to offer advice about whether to learn it or not, I'd say

if you need to do any extensive work on the JVM, use Clojure, it beats the alternatives

if you don't know a Lisp yet, Clojure is a reasonable choice for your first[3]

if you already know Scheme or Common Lisp, and are comfortable with it, and don't go in for this JVM nonsense, don't bother learning Clojure because it'll teach you nothing new in the Perlis sense

The differences are mostly in minutiae, rather than the general principles of the language. I'll go through the few that are obvious from cursory poking, but if you're interested at all, you should take in Clojure for Lisp Programmers Part 1 and Part 2, in which Rich Hickey tells you basically everything I'm about to and a few more things besides.

There are probably bigger differences than the ones I'll point out; consider this a "preliminary impressions" note, because I've yet to do anything more serious than an attempt at that poker hand kata.

Different Truth/Falsity Values: Clojure has an explicit true and false. nil and the empty list are not equivalent[4], and you're free to define one-letter local variables that designate time, traffic or totals. That's different from both CL and Scheme, and I'm sort of leaning towards calling it frivolous, but I'll see how it works out in practice[5].

No Separate Function Namespace: Clojure cribs from Scheme here. A single function/variable namespace means you don't need to use #', and it means you don't need separate let/flet. Oddly, there are two define forms[6], but it's otherwise closer to the Scheme way of doing things.

Fewer Parentheses: I'm talking about let and cond bodies here. CL and Scheme both have you delimit each pair in an additional set of parens, while Clojure doesn't. This might make transpose-sexps a bit weirder on their clauses, but reduces the amount of typing you need to do by a tiny amount in the general case.

Polymorphic Built-Ins: The general equality test in Clojure is =, unlike CL or Scheme where you need to pick between =, eq, eql, etc. first, last, map and many others also work generically on sequences rather than just on lists.

Vectors Everywhere: [1 2 3] is "the vector of 1, 2, 3" rather than a list. Because of the polymorphic thing above, this doesn't introduce as much syntactic complexity as you'd think, and it means you don't need to worry about which end of a list you're taking from. Argument lists are all vectors rather than lists.

Destructuring By Default: I'm pretty used to whipping out destructuring-bind in Common Lisp because it's sometimes the most straightforward way of expressing something. I don't use it nearly as often in CL as I do in Python or Erlang just because it doesn't save typing in nearly as many situations given what the construct looks like[7]. In Clojure, you can do something like

Curlies and Brackets: Obviously. It's not as though CL doesn't have them, but they tend to get used very sparingly as part of reader macros. Clojure uses curlies to designate hash-maps/sets and [] to designate (among other things) vectors. Personally, I don't miss the JavaScript/jQuery matching hell that comes with nesting all three of them, but they don't seem to be mutually nesting in a lot of places, and paredit helps a lot anyway.

Whitespace Commas: Commas count as whitespace in Clojure. The quote and backquote still work as expected, but the "unquote" modifier is ~ rather than ,. This is another one that I see as frivolous, though I guess it could reduce cognitive friction for people who are used to delimiting lists with things other than spaces.

Two bigger ones that I feel the need to call out more prominently because I like them are multimethods and doc hashes.

If you're a Common Lisper, you're already used to multimethods. What's different about them in Clojure is that the generic function declaration takes a dispatch function. Which means that you can specialize methods on arbitrary properties, rather than just types. In Common Lisp, I occasionally have to declare a class for something just so that I can define methods for it, even if the thing I'm dispatching on really makes more sense as a slot than a class. The Clojure approach would save me code in these places.
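To make the dispatch-function idea concrete, here is a small Python sketch of it. This is not Clojure's implementation and leaves out things the real one has (default methods, hierarchies); defmulti here is a hypothetical helper written for this example, and the shape data is invented.

```python
def defmulti(dispatch_fn):
    """Build a generic function that picks a method by calling dispatch_fn."""
    methods = {}

    def generic(*args):
        # The dispatch value can be anything computed from the arguments.
        key = dispatch_fn(*args)
        if key not in methods:
            raise LookupError(f"no method for dispatch value {key!r}")
        return methods[key](*args)

    def method(key):
        """Decorator that registers an implementation for one dispatch value."""
        def register(fn):
            methods[key] = fn
            return fn
        return register

    generic.method = method
    return generic


# Dispatch on a plain property of the data, not on its class.
area = defmulti(lambda shape: shape["kind"])


@area.method("circle")
def _circle_area(shape):
    return 3.14159 * shape["radius"] ** 2


@area.method("rect")
def _rect_area(shape):
    return shape["width"] * shape["height"]
```

Calling area({"kind": "rect", "width": 2, "height": 3}) picks the rectangle method without any class having been declared, which is the saving described above: the thing being dispatched on is just a value in a map.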

Doc hashes are severely beefed up docstrings. Or, you could think of them as programming-by-contract-lite, I guess. You still have the option of doing the usual docstring thing

You can define inline tests too, if you want, but it's probably better to keep those in a separate test file. The static typists among you are probably snickering at this, but I like it better because these are optional. You don't want them on every function ever, you just want them on the potentially confusing functions, whose existence you should be trying to minimize. This is one step closer to getting code and documentation to coexist peacefully.

Footnotes

2 - Which is actually a lot less painful with functional programming in general than it seemed to be for the various Java/PHP teams I've had the pleasure of UI-ing for.

3 - Because it has the elegance of Scheme, combined with the production presence of Java meaning it'll be easier to convince your boss to let you use this than it will to let you use an actual Scheme, not that there's a lack of JVM options there.

5 - As a note, having thought about it a little more, there are a couple of places where this is the unambiguously right thing to do, and I've yet to think up a situation where it'll trip me up.

Monday, August 20, 2012

Ok, so I mentioned I was working on a new thing that involved moderation, administration and the auth system I put together as part of the Four-and-a-half-and-counting part series on Authentication. I've still got one or two left to write there, but since this "don't talk about it 'till it's done" thing worked out so well, I'm going to keep you in suspense.

The result of my toil is Nitrochan, a massively-ish scalable, real-time message board system inspired by the *abas that the internet is so full of. My problem with 4chan and similar boards is that they are sort of like going to a restaurant and having a guy come by to shit on your plate every few minutes. It seems that what you'd really want[1] is a constant, flowing stream of shit that you can pan for nuggets at your leisure. And this is an attempt at that. When a new thread is started, the boards are all updated with new data. When a new message is posted, the appropriate threads move up the sort order, and people already on the thread get the new message via Comet rather than having to F5. Threads can be moderated and moved between boards through similarly soft-real-time mechanisms.

The github is there, released under the terms of the AGPL[2]. I'll have another go at setting up an instance here for my own nefarious purposes[3] later this week.

The UI layer is still somewhat incomplete for a message board; we can't designate images as spoilers/nsfw, there aren't any comment markup options yet, there's no way to proactively protect a board or thread from spam, and the RSA login process is just as manual and painful as it was the last time I discussed it.

Still, we've got a good starting point to look at in terms of putting a running system together[4].

Now then, the bad stuff.

Bad Stuff

The Erlang deployment process is really beginning to annoy the fuck out of me.

I mean, it kind of did last time too, but I figured that it would get simpler as I went on and automated pieces. That... didn't really happen. You'll note that I mentioned I'll be trying again to set up an instance of Nitrochan.

The attempt proved to be futile, even without having to wrestle with rebar again. I'm really beginning to grudge that the language designers seemed to have considered actual deployment of an app to be outside of their scope. That's a shame, because every useful application is going to need to be deployed somewhere, and doing this stuff manually gets really tedious if you rely on even two or three libraries not found in the core Erlang image. rebar would be a good solution, from what I understand about it, assuming it did what it says on the tin. It has yet to for me.

I did learn a lot about concurrency outside of the lock/mutex world, and I appreciated the opportunity to mess around with actors on a grander scale than I would usually be permitted, but the continuing headaches aren't worth it for me so far. I may come back to it once I've recharged my mental batteries. For the next week or so, I'll be playing around with Clojure[5].

Sunday, August 12, 2012

"mop" stands for "Meta-Object Protocol", and it's a term closely related to CLOS. I've mentioned getting annoyed at a certain piece of it last time, when I needed to iterate over CLOS instance slots for some weird reason. It turns out that due to the way MOP support is implemented, this is a non-trivial thing to do portably.

Last week, I got into a situation where I needed a temporary copy of an object. What I really wanted was an object with most slots mirroring an existing instance, but with changed values in two slots. For reasons related to the layout of the surrounding code, I did not want to destructively modify the object itself because it was unclear whether the old values would be expected on a subsequent call. So I googled around a bit, and found that the situation for copying is pretty much the same as it is for iterating. There isn't a built-in, general way of making a copy of a CLOS instance, shallow or otherwise, and implementing it myself in a semi-portable way would require doing all the annoying things that I had to pull with slot iteration earlier.

So, being that I occasionally profess to be a non-idiot programmer, I figured I'd take a stab at solving the problem in a semi-satisfactory way.

That implements slot-names (which takes a CLOS instance or class and returns a list of its slot names), map-slots (which takes a (lambda (slot-name slot-value) ...) and an instance, and maps over the bound slots of that instance), shallow-copy (which does exactly what it sounds like it would do) and deep-copy (which is tricky enough that I hereby direct you to the documentation and/or code if you're sufficiently curious about it).
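For a feel of what the first three of those do, here is a loose Python analogue. It is a sketch under heavy assumptions: Python exposes instance attributes through vars, so none of the MOP portability pain shows up, the function names are only borrowed from the description above, the **changes convenience is an addition for this example, and the Request class is made up. deep-copy is left out.

```python
import copy


def slot_names(instance):
    """List the attribute names currently set on an instance."""
    return list(vars(instance))


def map_slots(fn, instance):
    """Call fn(name, value) for each set attribute and collect the results."""
    return [fn(name, value) for name, value in vars(instance).items()]


def shallow_copy_with(instance, **changes):
    """Copy an instance, then override a few attributes on the copy only."""
    clone = copy.copy(instance)
    for name, value in changes.items():
        setattr(clone, name, value)
    return clone


class Request:  # hypothetical example class
    def __init__(self, user, path, retries):
        self.user = user
        self.path = path
        self.retries = retries
```

The use case from earlier, a temporary object mirroring an existing one with two slots changed, then becomes shallow_copy_with(request, path="/y", retries=0), and the original request is left untouched for whoever expects the old values.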

I did cursory testing in GNU Clisp, and fairly extensive testing (followed by some production use) in SBCL, though the :shadowing-import directive should work properly in a number of others as well.

Now, I realize that due to the kind of crap you can pull using CLOS by design, this isn't a complete solution. That said, it did solve the problems I was staring down, and I think I've made it portable/extensible enough that you'll be able to do more or less what you want in a straightforward way. For basic use cases, it solves the problem outright, which should save me a bit of time in the coming weeks. For more complex cases, each of the exported symbols is a method, which means you can easily def your own if you need to treat a certain class differently from others.

Firstly, because those terms are already loaded with enough political and emotional baggage that people are going to have a hard time letting go[1], and that's going to lead to[2] the same kind of partisan garbage that US politics is well known for.

Secondly, because partitioning any group of people into two explicit, conflicting sides is hands down the worst way of easing/preventing/reducing conflict within that group. Ostensibly, that's what he's trying to do with the thought framework; point out that certain things are a matter of preference rather than points of debate, and that we should therefore stop arguing about them. Something tells me the actual effect of this conceptual framework will be a different outcome[3]. I've read comments calling the opposition to this classification scheme "weird", and I have to wonder why. It's divisive, pretty much by definition. The fact that certain pieces of it are correct doesn't make it worth keeping in its entirety, and in any case...

Thirdly, the underlying properties he presents are, for the most part, not a matter of preference. He sort of presents them that way, but I disagree at that level. Hell, let's do a blow-by-blow. Here are the points he defines as principles of software conservatives.

1. Software should aim to be bug free before it launches...

2. Programmers should be protected from errors...

3. Programmers have difficulty learning new syntax...

4. Production code must be safety-checked by a compiler...

5. Data stores must adhere to a well-defined, published schema...

6. Public interfaces should be rigorously modeled...

7. Production systems should never have dangerous or risky back-doors...

8. If there is ANY doubt as to the safety of a component, it cannot be allowed in production ...

9. Fast is better than slow. Everyone hates slow code. Code should perform well. You should engineer all your code for optimum speed up front, right out of the box...

The software liberals supposedly have the inverse principles. He makes them explicit in his entry, but I won't bother to quote them here. Note that points 1, 4, 5, 6, 7, 8 and 9 have not a fucking thing to do with personal preference. They're things that make sense in some contexts, and not in others. Some programmers really, really like having error prevention in the form of a restricted language (#2), and some really hate learning new syntax (#3), but the rest of these "principles" involve trade-offs that sometimes make sense and are sometimes retarded. Should ALL software aim to be bug free? Should ALL production code be checked by a compiler? Should production systems NEVER have back-doors? We actually can't know the right answer in general, from a static analysis at least. At the risk of being painted as a godless, sissy liberal in the wake of Yegge's proposal, we need to take a look at the run-time environment.

Your high-frequency trading software or your Air-Data/Inertial Reference Unit, or your cardiac implant firmware had damn well better be bug free, and rigorously modeled AND compiler checked AND free of back-doors AND not allowed anywhere near production if they're even suspected of incorrectness. When the stakes are billions, or lives, eating the cost of a more extensive and rigorous development process makes sense[4]. On the flip-side, when we're dealing with a situation where the software is replacing an already buggy manual process that no lives or life savings depend on, no one is going to care about a complication. Likewise, there isn't a benefit to taking weeks to prevent a bug that you can hotfix in days or hours. Finally, if the cost of a rollback or upgrade is close enough to trivial, you can be forgiven for taking more risks than you otherwise would.

This is not what a preference looks like; it makes sense sometimes and not others, and a correct one can be chosen based on context. A preference is something that there really isn't a "correct" way of thinking about. Something that we have to accept because it's atomic. So even if globally bifurcating the industry would lead to some new insight[5], and even if that insight would improve inter-programmer relations[6], these aren't the axes to do it on.

So there.

Steven... I disagree. And I won't be adopting your thought framework until you consider filtering out your projections.

Footnotes

1 - If you take a look at the HN, /. and G+ discussions, you'll already see people conflating the political meanings with the proposed software-oriented labels. Less so on slashdot, where most seem to simply dismiss the point of view, but there's a comment on the Google Plus page that reads

Dynamic typing has been shown through research to reduce maintainability compared to static typing. (Lars Ivar Igesund)

The research was done by a friend of mine while working at one of those famous, private research centers (yes, one you've heard of), but to my knowledge it has not been released. I don't remember the statically typed language used in the study, but I Imagine it was Java. The dynamically typed language was Ruby. This I can't point you to it, I just hope that you believe me when I tell you the conclusion of it. It certainly jives with mine experiences. (Lars Ivar Igesund)

That's about what I was expecting; "This guy I hang out with told me my opinion was totally right". Oh, by the way, 16 upvotes, or plusses, or whatever the fuck. Never mind the fact that a methodology isn't outlined, or that the definition of "maintainability" isn't mentioned, or that the languages involved are "I Imagine ... Java" and Ruby, or that we don't know if/how the researcher controlled for differences among teams/programmers/projects or (in case this was a single team doing two separate projects) the team's innate preferences/learning over the course of the experiment.

2 - Actually, as you can see by the previous note, "is already leading to" would be more accurate. Hell, there's already a guy out there calling himself a "Software Libertarian", and we haven't even gotten through Software Ayn Rand yet. That's some leapfrogging right there.

3 - I believe that may be the second time I've linked that comic this month.

Sunday, August 5, 2012

Just a short update this time, involving things I keep stubbing my toe on in Lisp and Erlang.

Common Lisp is not Object Oriented

The object orientation support is bugging me again. Not just me, either[1], because a bunch of modules I've been making use of lately have functions with names like time-difference or queue-push, which is precisely what the generic functions are supposed to save you from doing. It recently annoyed the fuck out of me while putting together a simple, caching implementation of a thread-safe queue. I wanted that construct to have push, pop and length, but because those names already designate top-level functions, it's not quite as simple as declaring them.

I'm not about to be dumb enough to propose that this makes Common Lisp an unacceptable language, especially since it looks like this could easily be fixed within the spec as it exists today, and I already quasi-proposed a semi-solution. I just have to give voice to that minor frustration, and point out that what you'd really want in this situation is access to a lot of the basic CLHS symbols as methods rather than functions. Not having this has now bitten me directly in the ass no less than twice[2], and signs that it might be worth fixing are showing in various CL libraries.
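For contrast, here's roughly what "basic symbols as methods" looks like in a language that already works that way. This is a Python sketch rather than CL, so treat it as an illustration of the idea and not a proposal for the spec. The `SafeQueue` class and everything in it is made up for the example; the only real point is that the built-in `len` dispatches to a user-defined `__len__`, and that `push` and `pop` don't collide with anything.

```python
import threading
from collections import deque


class SafeQueue:
    """A minimal thread-safe FIFO queue (hypothetical example type)."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def push(self, item):
        # Add to the back of the queue.
        with self._lock:
            self._items.append(item)

    def pop(self):
        # Remove from the front; raises IndexError when the queue is empty.
        with self._lock:
            return self._items.popleft()

    def __len__(self):
        # Lets the built-in len() work on this type.
        with self._lock:
            return len(self._items)


q = SafeQueue()
q.push("a")
q.push("b")
print(len(q), q.pop(), len(q))  # 2 a 1
```

No prefixed names like queue-push required, which is the whole complaint.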

Erlang Should Be More Like JavaScript

Ok, to be fair, records are better than having to deal with plain tuples when you're working with large constructs, and they're arguably The Right Way to deal with database storage, but they're a fundamentally annoying and hacky way of implementing key/value pairs.

The problem is record sharing. Here's a thought exercise: what happens when you have a system that deals with the storage and manipulation of sets of comments[3], and a second, completely separate system which would like to consume the output of that first one in order to display these sets in interesting ways for human consumption?

If you had a real k/v construct built in, like what everyothergoddamnlanguage on this earth seems to have, what you would do is pass an instance of that construct across.

If the hash map was a fundamental data type in Erlang, you would have no problem in this situation.
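As a rough sketch of what that buys you (in Python, since Erlang is the one missing the construct): the model emits a plain key/value structure and the view reads only the keys it cares about. `make_comment` and `render` are invented names standing in for the two systems, and the extra field is purely illustrative.

```python
def make_comment(comment_id, author, body, **extra):
    """'Model' side: emits a plain key/value structure."""
    comment = {"id": comment_id, "author": author, "body": body}
    # Any fields added later are just extra keys.
    comment.update(extra)
    return comment


def render(comment):
    """'View' side: reads by key, so unknown keys are simply ignored."""
    return "{}: {}".format(comment["author"], comment["body"])


plain = make_comment(1, "alice", "first!")
extended = make_comment(2, "bob", "reply", score=3)
print(render(plain))     # alice: first!
print(render(extended))  # bob: reply
```

The view never has to be touched, or even recompiled, when the model grows a field.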

But.

Records are basically tuples, wearing a bunch of reader macros and syntactic sugar. That means they're potentially faster than using a dynamic data structure for the same purpose, but it means that you can't just pass a record between two otherwise decoupled systems. If you want the same sort of behavior that you'd get out of native k/v support, you have three options I can see, and they all make me want to glare menacingly at Joe Armstrong, or at least whoever decided that records were a satisfactory solution.

Option 1: Duplicate Records

You declare the same record in both systems, then send records across.

This sucks balls because changing the record suddenly requires you to change and recompile both projects. They are not really decoupled anymore. In our theoretical example above, say we've decided that we'd really like to start tracking comments hierarchically. We need to add a pair of new fields, root and parent, so that each comment can tell you which tree it's part of and where in that tree it is.

Now, we can't just make this change in the model component, because if you had different record declarations in the model than the view, you'd get compiler errors. If you have multiple views trying to make use of the same model, and not all of them need the new data[4], too fucking bad, you're changing them all over anyway. This isn't even the worst case scenario, by the way. If you decide that the record shouldn't change fundamentally, but that you merely need to reorder fields, you won't even get a compiler error if you forget to change records in both places.

This is not the sort of brittleness that I expect from a key/value construct.
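To see why the reorder case is the nasty one, here's a small Python stand-in. It assumes a record is nothing more than a tagged tuple read by position, which is the "basically tuples" description taken literally. The field layout and `view_author` are invented for the illustration.

```python
def view_author(record):
    """View written against the ORIGINAL layout: ("comment", id, author, body)."""
    tag, _id, author, _body = record
    if tag != "comment":
        raise ValueError("not a comment record")
    return author


original = ("comment", 1, "alice", "first!")
# The model later swaps the author and body fields; the view is not updated.
reordered = ("comment", 1, "first!", "alice")

print(view_author(original))   # alice
print(view_author(reordered))  # first!
```

The tag and the size still match, so nothing complains. You just quietly get the wrong field.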

Option 2: Shared Records

You can write one file, let's say records.hrl, put all your record declarations in there and then include that file in both projects.

This sucks balls because now you don't actually have two decoupled projects at all. You've got one giant, mostly disjoint project with shared data declarations. It's not horrible, to be fair, but remember that having a run-time construct rather than a compile-time record system wouldn't even require this much additional planning.

Option 3: Sending Tuples or Proplists

This is the option I went with for a recent project, and I'm honestly not sure it was the right approach, but there would have been record name collisions otherwise, so whatever, I guess.

Instead of sending records between components directly, you emit a tuple from the model and consume it in the view, potentially creating an intermediate record if you need to. This has pretty much all the downsides of Option 1, except that you don't have a single record name-space to deal with. If you take the Proplist approach, it gets very slightly better because you only need to put together the one abstraction layer to do look-ups, and if you make it complete enough, you don't need to change it whenever you change the record definitions. That's still a lot more annoying than just having this problem pre-resolved.
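The "one abstraction layer" is about this big. Here's a Python sketch of the look-up, assuming a proplist is just a list of (key, value) pairs; `get_value` is a made-up helper for the example, not a claim about what my project actually contains.

```python
def get_value(key, proplist, default=None):
    """Return the value of the first (key, value) pair matching key."""
    for k, v in proplist:
        if k == key:
            return v
    return default


comment = [("id", 7), ("author", "alice"), ("body", "first!")]
# The model later adds fields; the look-up layer does not change.
threaded = comment + [("root", 1), ("parent", 3)]

print(get_value("author", comment))   # alice
print(get_value("parent", threaded))  # 3
print(get_value("parent", comment))   # None
```

Every look-up is a linear scan, which is part of why it feels like a workaround rather than a data type.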

I remember writing up notes from a talk Joe gave about Erlang. One of the points he covered under the "Missing Things" heading was Hash Maps, wherein he pointed out this specific issue with the fundamental architecture of the language. In the notes, I sort of acknowledge that he has a point, but don't linger on it too long. Honestly, I was thinking that it wouldn't bite at all, let alone as hard as it actually has. Joe, if you're reading this, you were right. And for the love of god, if you've got a solution in mind, DO IT.

lists:keyfind/3 and workarounds like this aren't nearly as satisfying as just having an actual, dynamic key/value construct built into the language from the ground up.

Footnotes

1 - [back] - Though I may be the only one who's noticing enough to bitch about it.

Friday, August 3, 2012

I've got some thinking to do, and given how long the alternative was taking, it's obvious that it can't happen effectively in my head. I don't want to tell you exactly what I'm working on yet, because revealing my projects before I'm done with them results in them never getting done. Witness the detritus that already litters this blog:

cl-chan took about a year and a half to get a quarter of the way to where I was going, whereupon I was distracted by shinies

Strifebarge was supposed to be a quick weekend project to get me back into the groove of programming after a bit of a vacation, but it's taking months and counting

auth was meant to have a working external API layer by now, as well as two-factor-authentication capability[1]

clomments was a piece that I literally planned out in its entirety and proceeded to 0.1 in about four hours, then got bored and started poking at Arduinos

cl-leet took months of planning and a week of the CL Games Competition to get to a hemi-semi-playable state[3]

Hell, the only projects I've gotten to done, for some value of "done", are the ones I never really think about as projects.

emacs-utils is sitting quietly up on github, saving me a few hours per day on various tasks.

cl-css should probably be replaced by something closer to cl-who, and stop using so many `',@s, but it gets the job done in the meanwhile.

finally, my mplayer web-frontend is still as awful as the day I threw it together, but it actually functions and lets me "control"[4] my media center from any wifi-capable device in the house.

So, given the track record of "things I talk about first" vs "things I put together first", you'll pardon me for keeping my latest exploits under my hat until I'm ready to pull the big red lever. Thing is, there's a component that I'm trying to assemble that has me unsure about direction, so sitting down and throwing those thoughts through the loopback interface seems like a good idea.

Moderation

I've talked about this before, but not exactly in the same context. How do you moderate a system? Scratch that, how do you moderate a decentralized, public system with an eye for data transmission and potentially divisive discussions, in the light of recent-era copyright rules?

In totality, that offers some interesting challenges, even if no individual component is an unsolved problem. DMCA et al basically necessitate that there be a way to permanently and completely remove a piece of information from a given server, because legal battles may result otherwise. Maybe they don't happen often in practice, but that's still not the sort of risk I'd be willing to take. Trouble is, permanent and complete deletion of information gives some odd incentives to the moderators.

Ok, actually, let's step back a bit further. I've noticed another assumption that should probably be explained.

Moderators

And that's probably not far enough.

Authority Figures

Hmm. No, it's bigger than that too.

Market-Capable Primates

Right, that's far enough back. I'll try to zoom back in as quickly as I can while at least giving some clues as to my thought process.

The interactions of MCPs are predictable in a couple of ways[5]. When you get a bunch of them talking to each other, over whatever hardware and protocol they actually decide to use, you're going to get three basic types of messages going back and forth.

messages genuinely generated by some internal state (regular discussion, *signal*)

messages generated by external forces rather than intrinsic interest ("buy these dick pills!" or "one weird old tip to whiten your small intestine!", *noise*)

Authority Figures

In order to ensure that a given forum approaches the ideal message profile, most of them vest power in authority figures. These figures tend to be present whether the forum has other ways of telling *signal* from *noise*, and I'll argue that the reason is largely because of that third category of message we've identified as being somewhat useful to MCPs. The power vested in these authority figures is largely censorship; they kill the *noise* that slips through whatever automated/cloud-based/crowd-sourced/buzzword-compliant system is in place to catch the bulk of it, and are expected to make judgment calls about *echo*s. If a given topic is judged as being *noise*y, it's deleted, or its visibility is artificially reduced in some way.

Authority Figures in this context do a lot of their work behind closed doors, and each of them is only human. The vague hope is that either they'll be kept in check by the community that develops around them, or by other Authority Figures. In meatspace, that's a potentially catastrophic assumption to make, but web forums tend to be viewed as less important (or perhaps better monitored), so something different seems to be happening.

The problem with *echo*s is precisely that they demand a judgment call. One human will take a look at the weekly /r/lisp argument about newLisp/Clojure/whatever-the-new-lisp-dialect-is and hit the spam button before she gets past the first sentence. Another will take a look at the exact same conversation, wonder why they've never heard about it before, and grumble quite loudly when someone closes it. That grumble incurs a cost on the system, measured in citizen good-will; someone who had no idea about a particular discussion is effectively prevented from having it, or forced to have it somewhere else.

This is the best-case situation, mind you; Authority Figures that are doing their very best to provide a balanced community free from inbuilt bias will still occasionally trip over an *echo* and shitcan it, or accidentally mistake a *signal* for *noise*. The typical case is probably going to be worse; AFs deleting *signal* they don't agree with, or aggressively permitting *noise* they enjoy on some level.

That's the trouble with Authority Figure-based *noise* reducing systems; false positives and negatives in situations where you'd rather not have them if you can avoid it. The naive response is fine-graining that Authority.

Moderators

Instead of having a set of Authority Figures for the whole community, shard the community and set up Moderators for each shard. That should reduce pressure on each Moderator, as well as allow them to work to their strengths by moderating communities centered around things they're more than baseline passionate about. The thing is, the output of this process is still not accurate sorting of message types. Moderators still commonly delete things for reasons other than objective merit. If you disagree, spend a few hours here[8].

Moderation

And we're back. Based on the principles outlined above, it seems like the best way to avoid over-moderation-related costs on a community is to make sure that any actions moderators take are:

publicly viewable in context

fully reversible

As mentioned earlier though, there needs to be a way to actually, factually delete threads, posts and images for legal purposes. If you get hit upside the head with a DMCA or similar, you can't really say "Yup, we've deleted it, only our moderator community can see it now", that shit needs to be gone. Which means I'm stuck implementing both, and worrying (perhaps excessively) about the effects of the nuclear option. In other words, I want:

delete; meaning, "make sure no one but mods can see this, show everyone else a deleted tag"

undelete; meaning, "oops, that wasn't supposed to happen, release that one back to the public"

purge; meaning, "nuclear option, this is either unauthorized or illegal media and it needs to go. Log who hit the kill switch, and ask them for a reason (which should ideally be a copy of the C&D that came in requesting the deletion)"

That third one implies the presence of an outside deletions system that keeps track of information about vaporized stuff, for archival and post-mortem purposes later, without keeping the stuff itself.
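Here's a minimal sketch of how those three operations might hang together, written in Python and kept entirely in memory. Every name in it (`Board`, `mod_log`, `purge_log` and so on) is hypothetical; it's a thinking aid under those assumptions, not the design I've settled on.

```python
import time


class Board:
    """Toy in-memory store; every name here is hypothetical."""

    def __init__(self):
        self.posts = {}      # post_id -> {"body": str, "deleted": bool}
        self.mod_log = []    # publicly viewable record of reversible actions
        self.purge_log = []  # metadata about purged posts, never the content

    def add(self, post_id, body):
        self.posts[post_id] = {"body": body, "deleted": False}

    def delete(self, post_id, moderator):
        # Reversible: hide from the public, keep for mods.
        self.posts[post_id]["deleted"] = True
        self.mod_log.append(("delete", post_id, moderator))

    def undelete(self, post_id, moderator):
        self.posts[post_id]["deleted"] = False
        self.mod_log.append(("undelete", post_id, moderator))

    def purge(self, post_id, moderator, reason):
        # Nuclear option: refuse to run without a reason, then drop the content.
        if not reason:
            raise ValueError("a purge needs a reason, ideally the C&D text")
        del self.posts[post_id]
        self.purge_log.append(
            {"post": post_id, "by": moderator, "reason": reason, "at": time.time()}
        )

    def view(self, post_id, is_mod=False):
        post = self.posts.get(post_id)
        if post is None:
            return None  # purged or never existed; nothing left to show
        if post["deleted"] and not is_mod:
            return "[deleted]"
        return post["body"]


board = Board()
board.add(1, "hello")
board.delete(1, "mod_a")
print(board.view(1), board.view(1, is_mod=True))  # [deleted] hello
board.undelete(1, "mod_a")
board.purge(1, "mod_b", "C&D received")
print(board.view(1), len(board.purge_log))  # None 1
```

The split between `mod_log` and `purge_log` is the design choice worth noticing: reversible actions stay visible in context, while a purge leaves behind only who did it, when, and why.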

Perfect! That cleared my mind a bit. I think I can see the way through now. Hopefully, this doesn't prevent me from reaching it.

Footnotes

1 - [back] - That's still coming[2], the project I'm pointedly not mentioning uses auth for the user system, and actually started as a demo project for how you'd go about hooking that up to a larger system.

2 - [back] - Though, to be perfectly fair, I've been saying that about a lot of things.

3 - [back] - Though it did result in two articles that were reasonably interesting to write.

4 - [back] - It can browse one specified directory and play one video at a time. I don't want a random wifi user to be able to do anything more than that.

5 - [back] - And a few of them might extend past MC, right into Social Primates in general, but I'm thinking of a particular primate species which disproportionately tends to internet use so we don't have to cast a net quite that wide.

6 - [back] - "Forum" in the general sense, not just the kind you find on the internet.

7 - [back] - You need to talk about these things, but given how often MCPs circle back to them, it's very unlikely you have a new idea, and we definitely don't need to keep hearing about it every week.

Ruby and Erlang each come with their own modes, and recent Emacs versions ship with a built-in Python mode and shell. Smalltalk uses its own environment (though GNU Smalltalk does have its own mode), and I'd really rather not talk about PHP. If you're writing in it, chances are you're using Eclipse or an IDE anyway.