Every so often, you have an idea for a little project that's both useful and fun
to implement (but mainly fun), and you get pleasantly surprised when you
manage to knock it out in a day.
stickytape is one of
those little projects. You can use it to take a Python script that depends
on a load of pure-Python modules, and convert it into a single-file Python
script. Admittedly, it's all one big hack -- the single-file script writes out
all of its dependencies to temporary files before running the original script.
It does the job for my purposes though, and it's much quicker and smaller than
setting up or copying virtualenvs.

All you need to do is point it at the script you want to transform, and tell
stickytape where it should search for packages and modules:
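I don't have the exact invocation memorised, so double-check against stickytape's README, but it's along these lines:

```
stickytape scripts/blah --add-python-path . > /tmp/blah-standalone.py
```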

As of 11 August 2012,
the Shed programming language is self-hosting. That is, the compiler written in
Shed could successfully compile itself, and also pass all of its tests. So pleased
was I at the time that I apparently forgot to note the date, so I had to go
back through the commit logs to find out when it happened.

Shed might be able to compile itself, but the implementation of the compiler isn't anywhere near finished.
To reach this point as quickly as was sensible, I've implemented a fairly
minimal subset of the language and the compiler. The next big step is to implement
type-checking in the compiler, along with corresponding language features
such as interfaces.

Since maintaining two separate compilers for the same language seems
redundant, not to mention time-consuming and tedious, I've decided to deprecate
the JavaScript compiler in favour of the Shed compiler. It also forces the
Shed compiler to be of higher quality – I've already added line and
character numbers to error messages, something that the JavaScript compiler had
and I quickly came to miss. Trying to hunt down syntax errors given only
the name of the file and the expected/actual token gets frustrating quickly.

The next couple of things to work on will probably be to add reference resolution,
and to modify the parser to make new lines significant so that I can get rid of
all those semicolons.

The format of the URI is the name of the VCS (version control system), then
a plus character, and then the URL of the repository. Optionally, you can
include a specific revision by appending a hash and an identifier for
a specific revision:
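For instance, following the format just described (the repository here is made up):

```
git+https://github.com/bob/my-project#74d69b8
```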

In Shed, we sometimes define functions that usually delegate to a method,
but also have some special cases or defaults if that method isn't implemented.
For instance, the function represent should produce a string
representation of an object. If the argument implements the method
represent, it calls that method. Otherwise, it uses the name
of the class of the object to generate the string. In code:
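The original example is written in Shed, which I won't attempt to reproduce from memory; here's a rough JavaScript sketch of the same behaviour (all names other than represent and Some are my own):

```javascript
// If the value implements represent, delegate to it; otherwise fall back
// to the name of the value's class.
function represent(value) {
    if (typeof value.represent === "function") {
        return value.represent();
    } else {
        return value.constructor.name + "()";
    }
}

function Some(value) {
    this.value = value;
    // The method implementing represent specifically for Some. In Shed,
    // `represent` in this scope refers to this zero-argument method rather
    // than the top-level function -- hence the compile error described below.
    this.represent = function() {
        return "Some(" + represent(this.value) + ")";
    };
}

function None() {}

represent(new Some(new None()));  // "Some(None())"
```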

This code won't compile since we're calling represent with a
single argument, value, but within the scope of the class,
represent refers to the zero-argument function that implements
represent specifically for Some.

There are several possible solutions to this problem, but the simplest one
seems to be to use a different name for the method than for the corresponding
function. For consistency, we can introduce the convention that the method name should be a simple variation on the
function name. For instance, we might choose to use a leading underscore:
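Again as a JavaScript sketch rather than the original Shed (the structure is illustrative, not the original code):

```javascript
// The function represent delegates to the method _represent if present.
function represent(value) {
    if (typeof value._represent === "function") {
        return value._represent();
    } else {
        return value.constructor.name + "()";
    }
}

function Some(value) {
    this.value = value;
    // Inside the class, `represent` now unambiguously means the function,
    // so the method body can safely call it on the wrapped value.
    this._represent = function() {
        return "Some(" + represent(this.value) + ")";
    };
}

function None() {}

represent(new Some(new None()));  // "Some(None())"
```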

Although the leading underscore is perhaps a little ugly, that ugliness does
help to reinforce the idea that you shouldn't be calling
the method _represent directly. Instead, you should be using
the represent function. More generally, instead of calling a method
_foo, you should be calling foo (unless you're
actually implementing foo).

Note: this post assumes you already have some familiarity with
applicative functors

In this post, I'll show how to implement applicative functors in JavaScript,
specifically for options, and then show an alternative formulation that's
arguably better suited to languages that generally have uncurried functions
(that is, languages that tend to have functions that accept multiple arguments
rather than a single argument).

First of all, let's implement the option type (otherwise known as the maybe
type) in JavaScript as a functor:
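The original listing isn't reproduced above; a sketch of what it presumably looked like (the text only fixes the name applyFunctor, so none, some and applyTo are my guesses):

```javascript
// The option type as a functor: none, and some(value).
var none = {
    map: function(func) {
        return none;
    },
    // Used by applyFunctor below: applying none to anything gives none.
    applyTo: function(argOption) {
        return none;
    }
};

function some(value) {
    return {
        map: function(func) {
            return some(func(value));
        },
        applyTo: function(argOption) {
            return argOption.map(value);
        },
        value: value
    };
}

// The applicative operation: apply a function wrapped in an option
// to an argument wrapped in an option.
function applyFunctor(funcOption, argOption) {
    return funcOption.applyTo(argOption);
}

// To add two options together, we have to curry add:
var add = function(first) {
    return function(second) {
        return first + second;
    };
};

var sum = applyFunctor(applyFunctor(some(add), some(1)), some(2));
// sum is some(3)
```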

Note that the use of the functor required us to curry the add
function. This isn't a problem in functional languages such as Haskell, since
functions tend to be curried by default. However, in languages that usually
define functions to have multiple arguments (uncurried languages, for short), such as JavaScript, things
get a little untidy.

My understanding of applicative functors is that they allow functors, or
rather map, to be generalised to functions that accept more than
one argument, such as add. Therefore, in an uncurried language,
we might imagine the following cleaner API:

Interestingly, the implementation of applyFunctorUncurried is most
easily expressed in terms of the original applyFunctor. I've found that
cases like this explain why functional languages tend to favour curried
functions: it often makes the implementation of higher-order functions
such as applyFunctor much more straightforward.
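I don't have the original code to hand, but the implementation presumably looked something like this sketch, where applyFunctorUncurried accepts an uncurried function plus an array of options (the curry helper is my own addition):

```javascript
// Option type and applyFunctor as before.
var none = {
    map: function(func) { return none; },
    applyTo: function(argOption) { return none; }
};

function some(value) {
    return {
        map: function(func) { return some(func(value)); },
        applyTo: function(argOption) { return argOption.map(value); },
        value: value
    };
}

function applyFunctor(funcOption, argOption) {
    return funcOption.applyTo(argOption);
}

// Turn an uncurried function of the given arity into a curried one.
function curry(func, numberOfArguments) {
    if (numberOfArguments <= 1) {
        return func;
    }
    return function(argument) {
        return curry(func.bind(null, argument), numberOfArguments - 1);
    };
}

// The uncurried formulation: curry the function once, then repeatedly
// use the original applyFunctor to feed it each optional argument.
function applyFunctorUncurried(func, options) {
    return options.reduce(applyFunctor, some(curry(func, options.length)));
}

// Now add can stay uncurried:
function add(first, second) {
    return first + second;
}

var sum = applyFunctorUncurried(add, [some(1), some(2)]);
// sum is some(3)
```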

This raises an interesting question: are these two formulations of
applyFunctor of equal power? That is, is it possible to implement
each in terms of the other? It's straightforward to see that we can implement
applyFunctorUncurried in terms of applyFunctor since
it's precisely the implementation above. What about implementing applyFunctor
in terms of applyFunctorUncurried? This turns out to be
pretty straightforward too:
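Here's a sketch of that direction (with applyFunctorUncurried implemented directly for options this time, so the definition isn't circular):

```javascript
var none = {
    map: function(func) { return none; }
};

function some(value) {
    return {
        map: function(func) { return some(func(value)); },
        value: value
    };
}

// A direct implementation of applyFunctorUncurried for options: if every
// option holds a value, apply func to all of the values; otherwise none.
function applyFunctorUncurried(func, options) {
    var values = [];
    for (var i = 0; i < options.length; i++) {
        if (options[i] === none) {
            return none;
        }
        values.push(options[i].value);
    }
    return some(func.apply(null, values));
}

// applyFunctor in terms of applyFunctorUncurried: applying a function
// option to an argument option is just the two-argument function "call".
function applyFunctor(funcOption, argOption) {
    return applyFunctorUncurried(function(func, argument) {
        return func(argument);
    }, [funcOption, argOption]);
}

var incremented = applyFunctor(some(function(x) { return x + 1; }), some(1));
// incremented is some(2)
```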

Please let me know if you spot mistakes in any of the above -- I've not
exactly been rigorous about proving any of this!

I'd be curious to know if there are any languages that include the
alternative formulation of applyFunctor, and whether there are
common cases where the original formulation is preferable even in uncurried
languages.

The problem with a smooth development process is that every day is pretty
much the same as the last. You might be writing great code and solving
interesting problems with other passionate people, but constantly
working on the same thing can begin to feel dull or even frustrating. By
having a silky-smooth development process with reliable code and regular releases,
you've removed those natural peaks and troughs, like the high of fixing another
critical bug in production before you head home and crash.
I think it was Steve Freeman
who once mentioned that sometimes it's valuable to put some of those peaks
and troughs back in, but preferably without putting critical bugs back in.

For instance, I like the idea of spending one day a week
working on unprioritised work. It might be that the developers are keen to try
out a new rendering architecture that'll halve page load times, or that
there's a piece of code that can be turned into a separate library that'll
be useful on other projects. Maybe there's a little visual bug that's never
going to be deemed important enough to be prioritised, but a developer
takes enough pride in their work to spend half an hour fixing it. This feels like
a peak to me: there's a lot of value to the product in polishing
the user experience, in refactoring the code, and in trying out risky ideas, and
the developers get to scratch some of their own itches.

However, its regularity can make it feel routine, and you're still working
on the same product. As useful as these small, regular peaks and troughs are, I think
you also need the occasional Everest. Maybe it's saying “This week, I'm going
to try something I've never tried before that's completely unrelated to the
project”. Or perhaps you need a Grand Canyon: “Today, we're just going to
concentrate on being better programmers by doing a code retreat”.
Finding something that works is hard, and you can't even reuse the same idea
too much without risking its value as an artificial peak or
trough. But I think it's important to keep trying. You don't just want a project
and its team to be alive: you need them to be invigorated.

Many advocates of functional programming suggest that the concept of state,
the idea that a value can change and mutate over time, makes reasoning
about your program much harder, leading to more bugs. Most languages allow some
form of mutability, and can therefore implement both functional and imperative
algorithms, even if the preference is strongly towards immutability.
In a completely pure functional language, mutability is
entirely removed. Since some concepts are arguably easier to understand and
implement when using mutable state, this can mean certain problems are
harder to solve in a purely functional language. But what if we allowed a limited
form of mutability in such a way that we still preserve many of the nicer
properties of functional programming, such as referential transparency?

To take a simple example: suppose we want to append an item to the end of a
list. In an imperative language, we might write something like this:

list.append("first")

so now list has an extra item, meaning that the original value
of list no longer exists. In a functional programming
language, we'd create a new value instead of mutating the original list:

val longerList = list.append("first")

We can now use both list and longerList, since
list was not modified during the append. This means we never need
to reason about what state list is in – its value never changes.
The trade-off is that a functional append tends to be more expensive than an
imperative append. If we don't actually want to use list again, then this is
arguably a bad trade-off. What if we could allow the list to be mutated under
the covers, but still be able to present a programming model that appears to
preserve immutability? So, we write the same code:

val longerList = list.append("first")

but list is now implemented as a mutable list. The compiler must
now ensure that list is never used after the append operation.
This means the actual implementation is effectively the same as when written
in an imperative style, but we ensure that whenever we change the value of an
object, we also change the name used to access it.
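In JavaScript terms, for instance, the difference between the two styles looks like this (using arrays rather than a dedicated list type):

```javascript
var list = ["a"];

// Functional-style append: a new array is created; list is untouched.
var longerList = list.concat(["first"]);
// list is still ["a"]; longerList is ["a", "first"]

// Imperative append: cheaper, but the original value of list is gone.
list.push("first");
// list is now ["a", "first"]
```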

This approach does have some severe limitations. For instance, sharing
mutable state between many objects is likely to be impossible. If we allowed
mutable state to be shared, then mutating that state inside one object would
require marking all objects that hold that state to be unusable. In general,
having the compiler keep track of this is likely to be infeasible.

Yet this sharing of mutable state is arguably the worst form of mutability.
It means that changing something in one part of your system could change something
in another far away part of the system. This idea of changing
the name whenever we change the value is most useful for mutability in the
small, when we just want to implement a particular algorithm efficiently.

However, there might still be cases where you'd quite reasonably want to share
mutable state between, say, just two objects. The more interesting question is:
is it possible to handle this case without requiring the user to write an
excessive number of hints to the compiler?

HTML has the <noscript> tag for when you want an element to be displayed if
and only if JavaScript is disabled, but what if you want the opposite? How do
you display an element if and only if JavaScript is enabled? I came
across a
rather tidy solution on StackOverflow. In the <head>,
we add the following:
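I don't have the original snippet to hand, but the trick is a stylesheet inside <noscript>, so the hiding rule only applies when JavaScript is disabled (the class name jsonly is my own):

```html
<noscript>
    <style>
        .jsonly {
            display: none;
        }
    </style>
</noscript>
```

Any element that should only appear when JavaScript is enabled gets the class jsonly. Browsers with JavaScript enabled ignore the contents of <noscript>, so those elements display as normal; with JavaScript disabled, the rule hides them immediately.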

The advantage of this solution over others is that there's no delay. Most other
solutions hide the relevant elements by default, and then use JavaScript to show
them, but this means that the elements are hidden until that piece of
JavaScript fires. However, in some cases this delay is desirable. For instance, suppose an
element does nothing until some JavaScript hooks up an onclick handler. Showing
that element before the onclick handler is added might be frustrating since
clicking the element would do nothing.

The problem

The expression problem is a tricky problem in many languages. It asks:
given a set of functions that operate over a set of types, how
do we allow both the set of functions and the set of types that
those functions operate over to be extended without losing type
safety?

Abstract data types

Let's say we have the abstract syntax tree (AST) for a simple
mathematical language that contains literals and additions. In
ML, we can represent a node with the abstract data type (ADT)
node which has two data constructors,
LiteralNode and AddNode:

datatype node
= LiteralNode of real
| AddNode of node * node

We can then define a function evaluate that turns
the AST into a single number.
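The original listing isn't reproduced above, but it presumably looked much like this sketch:

```sml
fun evaluate (LiteralNode value) = value
  | evaluate (AddNode (left, right)) = evaluate left + evaluate right
```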

Note that evaluate is type-safe since it handles
all possible instances of node. Now, suppose we
want to add another operation over nodes, say to turn the AST
into a string. Using ADTs, this is simply a case of adding
another function. Importantly, this doesn't require any
modification to the existing source code.
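For instance, nodeToString might look like this (my reconstruction rather than the original listing):

```sml
fun nodeToString (LiteralNode value) = Real.toString value
  | nodeToString (AddNode (left, right)) =
        "(" ^ nodeToString left ^ " + " ^ nodeToString right ^ ")"
```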

The problem arises when we decide that we'd like a variant of
our mathematical language with the negation operator. We'd like
to be able to evaluate this extension of our mathematical
language, but we're not concerned with turning negations into
strings. There's no straightforward way of achieving this using
ADTs -- we're forced to add another data constructor to
node, which may not be possible if we don't own
the original source code. We also add the appropriate case to
evaluate.
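Concretely, the extended definitions would look something like:

```sml
datatype node
= LiteralNode of real
| AddNode of node * node
| NegateNode of node

fun evaluate (LiteralNode value) = value
  | evaluate (AddNode (left, right)) = evaluate left + evaluate right
  | evaluate (NegateNode term) = ~(evaluate term)
```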

Even if we can modify our definition of node, we
still have a problem: we can no longer safely create functions
that operate over our original language. Consider the function
nodeToString: since it no longer exhaustively
matches all possible instances of node, it's not
type-safe. To restore type safety, we're forced to update it
to handle the case of NegateNode:
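That is, something like (again, my sketch):

```sml
fun nodeToString (LiteralNode value) = Real.toString value
  | nodeToString (AddNode (left, right)) =
        "(" ^ nodeToString left ^ " + " ^ nodeToString right ^ ")"
  | nodeToString (NegateNode term) = "-" ^ nodeToString term
```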

In general, ADTs make it easy to add extra functions that
operate over existing data types, but difficult to safely
extend those data types. Now, let's take a look at the same
problem in an object-oriented language, specifically Java.

Object-orientation: interfaces and classes

We begin by defining the interface Node, along
with two implementations, AddNode and
LiteralNode:
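The original listing isn't included above; presumably it looked much like this sketch, with evaluation as a method:

```java
interface Node {
    double evaluate();
}

class LiteralNode implements Node {
    private final double value;

    LiteralNode(double value) {
        this.value = value;
    }

    public double evaluate() {
        return value;
    }
}

class AddNode implements Node {
    private final Node left;
    private final Node right;

    AddNode(Node left, Node right) {
        this.left = left;
        this.right = right;
    }

    public double evaluate() {
        return left.evaluate() + right.evaluate();
    }
}
```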

Unlike our approach with ADTs in ML, extending our language to
support negation is straightforward. We simply add another
implementation of Node, which doesn't require any
modification of the original source code.
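A sketch of the extension (the Node interface and a literal node are repeated here so the snippet stands alone):

```java
interface Node {
    double evaluate();
}

class LiteralNode implements Node {
    private final double value;

    LiteralNode(double value) {
        this.value = value;
    }

    public double evaluate() {
        return value;
    }
}

// The new node type: just another implementation of Node, added without
// touching any of the existing source.
class NegateNode implements Node {
    private final Node node;

    NegateNode(Node node) {
        this.node = node;
    }

    public double evaluate() {
        return -node.evaluate();
    }
}
```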

Unfortunately, safely adding another operation on nodes
requires us to modify the original source code for
Node, which may not always be possible. In our
case, we want to be able to turn our original language of
add and literal nodes into strings, so we need to add a
nodeToString method on both the Node
interface and the classes AddNode and
LiteralNode:
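As a sketch (again, my reconstruction rather than the original listing):

```java
// Adding the second operation means changing the interface, and therefore
// every implementation of it.
interface Node {
    double evaluate();
    String nodeToString();
}

class LiteralNode implements Node {
    private final double value;

    LiteralNode(double value) {
        this.value = value;
    }

    public double evaluate() {
        return value;
    }

    public String nodeToString() {
        return Double.toString(value);
    }
}

class AddNode implements Node {
    private final Node left;
    private final Node right;

    AddNode(Node left, Node right) {
        this.left = left;
        this.right = right;
    }

    public double evaluate() {
        return left.evaluate() + right.evaluate();
    }

    public String nodeToString() {
        return "(" + left.nodeToString() + " + " + right.nodeToString() + ")";
    }
}
```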

Even if we can modify the original source code, by modifying
the interface, we've forced all implementations of
Node to implement nodeToString even
though we only ever wanted to use such an operation on our
original add and literal nodes. In particular, we're forced to
add nodeToString to NegateNode:
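Sketched out (with the interface and a literal node repeated so the snippet stands alone):

```java
interface Node {
    double evaluate();
    String nodeToString();
}

class LiteralNode implements Node {
    private final double value;

    LiteralNode(double value) {
        this.value = value;
    }

    public double evaluate() {
        return value;
    }

    public String nodeToString() {
        return Double.toString(value);
    }
}

// NegateNode is forced to implement nodeToString, even though we only ever
// wanted to stringify the original add and literal nodes.
class NegateNode implements Node {
    private final Node node;

    NegateNode(Node node) {
        this.node = node;
    }

    public double evaluate() {
        return -node.evaluate();
    }

    public String nodeToString() {
        return "-" + node.nodeToString();
    }
}
```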

By using methods on interfaces, we have the opposite problem to
ADTs: adding additional types of nodes without modifying or
affecting existing code is straightforward, while it's
difficult to safely add additional operations over those nodes.

Summary

In this particular example, our ideal solution would let us:

define AddNode and LiteralNode, and an
operation evaluate over both of them.

add a third type of node, NegateNode, which
evaluate can be performed on, without modification
of the original source code.

add a second operation nodeToString over the original
set of nodes, AddNode and LiteralNode,
without modification of the original source code.

not be forced to implement nodeToString
for NegateNode.

We can express these properties more generally as being able to:

define a set of data types and operations over those data types

add additional data types that can have the same operations applied
to them, without modification of the original source code.

add additional operations over those data types, without modification
of the original source code.

add these additional data types and operations independently. That
is, if an extension ExtData adds a data type D, and
another extension ExtOp adds an operation Op, we
should be able to safely use both extensions without implementing the
operation Op for the data type D, although we may
choose to do so if we want to apply Op to D.

The expression problem is a tricky problem in many languages. It asks: given a set of functions that operate over a set of types, how do we allow both the set of functions and the set of types that those functions operate over to be extended without losing type safety? If you're not familiar with the problem, I recommend reading the explanation by the author of Magpie. For our purposes, we'll use an abstract syntax tree for mathematical expressions as our data type. To start, let's have two sorts of node: addition operators and literals.

(As an aside: due to the design of the language, we can't give the arguments to a class the same name as its getter, for instance def value fun() => value, since the body of the function would refer to the function rather than the class argument. Prepending each of the arguments with my is a poor solution, and although I have several ideas on how to rectify this, I'm still pondering the simplest, cleanest solution.)

Suppose we want to implement a function evaluate that evaluates the expression to a single value. Our first attempt at an implementation might look like this:

There's one immediate problem with this solution: it's not type-safe. If somebody adds another implementation of Node, then evaluate no longer covers all possible cases. The solution to this problem is to define a union type:

This makes evaluate type-safe, but has had the unintended consequence of making evaluateAdd unsafe: add.left() and add.right() both have the type Node, yet evaluate only accepts the narrower type StandardNode. We fix this by adding type parameters to AddNode:

ExtendedNode cannot reuse the definition of StandardNode since AddNode[ExtendedNode] is a subtype of ExtendedNode but not a subtype of StandardNode. The solution is to introduce another type parameter, this time on StandardNode and ExtendedNode:

But this doesn't work either: we need to specify the type parameter to the second reference to StandardNode, which is StandardNode, which also requires a type parameter... and so on. The solution is to add yet more types that fix the type parameter to themselves:

It seems reasonable to write evaluateNegate, but the definition of evaluateAddExtended seems virtually the same as before. The difference is the type parameter for AddNode, and the function we use to evaluate the sub-nodes. So, we introduce a type parameter and argument to abstract both:
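I can't reproduce the Shed code from memory, but the final shape of the technique translates roughly into Java generics, as in this hypothetical sketch (Java has no union types, so dispatch over the node variants is done with instanceof instead):

```java
import java.util.function.Function;

class LiteralNode {
    final double value;

    LiteralNode(double value) {
        this.value = value;
    }
}

// The node type is a parameter, so AddNode can appear in both the
// standard language and any extension of it.
class AddNode<T> {
    final T left;
    final T right;

    AddNode(T left, T right) {
        this.left = left;
        this.right = right;
    }
}

class NegateNode<T> {
    final T node;

    NegateNode(T node) {
        this.node = node;
    }
}

class Evaluators {
    // The evaluator for add nodes takes the function used to evaluate
    // sub-nodes as an argument, so the same code serves any language
    // that contains add nodes.
    static <T> double evaluateAdd(Function<T, Double> evaluate, AddNode<T> add) {
        return evaluate.apply(add.left) + evaluate.apply(add.right);
    }

    static <T> double evaluateNegate(Function<T, Double> evaluate, NegateNode<T> negate) {
        return -evaluate.apply(negate.node);
    }
}

class ExtendedEvaluator {
    @SuppressWarnings("unchecked")
    static double evaluate(Object node) {
        if (node instanceof LiteralNode) {
            return ((LiteralNode) node).value;
        } else if (node instanceof AddNode) {
            return Evaluators.evaluateAdd(ExtendedEvaluator::evaluate, (AddNode<Object>) node);
        } else {
            return Evaluators.evaluateNegate(ExtendedEvaluator::evaluate, (NegateNode<Object>) node);
        }
    }
}
```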

Hopefully you can now see how you'd extend the solution to include further
node types. Although not covered here, it's also possible to create functions
or classes to help combine evaluators, so that functions written in
this style need a bit less boilerplate.

If we imagine an ideal solution to the expression problem, we might argue
that this solution is a little verbose, and I'd be inclined to agree.
The question is: is it unnecessarily verbose? There's an argument to be made
that this exposes the essential complexity of solving the expression problem.
Other less verbose solutions hide rather than remove this complexity. On the one
hand, this allows one to express the same ideas more succinctly without being
cluttered with the machinery of how the solution is achieved, compared to the
solution I just described where we have to constantly pass around the type
parameter T and evaluator argument. On the other
hand, if you want to understand what's going on, you don't have to look
very far since everything is explicitly passed around.

On the whole, I think it's simpler than some solutions I've seen to
the expression problem, and the verbosity isn't all-consuming. Pretty good
for a first go, I reckon.