Failed Experiments

a Mr. Fleming wishes to study bugs in smelly cheese; a Polish woman wishes to sift through tons of Central African ore to find minute quantities of a substance she says will glow in the dark; a Mr. Kepler wants to hear the songs the planets sing.

Saturday, December 9, 2017

In the development environments of singly-typed languages like
JavaScript and Python, one popular build step for improving code
quality is the “linting” or “style” tool. Tools like jshint and
pyflakes point out misspellings and references to missing
variables, idioms that frequently lead to bugs like == usage, calls
with apparently the wrong number of arguments, and many other things.

Much of this is meant to mechanize the enforcement of a team’s style
guidelines—at their best, sincere and effective tactics for avoiding
common sources of bugs. Unfortunately, many of these guidelines can
seem excessively pedantic, forcing the programmer to deal with cases
that could not possibly happen.

Normally, it would make sense to tell the programmers to just suck it
up and follow the rules. However, this tactic can lead to a few bad
outcomes.

The lint check can lose support among developers, for being more
trouble than it’s worth. If programmers feel that a lint is causing
more code quality issues than it’s solving, they’ll sensibly
support its removal.

Lints without developer support tend to be disabled sooner or
later, or simply mechanically suppressed at each point they would
be triggered, via “ignore” markers and the like. At that point, the
bug-catching benefits of the lint are completely eliminated; in the
worst case, you have a universally ignored warning, and are even
worse off than if the warning was simply disabled.

If the errors spotted by the lint are serious enough, but the lint
warns for too many exceptional cases, the developers might decide
to move the style rule to manual code review, with an opportunity
to argue for exceptional cases. This is labor-intensive, carries a
much longer feedback cycle, and makes it easy to accept an argument
for an exception to the style rule that is actually erroneous, as
it is not machine-checked.

Many of these “too many false positives” warnings could be a lot
better if they simply had more information to work with about what
will happen at runtime. That way, they can avoid emitting the warning
where, according to the extra information, a construct that appears
dangerous will not be a problem in practice.

That is one thing that type systems are very good at. So lint users
are on the right track; the solution to their woes of needless ritual
is more static analysis, rather than less.

Let’s consider some common lints in JavaScript and see how the
knowledge derived from types can improve them, reducing their false
positive rate or simply making them more broadly useful.

Suspicious truthiness tests

(This example has the benefit of a recent implementation along the
semantic lines I describe, in Flow 0.52. I take no credit for any of
the specific lint suggestions I make in this article.)

In JavaScript, a common idiom for checking whether an “object” is not
null or undefined is to use its own “truthiness”.
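In code, the idiom looks something like this (a minimal sketch; the User type and greet function here are hypothetical, for illustration only):

```typescript
// Hypothetical record type; the value may also be null or undefined.
type User = { name: string };

function greet(u: User | null | undefined): string {
  // The idiom: rely on the value's own truthiness.
  if (u) {
    return `hello, ${u.name}`; // here u is neither null nor undefined
  }
  return "hello, stranger";
}
```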

This is concise and does the trick perfectly—if the value being tested
isn’t possibly a number, string, or boolean. If it can be only an
object, null, or undefined, then this is fine, because even the
empty object {} is truthy, while null and undefined are both
falsey. Unfortunately, in JavaScript, other classes of data have
“falsey” values among them, such as 0 for number.
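For example (a hypothetical sketch): a count of zero is falsey, so a bare truthiness test silently conflates a genuine value with absence:

```typescript
// 0 is falsey, so the truthiness test wrongly treats it as "missing".
function describeCount(count: number | null): string {
  if (count) {
    return `count is ${count}`;
  }
  return "no count"; // reached for null, but also, surprisingly, for 0
}
```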

The “lint” solution to this problem is to always compare to null
directly.

if (o != null) // not null or undefined
if (o !== null && o !== undefined) // required in more pedantic code styles

This might be encapsulated in a function, but still doesn’t approach
the gentle idiom afforded by exploiting objects’ universal
truthiness. But the object idiom is simply unsafe if the value could
possibly be something like a number. So the question of whether you
should allow this exception to the “always compare to null
explicitly” lint boils down to “can I be sure that this can only be an
object?” And if in review you decide “yes”, this is a decision that
must constantly be revisited as code changes elsewhere change the
potential types of the expression.

You want to mechanically rule out “possible bugs” that are not really
possible in your use of the idiom, so that the linter will not warn
about benign use of truthiness—it will save the warnings for code
where it could actually be a problem. Ruling out impossible cases so
that you only need cope with the possible is just the sort of
programming job where the type system fits in perfectly.

A type system can say “this expression is definitely an object, null,
or undefined”, and type-aware linting can use that information to
allow use of the truthiness test. If that data comes from an argument,
it can enforce that callers—wherever else in the program they might
be—will not violate the premise of its allowance by passing in
numbers or something else.

A type system can also say “this expression will definitely be an
object, never null”, or vice versa, thus taking the lint even
further—it can now tell the programmer that the if check is useless,
because it will always be truthy or falsey, respectively. This is just
the sort of premise that’s incredibly hard for humans to verify across
the whole program, continuously, but is child’s play for a
type-checker.

A type system such as Flow can even say “this expression could have
been something dangerous for the truthy test, like number | object,
but you ruled out the dangerous possibilities with earlier if tests,
so it’s fine here”. Manual exemption can regress in cases like this
with something as simple as reordering “if-else if-else” chains a
little carelessly—keep in mind here the decades of failure that “be
sufficiently careful” has had as a bug-avoidance tactic in
programming—but type-aware linting will catch this right away, waking
up to declare its previous exemption from the style rule null and
void.
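A sketch of the kind of refinement described above, with illustrative names; TypeScript performs the same sort of narrowing as Flow here:

```typescript
// The union makes a bare truthiness test dangerous: 0 is falsey.
function label(v: number | { id: string } | null): string {
  if (typeof v === "number") {
    return `#${v}`; // the dangerous case is ruled out first...
  }
  if (v) { // ...so this truthy test only distinguishes object from null
    return v.id;
  }
  return "none";
}
```

Reorder the two tests carelessly and the truthy test sees number again; a type-aware lint notices immediately, where a manual exemption would silently regress.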

The more precise the types in your program, the better the type-aware
linter will understand your use of this idiom—in only valid
places. It will not make the reasoning mistakes that human review would
make when allowing its use, so you can use it with more confidence,
knowing that the lint will only call out cases where there’s a genuine
concern of a non-null falsey value slipping in. And there is no need
to argue in code review about which cases need != null and which
don’t, nor to revisit those decisions as the program evolves; as
circumstances change, the type checker will point out when the verbose
check becomes unnecessary, or when the succinct check becomes unsafe.

References to undeclared variables

It’s very common to mistake a variable name that isn’t defined for one
that is. The larger a program gets, the easier this mistake is to
make.

This mistake comes in a few forms. It may be a simple misspelling. It
may be a variable you thought was defined here, but is actually
defined somewhere else. It may be a variable defined later, but not
yet.

Whatever the meaning of the error, linters can catch the problem
via a relatively simple method. As the file is scanned, the linter
keeps track of what variables are currently in scope. When
encountering a variable reference, it checks the variable name against
its current working list of variables. If one is not in the list, the
linter reports it.
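The core of that method can be sketched in a few lines (a toy model, not any real linter's implementation):

```typescript
// Report every referenced name that is not in the current scope list.
function undeclaredRefs(inScope: Set<string>, refs: string[]): string[] {
  return refs.filter((name) => !inScope.has(name));
}
```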

This is better than nothing. Compared to what you can do with a type
checker, though, it’s not very good at all.

Suppose that you have a few local functions defined at the top level
of your module, foo, bar, and baz. A linter will point out an
undeclared variable if you try to call fop, gar, or bax. So you
don’t have to wait for the browser reload or test cycle; you can
correct these errors right away.

Later on, your module is getting larger, so you decide to group some
functions into objects to clean up the top-level namespace. You decide
that foo, bar, and baz fit under the top-level object q.

const q = {
  foo(...) { ... },
  bar(...) { ... },
  baz(...) { ... },
}

During your refactoring, references to these three functions are
rewritten to q.foo, q.bar, and q.baz, respectively. This is a
nice way to avoid ad hoc name prefixes as a grouping mechanism; you’re
using a first-class language feature to do the grouping instead.

But let’s give a moment’s consideration to q.fop, q.gar, and
q.bax. The linter will verify that the reference to q is sound;
it’s declared at the module level as const q. However, the linter
will not then verify that the “member” references are sound; that is a
fact about the structure of q, not its mere existence.
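A small sketch of the gap: without type checking, the typo only surfaces at runtime, when the missing member is actually called.

```typescript
const q = {
  foo() { return "foo"; },
  bar() { return "bar"; },
  baz() { return "baz"; },
};

// A scope-only linter verifies that `q` exists and stops there.
// Without type checking, the typo below is a runtime TypeError:
//   q.bax();  // "q.bax is not a function"
// A type checker flags it at compile time instead.
```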

When all you have is “simple” linting, this becomes a tax on
modularity, so to speak. If a variable is defined locally—very
locally—references to it are checked. If it is defined remotely,
whether in another module or simply “grouped” in a submodule,
references to it are not checked.

A type system cuts the modularity tax by tracking the names that are
beyond the purview of the simple linter. In the case of q,
type-checking tracks more than its existence; its statically-known
structural features are recorded as part of its type.

In this module, q is defined. It is an object, and is known to have
properties: foo, a function which is…; bar, a function which is…; and
baz, a function which is….

This continues to work at whatever depth of recursion you like. It
works across modules, too: if I want to call baz from another
module, I can import this module in that one, perhaps as mm, and
then the reference mm.q.baz will be allowed, but mm.q.bax flagged
as an error.

An undeclared-variable linter is all well and good, but if you want to
take it to its logical conclusion, you need type checking.

hasOwnProperty guards

The best lints focus on elements of code hygiene we’re reluctant to
faithfully practice while writing code, but come back to fill us with
regret when we fail to do so. One example arises with using for…in to
iterate over objects in JavaScript; the lint checks that you’re
guarding the loop with hasOwnProperty.

The purpose of these guards is to handle non-plain objects; that is to
say, objects “with class”. The guards are always suggested by the
linter to avoid nasty surprises should you try iterating over the
properties of an object “with class”.
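The guarded idiom looks like this (a minimal sketch, using an explicit prototype to stand in for an object "with class"):

```typescript
// An object "with class": `inherited` lives on the prototype chain.
const base = { inherited: "from prototype" };
const obj: Record<string, string> = Object.create(base);
obj.own = "own value";

const ownKeys: string[] = [];
for (const k in obj) {
  // The guard skips properties found further up the prototype chain.
  if (Object.prototype.hasOwnProperty.call(obj, k)) {
    ownKeys.push(k);
  }
}
```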

The irony of this check is that the intent of such code is usually to
work with plain objects only; that is, classy objects should not be
present in the first place! The construct is still perfectly usable
with classy objects; it’s just that more caution is called for when
using it in that fashion.

As such, there are two basic scenarios of for…in iteration over
objects:

1. iteration over plain objects only, and
2. iteration over potentially classy objects.

The focus of the hasOwnProperty guard ought to be #2, but this
concern bleeds over into #1 cases, needlessly or not depending on how
pessimistic you are about Object.prototype extension. But this
question is moot for the linter, which can’t tell the difference
between the two scenarios in the first place.

By contrast, a type checker can make this distinction. It can decide
whether a variable is definitely a plain object, definitely not one,
or might or might not be one. With that information, depending on your
preferences, it could choose not to warn about a missing
hasOwnProperty guard if it knows it’s looking at scenario #1. So if
a hasOwnProperty is needless, it need not pollute your code for the
sake of the scenarios where it will be needed.

Computer, please try to keep up

Humans are pretty good at coming up with complex contractual
preconditions and postconditions for their programs, and making those
programs’ correct operation depend on the satisfaction of those
contracts. But humans are very bad at verifying those contracts, even
as they make them more complex and depend more on them.

“Simple” linting tools are good at checking for the fulfillment of
very simple rules to help ensure correctness, freeing the human mind
to focus on the more complex cases. What makes them a chore to deal
with—a reason to put solving linter warnings on the “technical debt
backlog” instead of addressing them immediately—is “they don’t know
what we know”; they can handle laborious low-level checks, but lack
the sophisticated analysis humans use to decide the difference between
a correct program and an incorrect one.

Happily, through static analysis, we can get much closer to the
inferences of human-level understanding about a program, while
preserving the tireless calculation of contractual fulfillment that
makes linting such a helpful companion in the development cycle, doing
the parts that humans are terrible at. When your linter is so “simple”
that it becomes a hindrance, it’s time to put it down, and pick up a
type system instead.

Tuesday, November 28, 2017

I previously wrote “Values never change types”, whose central
thesis statement I hope is obvious. And this still holds, but there is
something I left unsaid: values do not have identity, so the notion of
“mutating” them is as nonsensical as “mutating 4”. And the formal
system of Scala types treats objects with identity similarly, by not
permitting them or their variable aliases to change type, even though
they are not quite values. But this is a design decision, and other
choices could have been made.

There are very good reasons not to make other choices, though. Other
type systems come with features that come very close to making the
opposite design choice; by imagining that they went just a little
farther down this garden path, we can see what might have been.

Refinement, flow, or occurrence typing, by any name

In Flow and TypeScript, when you test properties of a value in
a variable, you can “change the type” of that variable. For example,
you could have a let s: any; if you write an if block that tests
whether s is a string, the type of s—at compile-time, mind
you—“changes” to string within the if body. Within the body of
that if, you could perform further tests to refine s’s type
further; you might also have various other if blocks alongside
checking for other types, so that s might variously “change” into
number, function, object with a whatsit property whose type is
another object with a whosit property, and so on.
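A sketch of this refinement in TypeScript (using unknown rather than any, so the narrowing is visible):

```typescript
function describe(s: unknown): string {
  if (typeof s === "string") {
    // within this block, s has "changed" to string
    return s.toUpperCase();
  }
  if (typeof s === "number") {
    // and here, to number
    return s.toFixed(2);
  }
  return "something else";
}
```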

So, instead of having a single type attached to a lexical variable
over its entire scope, a variable has several types, each tied to a
block of code that uses the variable. It is an order more
sophisticated, but still tied to the lexical structure of the program,
as if the variable has multiplied to honor all the faces it might
have. This is a great way to model how people are writing programs
at the type level without overly complicating the formal system, which
still must always obey a complete set of sound rules.

Contradictory refinement

In the systems I’ve described, no further refinement can contradict a
prior one. So once you determine that a variable is a string, it’s
not going to turn out to be a number later; at most, it can get more
specific, like being proven to be a member of a known static set of
strings. So this way you know that inner blocks cannot know less than
outer blocks about the nature of a variable; that is what I mean by
“tied to the lexical structure.”

What “real JavaScript code” could be written that would violate this
assumption?

function foo(arg) {
  let s = arg
  if (typeof s === 'string') { // refines s to type string
    s = { teaTime: s }         // "point of no return"
    if (s.teaTime === "anytime") {
      drinkTea(s)
      ...

The first if test establishes a block in which s’s type is
string. Then we pull the rug out from under the type-checker by
assigning to s; with that assignment, it is no longer true that s
is a string. Why does this make type-checking more complex?

Let’s twist and tangle the program to support our beloved mutation

The type of the variable s no longer follows the block structure of
the program, in the way we usually perceive blocks in a structured
program. That’s because the fact established by the outer if test is
suddenly invalidated partway through the block. So our first problem
is one of raising the complexity burden on human interpretation of the
program—the reader can no longer assume that the specificity of a
variable’s type only increases as you move inward, reading the block
structure of the program—but it is not fatal in itself, at least for
this example. We can salvage the model via the Swiss Army knife of
semantic analysis, the continuation-passing style (CPS) transform.

Now it is still possible for inner blocks to contradict outer blocks,
but at least this is only possible at the block level. So, by “merely”
revisualizing our programs in terms of the flow of continuations
rather than the visibly apparent block structure, we can sort of still
think of the type of variables as a “block”-level concern, as it was
before.

Unluckily, performing a CPS transform in your head with all the code
you see is a kind of “reverse Turing Test”, something that an AI
would have to not be able to do very well in order to fool us into
thinking it was human. So no matter what, we are stuck with a
significant new complication in our formal model.

But not a fatal one. Yet.

Loop unrolling according to the phase of the moon

What will prove fatal to the formal model of type-changing mutation is
delay. Let us see how the seeds of our destruction have already been
sown.
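The code sample under discussion runs along these lines (a hedged sketch: fullMoon is a hypothetical nondeterministic predicate, and the limit parameter merely keeps the sketch terminating):

```typescript
// Hypothetical reconstruction; each full-moon pass wraps s one level deeper.
function brood(fullMoon: () => boolean, limit: number): unknown {
  let s: unknown = "anytime";
  let rounds = 0;
  while (rounds < limit) {
    if (fullMoon()) {
      s = { teaTime: s }; // the assignment that "changes" s's type
    }
    console.log(s); // the log call: what type does s have here?
    rounds++;
  }
  return s;
}
```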

One necessary prerequisite is that it has been a full moon. I’m using
this boolean expression to inject the Halting Problem—we cannot
determine in a Turing-complete language whether a boolean expression
will evaluate to true or false, generally—but it is probably
sufficient to say that it is nondeterministic, even if not
Turing-complete. (Pragmatists love Turing completeness and
nondeterminism, because “more power is always better”.) So it’s hard
enough—by which I mean generally impossible—to say whether the s
assignment has happened.

The next prerequisite, which should drive us wholly into despair now
if hope yet remains, is that the moon has been full once. Eh? Here’s
where the tie of variable types to any semblance of code structure
breaks down completely, because s takes on a surprisingly large
number of types in this code sample.

To assign a precise type to this program, we have to accurately model
what is happening in it. So, suppose that prior to entering this loop,
the type of s is string. Each assignment to s—made each time the
while’s test is true and the moon is full—takes us one step down
this list of types.

string

{teaTime: string}

{teaTime: {teaTime: string}}

{teaTime: {teaTime: {teaTime: string}}}

{teaTime: {teaTime: {teaTime: {teaTime: string}}}}

•••

Now if you want to assign (potentially) all of these infinite
possibilities to the program, you have to go even further from the
block structure model. Imagine a third dimension of the text file: at
the surface, you see s having only the first two types above, but as
you look deeper, you see the branching possibilities—oh so many
iterations in, oh so many times the moon has been full—each assigning
different types to what is on the surface the same code. Looking at
this as two-dimensional text, you would only see the infinite
superimposition of all possible types of s, weighted according to
their probability.

Three dimensions might be too few for this code.

Of course, there’s a well-known, sensible way to type this code,
sans the log call: abandon the folly of modeling mutation and
assign this recursive union type to s for at least the whole scope
of the while loop, if not an even wider scope:

type TeaTimeTower = string | {teaTime: TeaTimeTower}

And supposing the drinkTea function is so polymorphic, all is
well, and as a neat side bonus, easy to understand. But we aren’t
here to pursue sanity; we gave that up to try to model mutation.

The devil in the delay

If fully desugared, while is a two-argument higher-order function
(not counting the continuation, if we are still thinking in CPS),
taking its test and body as functions. Just like you’re writing
Smalltalk.
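Sketched as a plain function (roughly Smalltalk's whileTrue:), both arguments are thunks whose execution is delayed:

```typescript
// `while` as a two-argument higher-order function: the test and the body
// are both functions, so neither runs until the loop decides to call them.
function whileLoop(test: () => boolean, body: () => void): void {
  if (test()) {
    body();
    whileLoop(test, body); // "loop" by re-invoking with the same thunks
  }
}

// Usage: counting to three.
let n = 0;
whileLoop(() => n < 3, () => { n++; });
```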

The thing that makes so much trouble for flow analysis is this
delay. Type-changing requires us to contradict earlier refinements of
a variable’s type, not simply refine them further. But the ability to
capture a reference to a variable in a lambda means that we need a
deep understanding of how that lambda will be used. It might never
be invoked. It might be invoked later in the function, just when we
thought it was safe to contradict whatever refinement it was
type-checked with. It might be saved off in another variable or data
structure elsewhere in the program, making reasoning about when the
variable might be referenced in the future a futile endeavor.

Doing flow analysis with sources of delayed execution whose behavior
is 100% known, like if and for, is tricky enough. Doing it in the
presence of unknown, novel, potentially nondeterministic sources
of delay is intractable, if not impossible.

And that’s for the computer. How many dimensions does the model in
your head have, now? Zero, no, negative points for abandoning this
ivory-tower static analysis and declaring “common sense” the arbiter
of your programs’ correctness.

Did anyone else see you come here?

An object with known properties can be thought of as a group of named
variables. This is a longtime, straightforward way to represent
modules of functions, or clean up a global namespace by putting a lot
of related items lying on the floor into the same drawer.

Since we love mutation, and we love mutating variables, we should
love mutating object properties (and their types) even more,
right?

function schedule(s) {
  s.teaTime = "anytime"
}

The type of s in the caller after schedule finishes is
straightforward: it’s whatever it was before, with the teaTime
field (whatever it might have been before, if anything) type set to
string, or perhaps the literal singleton type "anytime".

But what schedule is so eager to forget will not be so easily
forgotten by the rest of the program.

Namely, the contradicted, earlier type of s is very hard to
reliably eradicate. This is an aliasing problem, and it brings
the excitement of data races in shared-mutable-state multithreaded
programs to the seemingly prosaic JavaScript execution model.
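A sketch of the aliasing hazard (the names are hypothetical):

```typescript
// One object, two names: mutating through one changes what the other sees.
const s: { teaTime: (() => string) | string } = {
  teaTime: () => "4 o'clock",
};
const alias = s; // a second reference, perhaps far away in the program

function schedule(x: { teaTime: (() => string) | string }): void {
  x.teaTime = "anytime"; // "changes the type" of the property
}

schedule(s);
// alias was never touched directly, yet alias.teaTime is no longer callable.
```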

To type s in the aftermath of schedule, you must perfectly answer
the question, “who has a reference to s”? Suppose that teaTime was
a now-contradicted function. Any code that calls that function via
property lookup on its reference now takes on another dimension:
before schedule executes, it is safe, but afterwards it no longer
is, so it takes on the prerequisite “can only be called before calling
schedule(s).” The dimensional multiplication directly results from
the multiplication of possible types for s.

The problem broadens virally when you try to model other variables
that are not s, but whose types will still change due to
schedule being called! Here is an example of such a variable.

const drinks = { coffee: coffeeMod, tea: s } // where s is the value we’re talking about

So all the analysis of references to s induced by the type mutation
means references to drinks must undergo the same ordeal. And
references to something that refers to drinks, and references to
something that refers to that, and so on, ad infinitum.

And that is assuming we can statically determine what object
identities will be flying around the program. As with so much else in
this article, this is generally impossible.

By the way, the problem with lambdas is just a special case of this
one; it’s exactly that lambdas alias variables that causes so much
grief for our wannabe-mutating ventures.

A different kind of power, at a much more reasonable price

Since we are only imagining this insanity, not attempting to truly
partake of it, we have something good to feel about, after all: the
grass is really quite brown on the other side, alas, but that on
this side is a little greener than first glance might indicate.

Type systems like those of Haskell, Java, OCaml, Scala, and many other
languages simply don’t permit the types of variables to change. When
you consider the introduction of type equalities in Haskell or Scala
GADTs, or more direct refinements made in Typed Racket, Flow, or
TypeScript, you can include all of these languages in the broader
category of type systems whose variable types can only complement,
never contradict.

This is a powerful simplifying assumption, because under this
restriction, none of the above problems matter. “Functional types”
are not only powerful enough to model programs, and far easier to
understand for the human programmer, they are the only way out of the
quagmire of complexity wrought by trying to “model mutation”. Even
problems that look amenable to mutation analysis, like the while
example above, admit a simpler solution in an immutable type like the
recursive TeaTimeTower.

More power is sometimes worse.

When you forbid unneeded capabilities, you get back capabilities in
other areas. Sometimes this comes in the form of improved
understanding, such as we get for large programs by introducing type
restrictions. It makes sense to give up “power” that is not practical
to get benefits that are.

Take the forbidding of mutation. We take type-level immutability for
granted in the same way that many practitioners take value-level
mutability for granted. Perhaps one reason for resistance to
functional programming might be that we are so accustomed to the
drawbacks of unconstrained mutability that it does not seem quite as
insane at the value level as, seen above, it is at the type level.

Thursday, July 27, 2017

The desire to allow more programs to be typed—by assigning more
accurate types to their parts—is the main force driving research in
the field. – Benjamin Pierce, Types and Programming Languages

Type system design is a major factor in whether you can write
programs, and how easily you can do so. Simplicity is an important
consideration, but that entails a trickier question to start: “what is
simplicity?”

In this article, I want to consider two questions about type system
simplicity by comparing two—relatively good as these things go—type
systems, Haskell’s and TypeScript’s.

First: what dangers come from an excess of pragmatism in type system
design? By pragmatism, I mean the elevation of design compromise
as the highest virtue. The pragmatist seeks to compromise even those
“pure” designs that are entirely suitable when considering the
practical constraints in play.

I don’t use this word, ‘pragmatist’, because I think it’s nice and
accurate. Nor do I think that it’s fair to those who don’t fit the
definition I’ve just given, yet still think of themselves as
“pragmatic” in the broader sense. I use this word because the people
I describe have claimed the mantle of “pragmatism”, and reserved it
for themselves, quite successfully in the world of programming.
And first, we must name the enemy.

Second: what is so compelling about advanced type system features?
New type systems are often beset with users requesting features like
rank-N types, higher-kinded types, GADTs, existential quantification,
&c. There are good, practical reasons these features are requested;
they result in a kind of “simplicity” that cannot be had simply by
having a small number of features.

An unsound feature in TypeScript

Function parameters are a contravariant position; contravariance and
invariance are the only sound choices for them. So TypeScript’s
“function parameter bivariance” is a deliberately unsound choice;
if you’re unfamiliar with it, I strongly recommend stopping now and
reading the linked documentation, along with the explanation of why
they do it; it’s a good piece of documentation, describing an
eminently practical circumstance in which it might be used.

However, this example is worth examining more closely. Think about it
from the perspective of a type system designer: how would you support
the call to addCallback below?
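A hypothetical sketch of such an API (illustrative names, not the TypeScript documentation's own example):

```typescript
interface BaseEvent { kind: string }
interface MouseEvt extends BaseEvent { x: number; y: number }

const handlers: Array<(e: BaseEvent) => void> = [];

function addCallback(handler: (e: BaseEvent) => void): void {
  handlers.push(handler);
}

// The caller wants to write the handler against the more specific MouseEvt.
// Accepting `onMouse` directly is exactly "function parameter bivariance";
// under strict checking, the explicit cast below is required instead.
let seen = "";
const onMouse = (e: MouseEvt) => { seen = `${e.x},${e.y}`; };
addCallback(onMouse as (e: BaseEvent) => void);
```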

The temptation of pragmatism

TypeScript’s design choice to support this sort of call is unsound.
This is explained by the documentation; again, please refer to that if
you haven’t yet.

There is always the temptation to poke a hole in the type system when
dealing with the problem, “how do I express this?” That’s because you
can then do something you want, without having gone through the
bother of proving that it’s safe. “You can’t do whatever you feel like
doing” is exactly what a sound type system must say. The benefits
of soundness diffuse across your program, filling in the negative
space of the tests that you no longer need to write; they can seem far
away when confronted with a problem here to be solved now.

In this way, unsound features are the greatest ally of the pragmatist.
They’re an asymmetric weapon, because sound features can never say
“just do what you like, here; don’t worry about the distant
consequences”.

We who have a strong distaste for pragmatism must make do instead with
research.

A sound alternative, in Haskell

Haskell is a testbed for many advanced type system features,
demarcated by extension flags. One of the joys of working with
Haskell is learning about a new extension, what it means, and
thinking of ways to use it.

Many of these features are guarded by an extension flag; we’re going
to call on one such feature by placing at the top of the Haskell
source file

{-# LANGUAGE GADTs #-}

One of the things this enables is that you can attach type parameters
to enum members. EventFlag gets a type parameter indicating the
associated type of event.

data EventFlag e where
  MousePress :: EventFlag MouseEvent
  KeyPress   :: EventFlag KeyEvent

-- MouseEvent and KeyEvent can be
-- related types, but don't have to be
data MouseEvent = -- ...
data KeyEvent = -- ...

addCallback :: EventFlag e -> (e -> IO ()) -> IO ()

e is a type parameter; when you pass an EventFlag to
addCallback, the callback type (e -> IO () above) changes to
reflect what event type is expected.

It is sound; you cannot screw up the relationship between the
EventFlag argument and the event type that will be passed to the
callback.

It is more convenient; if you pass a lambda as the callback
argument, it will simply “know” that the argument type is
KeyEvent or MouseEvent; your editor’s coding assistance can act
accordingly, without you having to declare the lambda’s argument
type at all.

I would go so far as to say that this makes this addCallback
simpler; it’s easier and safer to use, and can even be implemented
safely. By contrast, function parameter bivariance requires you, the
really OK, without type-checker assistance, and even then the
library function can’t offer any help to callers if they declare the
lambda argument type wrong.

What’s simpler?

A type system without powerful features for polymorphism makes it
difficult or impossible to describe many programs and libraries in
fully sound ways. A more powerful type system simplifies the task of
the programmer—its features give you a richer language with which to
describe generic APIs.

When the core of a type system doesn’t give you a way to type an
interface, you might follow the pragmatist’s advice, and poke a hole
in the type system. After that, you won’t be able to generally trust
the conclusions of the type checker throughout the program, anymore.

Instead, you might look at the leading edge of type system research,
for a sound way with which to express the API. This is not so
expedient, but yields APIs that are safer and more convenient to use
and implement.

With an unsound feature, the pragmatist can offer you the world, but
cannot offer the confidence that your programs don’t “go wrong”.
Powerful type system features might bend your mind, but promise to
preserve that confidence which makes type systems, type systems.

One of the most important advantages of Free, Open Source Software
(FOSS) is that it returns power to the community of users. With
proprietary software, power is always concentrated in the hands of the
maintainer, i.e. the copyright holder.

The [more] equal status of maintainer and user in FOSS creates a
natural check. It keeps honest, well-intentioned maintainers honest,
and permits the community to reform around new maintainership should a
formerly good situation change. And circumstances can always change.

This equal status does not fall out of the sky; it is mediated by a
legal constitution: the license(s) of the software and documentation
developed by the project. When users accept the license terms—by
redistributing the code or changes thereto—they agree to this
constitution. When maintainers accept contributions under that
license, as in an ordinary CLA-less project under inbound=outbound,
they agree to the very same constitution as the users.

This is not to say that EPFL would be an ill-behaved caretaker! But by
what means could I make that determination? Moreover, why is it even
necessary?

The closest thing to an acceptable rationale for the Scala CLA is that
it addresses legal concerns left unmentioned by the license,
e.g. patent grants. These are important concerns, too frequently
unaddressed by projects using minimalist licenses such as the one
Scala uses. But the appropriate place to address these concerns is in
the basic legal constitution for all: the license. If these guarantees
are so
important that EPFL must have them, then why should we, as
contributors, not ask them of EPFL, via inbound=outbound? If these
terms would make the license “too complex”, no longer minimal, what
about their placement in a CLA will make them any better understood?

It’s my hope that Scala will abandon the CLA, and switch to a
lightweight option that holds true to the principles of FOSS
projects. A couple options are

This may or may not be coupled with the switch to a longer license
that incorporates stronger patent protections, like the Apache
License 2.0. This
should alleviate the concerns that are currently addressed by the CLA,
but in a way that is equitable to the Scala project, all of its
contributors, and all of its users.

Sunday, April 9, 2017

If you’re interested in design with zero-cost type tagging, or some
cases of AnyVal I didn’t cover in the first article, or you’re
looking for something else I missed, check here. There’s a lot more I
didn’t have room for in the first article. Consider this “bonus
content”.

Unidirectional subst

We saw earlier that though subst appears to substitute in only one
direction, that direction can easily be reversed. This is due to the
symmetry of type equality—if A = B, then surely also B = A.

Suppose that apply implemented some per-String validation
logic. In that case, you wouldn’t want users of the Label API to be
able to circumvent this validation, wholesale; this is easy to do with
the subst I have shown, and we saw it already when we tagged a whole
list and function, both designed only for plain Strings!

We can get an idea of how to fix this by comparing Leibniz and
Liskov. Looking at the signature of Liskov.subst, you decide to
introduce widen, replacing subst.

// in LabelImpl
def widen[F[+_]](ft: F[T]): F[String]

// in val Label
override def widen[F[+_]](ft: F[T]) = ft

This is interesting because if you pass anything and get back a
Success, the succeeding value is just the argument you passed in, no
reallocation necessary. (To reallocate, we would need Traverse
instead of Foldable.)
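Putting the pieces together, a self-contained sketch of the validated Label might look like the following. The Option-returning apply is my assumption for the validation logic; any validation type would do.

```scala
trait LabelImpl {
  type T
  def apply(s: String): Option[T]        // validation lives here
  def widen[F[+_]](ft: F[T]): F[String]  // T -> String, never the reverse
}

val Label: LabelImpl = new LabelImpl {
  type T = String
  override def apply(s: String) = if (s.nonEmpty) Some(s) else None
  override def widen[F[+_]](ft: F[T]) = ft
}
```

Because widen is the identity, Label.widen[List](labels) returns the very same list with no reallocation; and since no member converts an F[String] back to an F[T], the validation in apply cannot be bypassed wholesale.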

Unidirectional without subtyping

If you prefer to avoid subtyping, you can also constrain subst
variants with typeclasses indicating directionality. For Scalaz or
Cats, providing both of these would be a sufficient substitute for the
widen[F[+_]] introduced above.

However, this makes it very difficult for typeclass resolution to
reliably distinguish String and T. It’s also easy to accidentally
untag. That’s why we took this out of Scalaz’s Tags; discriminating
typeclass instances is a very useful feature of tags. If these aren’t
concerns for you, globally revealed tag subtyping may be the most
convenient for you.
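I don’t reproduce the original pair of typeclasses here, but one possible shape for such a directional witness (the name Widens is an assumption) is:

```scala
// A one-way substitution witness: evidence that A may be widened to
// B, with no companion evidence in the other direction.
sealed abstract class Widens[A, B] {
  def subst[F[+_]](fa: F[A]): F[B]
}

object Widens {
  // Reflexivity: every type trivially widens to itself.
  implicit def refl[A]: Widens[A, A] = new Widens[A, A] {
    def subst[F[+_]](fa: F[A]): F[A] = fa
  }
}
```

A LabelImpl would then expose an implicit Widens[T, String] but no Widens[String, T], recovering the directionality of widen while letting callers demand the evidence generically.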

Boxing Ints

AnyVal might seem to have better, more justifiable boxing behavior
in the case of primitive types like Int. When putting an AnyVal
wrapper around Int, the custom box replaces the plain Integer box,
rather than adding another layer.

Where the primitive boxing is optimized, you get to keep those
optimizations; AnyVal subclass boxing effectively turns off these
optimizations. For example, Integer boxing is optimized, but
MagicInt’s AnyVal class is not.

The one remaining problem with the tag version of MagicInt is that
its erasure is still Object.

(The boxing/unboxing of MagicInt changes to match.) Unfortunately,
there’s no way to tell Scala what the erasure ought to be without
exposing that extra type information, which may be quite inconvenient.
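For illustration, here is what “exposing that extra type information” looks like: bounding the abstract type fixes the erasure, at the price of revealing a subtyping relationship to every user. (The names in this sketch are assumptions; it shows the shape of the exposure, not a recommendation.)

```scala
trait MagicIntImpl {
  type T <: Int          // the bound leaks: users learn T is an Int
  def apply(i: Int): T
}

val MagicInt: MagicIntImpl = new MagicIntImpl {
  type T = Int
  override def apply(i: Int) = i
}
```

With the bound in place, methods taking MagicInt.T erase to methods taking a primitive int rather than Object; drop the bound and the Object erasure returns, along with the hiding.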

Would you box a JavaScript string?

Maybe if we weren’t working with types. Since we are working with
types, we don’t have to box our strings in JavaScript in order to keep
track of what sort of strings they are. But Scala might want to,
anyway.

Surely it doesn’t have to for our tag-like Label. And indeed it
doesn’t.

val h = Label("hi")
// compiles to
//   var h = "hi";
// fastOptJS is smart enough to know
// that apply can be elided

val hs = js.Array(h, h)
// compiles to
//   var hs = [h, h];

val strs = Label.subst[Lambda[x => js.Array[x] => js.Array[String]]](identity)(hs)
strs(0) + strs(1)
// compiles to
//   (("" + $as_T(hs[0])) + hs[1])
// fastOptJS is smart enough to know
// that subst, too, can be elided

The possible existence of subst tells us something about the deeper
meaning of our abstract type definition, type T = String, that holds
true no matter how much of this equality we hide behind existential
layers. It is this: the compiler cannot predict when the fact that T
= String will be visible, and when it will not be. It must therefore
not generate code that would “go wrong” in contexts where this is
revealed.

For example, at one point, we saw that

Label.subst(Monoid[String])

would indeed produce a suitable Monoid[Label]. This means not
only is the value’s type reinterpreted, but also, by consequence, its
members.
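With a minimal stand-in Monoid (the real examples used scalaz’s), the substitution can be sketched end to end:

```scala
trait Monoid[A] {
  def zero: A
  def append(x: A, y: A): A
}

val stringMonoid: Monoid[String] = new Monoid[String] {
  def zero = ""
  def append(x: String, y: String) = x + y
}

trait LabelImpl {
  type T
  def apply(s: String): T
  def subst[F[_]](f: F[String]): F[T]
}

val Label: LabelImpl = new LabelImpl {
  type T = String
  override def apply(s: String) = s
  override def subst[F[_]](f: F[String]) = f
}

// The String monoid's members are reinterpreted at type Label.T,
// with no wrapper allocated and no code generated.
val labelMonoid: Monoid[Label.T] = Label.subst(stringMonoid)
```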

However, in subst, we have charged the compiler with doing this
arbitrarily complex substitution with 100% accuracy and in constant
time. There are no opportunities to generate “wrappers”, not for these
structures that merely employ Label in their types. And, by
consequence, there’s nowhere to put code that would use some means to
treat Label and String differently based on runtime choices.

If you wish to automatically add “wrappers”, you have a difficult
problem already with parametric polymorphism. With higher-kinded
types, you have an intractable problem.

You can still “add a type parameter” to do abstraction on your
tagged types.

You can hide much of the complexity of a monad transformer stack,
allowing it to infer more easily with Unapply or
-Ypartial-unification. This
is because, unlike standalone type aliases, scalac can’t dealias
your abstraction away. (Warning: this doesn’t apply if you make the
type T “translucent”; hide your types to keep them safe from
scalac’s prying expander.)

This is also a specialization of the type-member based MTL encoding I
used in "Tagless final effects à la Ermine Writers". The
essential difference is that individual program elements were
universally quantified over the expansion of the abstract type, where
here, the entire program is universally quantified over that
expansion, because the existential quantifier is globally bound.

And, of course, if you are an ML (OCaml, SML, &c) fan, you’re probably
thinking “yeah, so what? I do this all the time.” Sorry. We can be a
little slow on the uptake in Scala world, where we greatly undervalue
the ideas of the functional languages before us.