Approximating Eigenvectors in Idris

Sunday October 11th, 2015

Idris is a functional programming language with dependent types. In a dependently typed language, types may depend on values, as well as other types. One advantage of this is that it makes the type system much more expressive and enables the compiler to formally verify more of the logic of the program. While extremely powerful, dependent type theory is less well-understood than other formal systems (see where dependent types fit into the lambda cube). Idris is exciting because it is a testing ground for research in dependent types.

Vectors and Matrices

A vector space forms an ideal setting for this exploration of Idris since many many matrix and vector operations have constraints which can be more easily represented and checked in Idris than in other languages. For example: vectors may only be multiplied if they have the same length, and two matrices must have a common dimension in order to be multiplied together.

Vectors in Idris have type Vect n a, where n is a natural number. That is, the type of a vector contains a value. This means that the vectors [1,0] and the vectors [0,1,0] actually have different types in Idris, and their multiplication via the dot product can be excluded at compile-time, rather than handled at runtime with an error or by silently discarding elements of the longer vector.

With a representation of vectors, it's easy to define a type for matrices:

Matrix : Nat -> Nat -> Type -> Type
Matrix m n a = Vect m (Vect n a)

Matrix operations can be checked in the same way as vector operations.

All of these helper functions, which we'll use in the implementation of the power method, benefit from the compile-time checking of vector and matrix operations:

So far, so good. The looks (pretty much) like the analogous Haskell code. The infix operators <:>, <#>, <+>, <->, and </> (below) come from the definitions of various algebraic structures - group, ring, vector space.

The Power Method

With these utility functions out of the way, we're ready to implement the power method. The heavy lifting is done by a function which takes a matrix, a "seed" eigenvector, and a list of previously computed eigenvectors and approximates the next eigenvector to the desired precision:

The power method is iterative. Each iteration, we start with a "seed" eigenvector, our best approximation of the eigenvector we're interested in. Then, we generate a new approximation by multiplying our matrix by the "seed" vector and normalizing the product, which is a vector of the same length.

We define the error of each iteration to be the distance between our original approximation (the "seed") and the new approximation. If the error is less than our desired precision, we stop; otherwise, we repeat the process with our new "seed".

The result of this process will converge (under a few conditions) to the "first" eigenvector - an eigenvector corresponding to the eigenvalue of greatest magnitude. In order to approximate successive eigenvectors, we first orthogonalize the "seed" to each of the previously computed eigenvectors in turn.

Note that we want the tmp and result vectors to be the same size as the "seed" vector, but the size of the seed vector is only carried in its type, which means that it's not automatically brought into the scope of the function body. This is a problem unique to a dependent type system, and Idris solves it with a notion of implicit arguments. The size of the seed vector is explicitly brought into scope using curly braces ({n}), which allows n to be used in the type signatures of result and tmp.

Effects and Randomness

There's a catch in what I've described so far - the power method needs an initial seed vector to jumpstart the iterative process of approximation. The choice of seed vector is important: if the seed vector is orthogonal to the eigenvector we're trying to compute, then the iterative process will not converge. In practice, this is addressed by initializing the power method with a random seed vector, so that the probability of non-convergence is infinitesimal.

Constructing random vectors requires a source of random numbers from the operating system and introduces side-effects into our previously-pure program. Handling side-effects has traditionally been a challenge for functional languages which aspire to purity. The Idris effects package offers a unique approach to handling effectful computations which is made possible by Idris' dependent type system.

Simple effects in Idris are parameterized by the return value of the effectful computation and the particular side-effects used in the computation. Specifically, a list of side-effects (i.e. a value) is included in the effect type. This guarantees a high degree of type-safety for effectful computations, since the compiler can ensure that only side-effects present in the type of the computation are used. At the same time, it's easy for users to combine side-effects, without worrying about the order in which those effects are applied (goodbye and good riddance, monad transformers!). In practice, the compiler seems to have trouble correctly resolving effect types, but hey - Edwin Bradytold you this is an experimental language!

The full Effects machinary is much more complicated and flexible than the simplified picture I've given here; there's an excellent tutorial on effects in Idris that explains effects in greater detail.

Our eigenvectors function uses two side-effects: RND, to generate random numbers, and SYSTEM, to seed the random number generator with the system time. Both side-effects are reflected in the return type of the function. (The ! in front of time unwraps the effectful value of time so that it can be used by srand). Notice again that the size of the matrix is captured via an implicit argument to eigenvectors.

Thoughts on the Elm language

Thursday September 10th, 2015

Elm is a type-inferredfunctional reactive language for web programming ("reactive" means that the language has built-in primitives for expressing time-varying values - see FRP). I decided to try Elm out on a small project, a web app for displaying incoming data streams from an array of environmental sensors.

I built a data display (code here) which reads measurements from a simple text file accessible via HTTP and produces a simple visualization. In order to show the incoming data in real time, the app must periodically check the HTTP endpoint to gather data and update the displays. These modest requirements turn out to make a great project for testing the maturity and capabilities of Elm, since they introduce a time-varying parameter at the heart of the project, and provide an opportunity to explore Elm's approach to handling impure functions like HTTP requests.

Writing a simple web app in Elm

There can be a steep initial learning curve to getting started with Elm. Even a job as simple as sending a periodic HTTP request in Elm introduces a lot of heavy machinery: signals, tasks, mailboxes, and ports. Here's a snippet of code that's responsible for periodically retrieving data for the data display:

The data we're going to retrieve will end up as a signal (time-varying value) associated with the mailbox we just created: rawData.signal : Signal String. To turn the raw HTTP response into something we can work with, we parse the response string, timestamp it, and then update the state (SensorBoard) with the latest batch of data.

So far, so good. The tricky part is getting the data to end up in rawData.signal. The function getSensorData : Task Http.Error () is responsible for a single HTTP request. The type signature tells us that getSensorData either returns () (the unit value, indicating success), or an error of type Http.Error. If getSensorData is successful, then it also updates rawData.signal with a new value as a side effect). In order to send multiple HTTP requests to periodically retrieve data, we map getSensorData over a signal which updates itself every Config.updateInterval seconds, but always has the same value (Config.sensorDataUrl).

Finally, running this impure function requires that we declare it as a port at the top level.

It's easy to write a solution to this problem which typechecks and looks plausible, but which only launches a single HTTP request. Getting the solution to work as intended requires thinking about the signal graph and understanding how tasks work. It's not intuitive (at least for me). It takes a bit of thought to realize that the URL needs to masquerade as a time-varying value in order to launch periodic HTTP requests.

Successes and challenges

As I became comfortable with Elm's FRP model, the biggest challenges I faced were the result of the relative youth of the language community.

There's a dearth of community libraries, and those which have been released are often in flux as their authors figure out how to express in Elm solutions originally developed in JavaScript or Haskell. The lack of stock solutions to common problems means that simple projects can take much longer than expected because so much must be built from scratch. Even the core libraries sometimes change dramatically (tasks and mailboxes were just introduced with v0.15) - and this means more time updating and refactoring code.

Layout and styling in particular were frustrating at times. There are two main display systems in Elm. The Graphics.* core libraries provide a simple layout system built of rectangular components of known width and height. This means that it's not possible to style Graphics.* elements with CSS. It is possible to use CSS with elm-html, but you give up the ability to get the dimensions of elements from Elm code. Fortunately, there's some good work being done on these issues right now, with Elm-Css recently released and discussions about alternative layout / styling systems.

It helps that the Elm community is friendly and supportive. When I've been truly stumped, I've gotten quick (and thoughtful) responses from Elm veterans on the elm-discuss mailing list. The elm-discuss list is an exciting place to be; there are questions every day from beginners trying out Elm for the first time and discussions (sometimes heated) about language features and infrastructure. Even better, many of the folks involved in those conversations go on to contribute to Elm's community libraries. Since I started working on this project, in March of 2015, the number of community libraries has grown tenfold.

It also helps that Elm ships with a suite of amazing tools - the "Elm Platform", including the compiler, REPL, package manager, and the reactor, an interactive development tool. These tools make it a pleasure to work in Elm.

But my favorite part about writing Elm, by far, is how maintainable it is. It's amazing to me how easy it was to come back to this project after 2 months away and begin to refactor messy code and add features once again. I love that I can be productive almost immediately when I sit down to revive a long-dormant Elm project. I may forget how the different pieces of my project fit together, but the Elm compiler doesn't forget. This is significant because it adds up to a big boost in productivity for web developers using Elm.

Final thoughts

Elm is a work in progress. It's frustrating sometimes, because APIs change frequently and there are missing pieces, but also tremendously exciting because the performance, power, and expressiveness of the language are constantly improving.

The Elm community follows a top-down BDFL model, with creator Evan Czaplicki putting in the most hours and reserving the final say on questions of language syntax, features, and implementation. Czaplicki is autocratic and sometimes brusque, but he is also responsive and supportive of active contributors to the language. The scope of Czaplicki's vision has attracted an active community of language maintainers, library developers, and compiler enthusiasts to contribute to the language.

This community is swiftly increasing the number of available Elm libraries. Even more exciting, there are a ton of exciting improvements and features coming (hopefully) with the next compiler release, including tail call optimization, exhaustiveness checking for pattern matches, and dead-code elimination. For me, at least, Elm is not yet an efficient and effective replacement for JavaScript - but of all the frameworks and compile-to-JavaScript languages I've worked with, Elm has the greatest potential.

Designing Leaflet.toolbar

Tuesday January 6th, 2015

Leaflet.toolbar, a JavaScript library for building flexible and extensible toolbars for Leaflet maps, is just released and available on npm. Leaflet.toolbar is an effort to extend the excellent Leaflet.draw plugin and to enable new kinds of map / user interactions. In particular, Leaflet.toolbar aims to:

Decouple the actions exposed by a toolbar from its overall behavior and appearance;

Enable both control-style and popup-style toolbars out of the box;

Make it easy for library developers to define new kinds of toolbars;

Be easy for application developers to instantiate;

Remain interoperable with Leaflet.draw.

Here's where Leaflet.toolbar came from and why I built it.

Why build a toolbar plugin?

Matt and Jeff were my mentors for a Google Summer of Code project focused on improving MapKnitter, a piece of software created by Public Lab for Open Technology and Science to enable grassroots mappers to do georectification.

Matt and Jeff had found that Public Lab community members, unable to communicate the purpose of their maps using MapKnitter, would export their maps as images, then draw on them in Powerpoint or by hand, adding arrows, lines, circles, and text annotations directly on the map. Matt noted in a wonderful research note that, historically, mapmakers have written and annotated their maps using different size fonts, multiple colors, and fonts of varying sizes to encode dense, overlapping geographic information. Until recently, web maps (and especially raster maps) lacked this degree of expressiveness.

It was important to us to bring the full expressiveness of analog mapping into the browser so that citizen scientists and mapmakers could easily communicate the purpose of their maps.

Migrating from OpenLayers to Leaflet gave Public Lab the opportunity to envision how we might enable this kind of expressive digital mapmaking. Leaflet ships with a full complement of vector annotation layers - circles, rectangles, polygons, and polylines - that can be overlaid on maps. Leaflet.draw makes these layers interactive, enabling users to create annotations dynamically by drawing them on the map. This was a great first step. However, Leaflet.draw didn't provide any way for users to alter the style of these map annotations - their color, weight, or opacity, for example.

Leaflet.draw exported a toolbar which was the entry point for its interaction functionality. In Leaflet.Illustrate, I began developing new kinds of map annotations. Yet, when I tried to expose these new kinds of annotations to users alongside those provided by Leaflet.draw, I found that its toolbar wasn't easy to extend. The toolbar was not the core of the new features I was trying to develop, but because it was the entry point for users, it deeply affected the user experience.

This plugin is an effort to generalize the elegant toolbar provided by Leaflet.draw and to overcome some of its limitations.

Toolbar actions

The toughest and most protracted challenge in designing Leaflet.toolbar was deciding how to represent toolbar actions. The difficult was that toolbar actions couldn't be instantiated when the toolbar was defined (for example, by a plugin developer) because these actions took as mandatory arguments to their constructors a map object, which is usually created by application developers.

Leaflet.draw solved this by defining the actions in the toolbar in a method of the concrete DrawToolbar class.

I struggled with this for a long time. I wanted to separate the specification of a toolbar's actions from its behavior as an generic interface, so I didn't feel that it was appropriate to define the actions in a method of a specific toolbar class. Yet I was facing the same difficulty as Leaflet.draw - the necessity to make the map variable available to the toolbar action constructors.

I'm very pleased with the solution I came up with (at long last, after months of re-re-refactoring). Leaflet.toolbar requires developers to define the actions in a toolbar as a simple array of constructors extending L.ToolbarAction:

This is direct and concise. What's more, this new pattern makes it easy for application developers to customize the behavior of an action at toolbar instantiation by extending the action to create an anonymous class:

This succint specification for toolbar actions is made possible by a little bit of wizardry with Leaflet's built-in classical inheritance system and by overloading L.Toolbar#addTo.

Overloaded #addTo method and _getActionConstructor

Conceptually, a toolbar is an interface that allows a user to manipulate the state of objects on the map or the map itself. Leaflet.toolbar allows the developer to specify which map objects the toolbar will control. In order to keep the API simple, L.Toolbar#addTo is overloaded to add the toolbar to the map, and to designate the map objects which the toolbar will control. The arguments of L.Toolbar#addTo are passed to each action in turn. That is,

calls new EditShape(map, shape) and new DeleteShape(map, shape) in turn.

The magic that makes this work (which didn't come to me until late December) is all contained in L.Toolbar#_getActionConstructor.

Variadic functions are easy to construct in JavaScript using Function.prototype.apply, but this technique doesn't work with constructors. To treat the toolbar action constructors generically, without regard for their arity, Toolbar#_getActionConstructor creates an anonymous class extending each toolbar action to initialize the toolbar action with the appropriate arguments:

Secondary toolbars

I thought I was done developing Leaflet.toolbar after about a month of tinkering with it in my spare time. Looking back at Leaflet.draw to confirm I had implemented all of its features, I noticed that each action in the toolbar, once triggered, provided a menu of secondary options related to the action at hand; for example, users drawing a polygon or polyline are given the option to cancel drawing, or to delete just the last point. No problem, I thought. I'll just add that in - it'll take a couple of days.

Wrong! My design at the time had no way of accomodating these secondary actions. It took me the next few months, still working in my spare time, to reorganize the code to accomodate these secondary actions.

The menu of secondary actions, I realized, was kind of like a toolbar. Why not just make it an instance of L.Toolbar? This was the solution I settled on: the subToolbar option of each toolbar action points to an instance of L.Toolbar containing the appropriate secondary actions.

This system of nested toolbars made for efficient code reuse, but meant that the CSS styles had to be seriously modified. Toolbars in Leaflet.toolbar are automatically given a leaflet-toolbar-n class, where n is the 0-based level of the toolbar in the nested toolbar hierarchy. This allows for styles to be applied easily across all toolbars and menus at the same hierarchical level. One drawback of this method is that styles have to be tightly scoped to avoid inadvertently styling containing toolbars (for example: leaflet-toolbar-0 > li > .leaflet-toolbar-icon {...}) - but hey, you can't win them all!

Looking back

The toughest part of this project, hands down, was dealing with the constraints of an existing API. Developing Leaflet.toolbar was different from working on other libraries that I've contributed to because, with Leaflet.toolbar, I was proposing to take over functionality from Leaflet.draw, rather than extending it. I wanted to inroduce a new dependency into the stack that would handle existing functionality. Working on a project with this degree of interdependence between the parts was new for me, and it was difficult. There were times when I felt that my ideas for Leaflet.toolbar couldn't possibly be accomodated by Leaflet.draw's API, and I dreamed about drastic rewrites of Leaflet.draw that might mesh more neatly with my toolbars. Thank god I didn't follow through on those dreams! Both Leaflet.draw and Leaflet.toolbar would have been worse off! I did make some small changes to Leaflet.draw, such as modifying each drawing handler to inherit from L.ToolbarAction rather than L.Handler, but for the most part, I left it as-is.

While it was sometimes frustrating, Leaflet.draw was an enormous help to me in developing Leaflet.toolbar. I used @jacobtoye's excellent control-style toolbar as the inspiration and the spec for Leaflet.toolbar, and I was constantly tabbing back to Leaflet.draw to check the functionality of the Leaflet.draw toolbars. I worked on porting the Leaflet.draw toolbar interface over to Leaflet.toolbar as I was developing Leaflet.toolbar, which helped maintain a tight feedback loop and informed the features I was building in to Leaflet.toolbar. If anything made my job difficult, it was that @jacobtoye and the Leaflet.draw set the bar so high!

Thanks are in order to @jacobtoye, for a wonderful drawing plugin, and, of course, to @mourner, for the amazing open-source mapping library that started it all.