So I got a (recurring) task to compile some stats from Jira. You know, some filtering and aggregation. Possibly a pivot table and a chart. Sounds like a perfect task for spreadsheets. But Jira doesn’t have spreadsheets - at least not without add-ons, and those are not an option for me.

So I decided to use Google Sheets. And figure out a way to automatically fetch data from Jira because there’s no way in hell I’m doing that manually every time. Turns out there are some scripts you are supposed to paste into sheets and give them all the permissions. But I don’t trust that, not without properly reviewing the thousands of lines of code.

Obviously I decided to roll my own. I figured out a way that’s actually really easy. Just create a filter in Jira and click Export -> Printable. This will give you a URL with all your tickets formatted nicely in a <table>.

Why is this useful? Because Google Sheets knows how to scrape HTML tables. If you append your credentials to the URL, the whole formula will look something like this:
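
For example, a sketch using IMPORTHTML (the host and filter id are placeholders; older Jira versions accept os_username/os_password as query parameters, newer ones may not):

```
=IMPORTHTML("https://jira.example.com/sr/jira.issueviews:searchrequest-printable/10042/SearchRequest-10042.html?tempMax=1000&os_username=USER&os_password=PASSWORD", "table", 1)
```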

This magically pulls your Jira table into a spreadsheet. Now you can use all your spreadsheet tools to make reports out of this. Or even SQL! Google Sheets has a QUERY function which takes a data range and an SQL-like expression to run over it.
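
For example, if the imported table landed in A1:F100 with the assignee in column C and the status in column D (a made-up layout), a per-assignee count of resolved tickets might be:

```
=QUERY(A1:F100, "select C, count(A) where D = 'Done' group by C", 1)
```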

Now we just need a simple way to refresh the data (remember, this is a recurring task). Sheets reloads the data when the URL changes, and Jira ignores unknown query parameters. We can use this to add a dummy counter parameter which we increment whenever we want fresh data!
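
A sketch of the trick (placeholder URL again): keep the counter in some cell, say B1, concatenate it into the URL as a bogus parameter, and bump it whenever you want a refresh.

```
=IMPORTHTML("https://jira.example.com/sr/jira.issueviews:searchrequest-printable/10042/SearchRequest-10042.html?tempMax=1000&dummy=" & B1, "table", 1)
```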

The Back Story

My first foray into the world of Linux happened with Red Hat Linux 6 (the original, not the enterprise one). It was magical but it didn’t stick. See, I was a kid who liked video games. So it was back to Windows until I finally got serious about programming and discovered the development environment on Linux. So I formatted my hard drive and installed Ubuntu. I quickly absorbed information and slowly got bored. When Unity came out it was time to switch. I dabbled a bit with Mint but ultimately landed on ArchLinux. I really like the RTFM-or-GTFO philosophy of Arch because it means you learn stuff and you learn it quickly - because you just have to. Oh, and there is great documentation. And everything was great. Until I borked my system doing an upgrade or some other system package operation. A few times. Btrfs to the rescue! (I was doing manual snapshots of the root filesystem before any potentially destructive operation.) But there must be a better way.

Then I heard my colleagues talking about this amazing package manager called Nix and the derived distribution - NixOS. Its taglines are (respectively):

The Purely Functional Package Manager

and

The Purely Functional Linux Distribution

The Install

Being the Haskell geek I am, I needed to give it a spin. I tried the VirtualBox image but it was weird. It had KDE and I couldn’t really figure out how to alter it. So I decided to just try a fresh install in VirtualBox.

Turns out the manual pretty much has you covered. It’s not Arch Wiki level, but good enough to set up a minimal install without any major issues.

At this point I was so fascinated by the core concepts that I just bit the bullet - I decided to install it onto my physical machine. What could possibly go wrong?

So I just created a quick subvolume (Btrfs again!) and rebooted. About half an hour later I had a working NixOS install alongside my Arch. It was really minimal but I was happy and I built on it quickly. A few evenings of playing around later I just switched. I set up pretty much the same environment I had before so my work productivity didn’t suffer a bit. If anything it was better - now I no longer feared system upgrades because rollback had my back.

The Crash

And then it happened. My hard drive failed and there were issues with the external backup as well. I managed to recover data but not the system. No biggie - NixOS is trivial to rebuild from the config file. Or so I thought. I had to locally rebuild the world, starting from glibc. (I think it was my error for using an old boot image.) That took pretty much the whole day.

So I devised a plan for re-installs, which I tested out and am now documenting here for future use (a sketch of the corresponding commands follows the list):

download fresh nixos-unstable image

boot it and install minimal system with network manager

boot into the new system

update channels

fetch personal config

rebuild
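
Roughly, the steps above translate to something like this (a sketch only - device names, the channel, and the configuration repository are placeholders):

```sh
# on the live image, after partitioning and mounting the target at /mnt
nixos-generate-config --root /mnt
# edit /mnt/etc/nixos/configuration.nix: keep it minimal, enable networkmanager
nixos-install
reboot

# in the freshly booted system
nix-channel --add https://nixos.org/channels/nixos-unstable nixos
nix-channel --update
git clone https://example.com/me/nixos-config.git   # personal configuration
cp nixos-config/configuration.nix /etc/nixos/configuration.nix
nixos-rebuild switch
```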

With this approach I’m up and running an exact replica of the system in about half an hour. Yup, my hard drive crashed again. This time my backup was fine but I decided to actually go this route to test out the procedure. Works like a charm. And I also did it when I bought a new computer. It’s kind of creepy when after half an hour you are greeted by a familiar desktop on unfamiliar hardware. Creepy but very cool.

One of the first things you learn as a newcomer to Scala is the difference between a list and a tuple: a list must be homogeneous but a tuple can be heterogeneous. That is, all elements of a list must have the same type, but a tuple can contain things of different types.

A direct consequence of this is that a list can define a by-index accessor and a tuple cannot. You can use list(3) but you can’t do tuple(3) - you need to do tuple._4 (and there is that pesky off-by-one).

So let’s use the awesome powers of Scala to negate this and implement an apply method on tuples.

First steps

Let’s start with baby steps and not tackle a full-blown apply with an integer index, but instead do an approximation with special access constants. Something like
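
the following (an illustrative reconstruction, not the original post’s code; _0 and _1 are hypothetical access constants, implemented in the sketch a bit further down):

```scala
val pair = ("foo", 42)
val a: String = pair(_0)   // instead of pair._1
val b: Int    = pair(_1)   // instead of pair._2
```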

It’s not a big step from regular tuple accessors but it’s a big move since it introduces a single polymorphic apply.

To pull this off we’ll use type classes in conjunction with singleton types. The apply will take in one of the singletons and use implicit resolution to pull in the function that does the proper projection. The important part is that the implicit resolution also needs to compute the output type of the apply method.
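
A minimal sketch of this mechanism for pairs (names and structure are mine, not the original post’s; a real version would cover all tuple arities):

```scala
object TupleApply {
  // hypothetical access constants; their singleton types drive implicit resolution
  object _0
  object _1

  // type class: "T indexed by I has an element of type Out"
  trait At[T, I] {
    type Out
    def apply(t: T): Out
  }

  object At {
    implicit def at0[A, B]: At[(A, B), _0.type] { type Out = A } =
      new At[(A, B), _0.type] { type Out = A; def apply(t: (A, B)): A = t._1 }
    implicit def at1[A, B]: At[(A, B), _1.type] { type Out = B } =
      new At[(A, B), _1.type] { type Out = B; def apply(t: (A, B)): B = t._2 }
  }

  // a single polymorphic apply; the result type is computed by implicit resolution
  implicit class TupleOps[T](val t: T) {
    def apply[I](i: I)(implicit at: At[T, I]): at.Out = at(t)
  }
}

// usage:
//   import TupleApply._
//   val n: Int = ("foo", 42)(_1)
```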

The trick here is the type member that is defined to match the number itself. This way the type checker gets access to the “value”. Now we just need to rewrite the instances to use this.

Except it doesn’t work. This mechanism will generate the Nats but it’s too weak - N will just be n.N and this is not enough information to power implicit resolution. If only there was a way to fully generate the types at compile time…

Macros to the rescue

We can keep the Nat type and the implicit conversion into it but implement it as a macro that just generates TNats directly (without recursion). This will give strong enough types to power implicit resolution.

Simplifying

Since we already have a macro sitting in there, why not have it do all the heavy lifting? We can cut the bulk of the code and remove a lot of the complexity of the implicits if we just use a macro that transforms a.apply(n) into a._${n + 1}.

This works but I think it’s not the best idea. See, macros don’t compose. You cannot really use this from another function; you always need to statically know the index. Whereas with the previous implementation you are good to go as long as you have a good Nat and the At instance, which you can pass in programmatically - you just push the implicit conversion to Nat a layer out. This way you can re-use the indexing mechanism for other operations and have a single macro sitting in the background powering things instead of a bunch of one-offs.

In fact this is exactly what shapeless already does! It uses very similar ideas and pushes them much further. But most importantly, it already includes our apply so if you actually want to do this just import shapeless.

Just recently I was porting a toy-sized parser combinator library (a proof of concept) from Haskell to Python. You know, for educational purposes. It turns out I’m not smart enough to keep the complicated types (and explicit laziness) in my head even for such a small project. So my solution was to do TDD. To clarify: I wanted to test happy paths through my functions to make sure at least the types fit together.

So tests I will write. But that means I need another file for tests (yes, it was a single-file project, that’s what I mean by “toy-sized”), some actual structure, a test runner, probably a package description for dependencies… How about no? I remembered seeing a thing called doctest once. Turns out it fit perfectly!
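
A minimal illustration (not from the actual project): the examples live in the docstring, and a tiny main block invokes the doctest runner.

```python
def digits(n):
    """Split a non-negative integer into its decimal digits.

    >>> digits(123)
    [1, 2, 3]
    >>> digits(0)
    [0]
    """
    return [int(c) for c in str(n)]


if __name__ == "__main__":
    import doctest
    doctest.testmod()
```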

Then run it with python myfile.py. It is that simple (and built-in). You just put examples in the docstring and they will be machine-checked.

But I’m lazy and don’t want to run tests every time by hand. I want a poor man’s runner with --watch capability. And I can have it as a bash one-liner (given that the inotify-tools package is installed on my system).
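
Something along these lines (the file name is a placeholder; -e close_write fires when the editor saves the file):

```sh
while inotifywait -e close_write myfile.py; do clear; python myfile.py; done
```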

This will automatically clear the screen and re-run the tests every time I change the file - i.e. save it from my editor. Now I can finally develop my toy projects in a split screen with a terminal and vim, using only my editor’s save function to run tests :)

Bonus section

Doctest has been ported to Haskell, so I can use it there as well. I just need to stack install it globally and it will be available for my one-file projects.
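
A sketch of what that looks like in Haskell (the file and function are made up): examples go into Haddock comments and the doctest binary runs them.

```haskell
-- install once: stack install doctest
-- run with:    doctest MyFile.hs

-- | Sum of squares.
--
-- >>> sumSquares [1, 2, 3]
-- 14
sumSquares :: [Int] -> Int
sumSquares = sum . map (^ 2)
```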

Quite recently I managed to make myself a corrupt git repository due to a file system failure. See, git stores everything in content-addressable blobs - the file name of something is its hash. This lends itself nicely to checking repository integrity - it catches malicious tampering as well as my file system problems.

I already hear you saying: Why not just make a new clone, git is distributed anyway? Well, I wasn’t diligent enough to push everything. I had local commits that were quite important, so I spent some time fixing it.

fsck

Git has a command to manually check the integrity of the repository: git fsck. Running it lists all the errors.

Luckily in my case the list was quite short, so I went ahead and deleted all the objects that were listed as corrupted. So now my objects are fine, but I’m missing some. Luckily (again) the corrupted objects did not contain any data pertaining to unpushed commits, so I thought I could use a clone to restore them.

unpack

So I lied a bit, git doesn’t store every blob in a separate file, that would become huge pretty quickly. Instead it uses packfiles. It packs several blobs into one file and does delta compression to reduce disk usage. So I cannot just copy over blobs from a clone.

Fortunately git has commands for dealing with packfiles as well. The one of interest is git unpack-objects, which takes a packfile, extracts all the objects, and dumps them into the repo - potentially producing a lot of loose objects, but let’s not care about that for a second.
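
Something like this, run inside the broken repository (the clone path and pack name are placeholders):

```sh
git unpack-objects < /path/to/fresh-clone/.git/objects/pack/pack-XXXX.pack
```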

We are used to encoding numbers on computers in binary. Binary is the “simplest” base that yields logarithmic length, though it may not be optimal. But can we go simpler? How about unary?

Unary is often used with Turing machines where we don’t care for efficiency, and I will assume the same stance. Let’s forget about efficiency and explore what unary numbers can do that binary can’t. Specifically, lazy unary numbers, as otherwise the two systems are equivalent. I’ll be using Haskell as it is lazy by default and thus a good fit.
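
The definition in question is the standard Peano encoding (a reconstruction; Eq is derived, as mentioned later in this post):

```haskell
data Nat = Zero | Succ Nat deriving Eq
```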

These are natural numbers as usually defined in mathematics. An example value (3) in this encoding would be Succ $ Succ $ Succ Zero. For the sake of simplicity I will from now on assume that we do not use bottom values that correspond to errors.

Infinity

The first interesting property of this representation is a simple encoding of infinity in finite space.
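
In code, it is a knot-tying one-liner:

```haskell
infinity :: Nat
infinity = Succ infinity
```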

This ties the knot and only constructs one Succ that points to itself. The interesting (and useless) bit is that we cannot observe this in a given value! At least not in pure functions. Given a Nat we cannot possibly know if it is infinite or just very large.

Equality and ordering

I just derived Eq before, not giving it much thought. But notice that comparison between two Nats may not terminate if both are infinite. However, if one is finite we can detect that the two are not equal, so all is not lost. We can even do more and define an ordering.
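
A sketch of such an ordering (my reconstruction, not necessarily the original code):

```haskell
instance Ord Nat where
  compare Zero     Zero     = EQ
  compare Zero     (Succ _) = LT
  compare (Succ _) Zero     = GT
  compare (Succ n) (Succ m) = compare n m
```

And a sketch of the Num instance that the next paragraphs walk through (again a reconstruction, following the conventions stated below):

```haskell
instance Num Nat where
  Zero   + m = m
  Succ n + m = Succ (n + m)          -- recurse on the first argument, reuse the second

  Zero   * _ = Zero
  Succ n * m = m + n * m             -- one addition per step

  Zero   - _      = Zero             -- 0 - n = 0: a bit shady, but workable
  n      - Zero   = n
  Succ n - Succ m = n - m

  negate _ = error "Nat cannot be negated"
  abs      = id
  signum _ = Succ Zero

  fromInteger n
    | n <= 0    = Zero               -- negative literals collapse to Zero
    | otherwise = Succ (fromInteger (n - 1))
```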

Let’s walk through it. Addition recurses on the first argument. Note that the second argument is reused: it will only allocate Succs matching the first argument.

Multiplication again recurses on one argument but does an addition at each step. Since there is no way to represent negative numbers in this scheme I defined 0 - n = 0, which is a bit shady but works in most cases. Similarly negate throws an error. Another consequence is that abs is just the identity and signum always returns 1.

But the most useful function is fromInteger. I extended toEnum by handling negative cases. This does not look like much but due to literal polymorphism we can now write decimal literals and they will be automatically converted to Nat where this is the expected type.

Conclusion

There are two interesting things here. The first is encoding infinity - not much in itself. But the second is partial evaluation. By traversing n Succs we know that the number is greater than or equal to n. This means we can compute even with infinite numbers as long as we don’t need to look at the exact result.
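
A tiny demonstration, assuming the sketches above:

```haskell
two :: Nat
two = Succ (Succ Zero)

bigger, clamped :: Bool
bigger  = infinity > two             -- True, after peeling only a few Succs
clamped = min infinity two == two    -- True; the comparison stops at 'two'
```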

Sometimes you want to distinguish between different types that have the same underlying representation. For example both UserId and ProductId could be represented by Long. The usual solution is to introduce wrappers in order to make the distinction safe.

But this introduces runtime overhead of boxing and unboxing over and over which may add up in some cases. Luckily Scala 2.10 introduced value classes. We can ensure no runtime overhead by extending AnyVal (this can only be done with classes with one field).
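
A minimal sketch of the two wrappers as value classes:

```scala
// distinct types, but no boxing of the underlying Long at runtime
case class UserId(value: Long) extends AnyVal
case class ProductId(value: Long) extends AnyVal
```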

Inheritance

So far so good. But we cannot make a value class extend this Id! A value class may only extend universal traits. This means we could define a trait that represents the notion of Id but we could not make it into a concrete type with values. And even more problems occur when we want to play around with variance. What now?

Tagging with types

A possible solution is to define an “empty” higher-kinded type and store tags in its type parameters.

A good thing about this approach is that unboxing is automatic since V <:< @@[V,T]. But sometimes you may want to untag your values in order to pass them somewhere you don’t want to keep the tags. For this we just need a function that uses the automatic unboxing.

The trick is just to “pattern match” on the type we implicitly convert.

Collections

Sometimes you want to tag a collection of something. You could xs.map(_.tag[Foo]) but this would actually create a new collection at runtime. We can get away with just casting (thus in constant time)! Notice that collections are nothing special; we may just as well cast a JSON printer instead of creating a wrapper.

This is an abstraction over any M[_]. You could write abstractions for other shapes, but in practice I never needed anything else since this covers collections and most typeclass instances.

Variance

An observant reader will have noticed I defined @@ to be covariant in both arguments. You should probably leave the value type covariant since it is by nature covariant at runtime, but you may change it to invariant if you want to “disable” automatic upcasting. The tag type may also be made contravariant, although I found that covariance is what you naturally expect and covers most cases. Sadly I haven’t found a way to abstract over variance.

Disclaimer

I got this idea from ScalaZ but implemented it in my own way a while ago.

Approaches to designing a Haskell API
Posted on October 5, 2014

Recently I’ve been thinking about the design of programming interfaces, especially in Haskell. But don’t let the title mislead you; this is not supposed to be a tutorial or a guide but simply a showcase of different styles. Feel free to tell me I’m wrong or missed something.

The problem

Let’s say we are writing an interface to a RESTful web service. Our goal is to create type-safe functions and descriptive models that are, all in all, easy to use.

The service should be simple so our examples are kept small. So let’s have a single resource that supports POST and GET on single items. In pure Haskell it would look like
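
For instance, with some invented names, a sketch of that pure interface might be:

```haskell
newtype ItemId = ItemId Int
data Item = Item { itemName :: String }

-- a pure interface - obviously a fiction for a remote service
createItem :: Item -> ItemId
createItem _ = ItemId 0   -- stub

getItem :: ItemId -> Maybe Item
getItem _ = Nothing       -- stub
```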

But these functions couldn’t possibly be pure since they need to talk to our service, so we have to adapt this interface.

The OO way

Our functions should at the very least take some sort of a client as an input. Let’s define a client that does raw JSON requests to our service. This is just a rough sketch so we have something to work with
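
A rough reconstruction of such a client-passing interface (the Client shape and function names are mine, and the item types are the ones from the sketch above):

```haskell
data Client = Client { baseUrl :: String }

createItem :: Client -> Item -> IO ItemId
createItem _client _item = error "POST the JSON-encoded item here"

getItem :: Client -> ItemId -> IO (Maybe Item)
getItem _client _itemId = error "GET and decode the JSON response here"
```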

Which is actually not bad. We presume our service (or the network) will never fail, we push the burden of managing the client to the user, and we clutter every function’s signature with Client, but we could do worse. On the upside we can now use this in an object-oriented-looking way.

Monads

Everything is better with monads, right? So we can define a monad to replace the client. What’s the essence of the client? It carries some configuration and has the ability to perform IO. So we build a monad that can do these two things. And since I’m lazy I’ll just do
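
something like the following sketch, reusing an existing transformer (Client and the item types are from the sketches above):

```haskell
import Control.Monad.Reader (ReaderT, ask)
import Control.Monad.IO.Class (liftIO)

type Api a = ReaderT Client IO a

getItem :: ItemId -> Api (Maybe Item)
getItem _itemId = do
  client <- ask
  liftIO (putStrLn ("GET " ++ baseUrl client))  -- a real version would issue the request
  return Nothing
```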

But there is a problem. It is impossible to write an IO transformer so this new monad will have to be the base of the monad transformer stack. And this hurts modularity. Can we do better? Yes! Let’s just do a monad class

But now we can do better. We can write an instance for running tests that doesn’t perform IO but instead simulates the service locally.

This is somewhat lighter for the user since he doesn’t need to manage the client any more, nor does he need to pass the client to every request. He just needs to put configuration and IO into his monad stack.

mtl?

Can we push the monad class approach further? Let’s take a look at the mtl library. It provides classes of operations for every monad. We can write a class for our whole service.
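
In the spirit of the mtl classes, that could look roughly like this (a sketch; item types as before):

```haskell
class Monad m => MonadItemService m where
  createItem :: Item -> m ItemId
  getItem    :: ItemId -> m (Maybe Item)
```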

And now we implement this in terms of our first (naive) implementation, which is not directly exported any more. The user only sees the class and gets an instance that works with the configuration and the ability to perform IO. This now makes it very easy to write an instance that uses State (or something similar) to simulate the service for testing, without bothering with JSON and other implementation details of the actual service.

But it also has a downside. Imagine a bigger service with multiple resources. The class will explode and become unwieldy, also making the implementation harder. If the implementor wants to support another backend he now has a big instance to write. If he wants to add a method he now has several instances to fix. Sounds a bit like the expression problem.

Purity to the rescue

A good way to structure your code is to separate pure functions from impure actions and minimize the latter.

One way to do this is to make a pure description of requests and responses. Then define a uniform intermediate representation that works well with our protocol and the client that actually does the requests. Only the client needs to perform any effects, all other code can now be pure.
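
A sketch of this idea (names are invented, and the JSON decoding instances are replaced by hard-coded String decoders to keep the example self-contained):

```haskell
{-# LANGUAGE GADTs #-}

newtype ItemId = ItemId Int deriving Show
data Item = Item { itemName :: String } deriving Show

-- the type of the response is part of the request's type
data Request a where
  GetItem    :: ItemId -> Request (Maybe Item)
  CreateItem :: Item   -> Request ItemId

-- uniform intermediate representation, carrying a specialised decoder
data RawRequest a = RawRequest
  { rawPath   :: String
  , rawDecode :: String -> a
  }

toRaw :: Request a -> RawRequest a
toRaw (GetItem (ItemId i)) =
  RawRequest ("GET /items/" ++ show i) decodeItem
  where decodeItem body = if null body then Nothing else Just (Item body)
toRaw (CreateItem _item) =
  RawRequest "POST /items" (const (ItemId 0))   -- stub decoder
```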

There are two things going on here. First, we encode the type of the response into the request so requests can be safe. Second, we use the type information that is locally available inside toRaw to pick the right instance for decoding JSON and put the specialised function into the raw representation.

We now have it all: safety, modularity (we can write tests in terms of pure requests and responses), we can simply plug a new backend and even explicitly talk about requests since they are just plain old values.

But we cannot statically determine the type of the request, nor can we simply add a new type of request. The former is a philosophical remark, the latter a real-world requirement. We’ve again hit the expression problem. If we add a request we need to modify existing code in all functions creating an intermediate representation (or working directly with requests). At least adding a new backend is very simple since it only depends on the intermediate representation.

Type classes revisited

We want to be able to statically enforce types of requests. This is simply achieved if we define a single constructor type for every request instead of a sum type of all requests. But now we cannot have a function to convert it into an intermediate form. But we can have a type class. Moreover using multi param type classes and functional dependencies we can encode the type of result for each request and require the instances to parse the result from the intermediate form. Functional dependencies will ensure we can always compute the result type from the request type.
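
A sketch (again with invented names, reusing the Item and ItemId types and a String-based intermediate form):

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

-- one constructor type per request
data GetItemReq    = GetItemReq ItemId
data CreateItemReq = CreateItemReq Item

-- the fundep lets the compiler derive the response type from the request type
class Resource req res | req -> res where
  toRaw   :: req -> String
  fromRaw :: req -> String -> res

instance Resource GetItemReq (Maybe Item) where
  toRaw (GetItemReq (ItemId i)) = "GET /items/" ++ show i
  fromRaw _ body = if null body then Nothing else Just (Item body)

instance Resource CreateItemReq ItemId where
  toRaw _ = "POST /items"
  fromRaw _ _ = ItemId 0   -- stub
```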

I believe we achieved our goal. We can add a new request without modifying existing code by simply adding new instances. And we can still add new backends that only rely on the intermediate form. We still can have pure tests and as a bonus big APIs will not require giant functions anymore, we can even break them up into several modules.

Free

I added this section after reddit user aaronlevin reminded me I forgot about the Free monad.

The essence of using free is building a pure description of the whole program and then writing an interpreter to run it. You can use FreeT transformer if you want to mix in some other effects.

We start off by defining a functor that specifies our language. I defined the structure and derived the Functor instance since it’s trivial.
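
A sketch of what that functor and the smart constructors described below could look like (using the free package; item types as before):

```haskell
{-# LANGUAGE DeriveFunctor #-}

import Control.Monad.Free (Free (..))

data ServiceF next
  = RequestGet    ItemId (Maybe Item -> next)
  | RequestCreate Item   (ItemId -> next)
  deriving Functor

type Service a = Free ServiceF a

get :: ItemId -> Service (Maybe Item)
get itemId = Free (RequestGet itemId return)

create :: Item -> Service ItemId
create item = Free (RequestCreate item return)
```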

get wraps a RequestGet and puts return as the continuation to lift the return value into the monad, create does the same for RequestCreate.

Since we’ve established that having an intermediate representation is a good thing let’s create another free monad that defines a program in terms of raw requests. Then we can write an interpreter that converts the program into this language.

If I recall correctly the operational package defines some machinery that simplifies definition of our language and interpreters but I haven’t used it yet so I’m not familiar with the details.

In either case you still have a closed set of operations that are defined by your functor. And modifying this functor requires modifying all existing interpreters for its free monad. But there are two more downsides. Some boilerplate is required to make usage of actions from our functor simple though this could probably be automated away. More importantly there are performance issues with Free in some cases.

Monad laws require the instance of Free to define >>= in a way that is associative. And it does that. But if you look at the definition of Free you will see that it closely resembles the linked list [], which also has an associative concatenation operation ++. But ++ has performance problems if used in a certain way, namely repeatedly putting big lists on the left side. And Free’s bind has this problem too. Everything is okay if you just use the do notation with simple actions, as there will always be the smallest possible instance on the left, but problems arise if you compose and nest instances of Free. And by problems I mean bind being linear in the size of the instance on the left. There is active work to mitigate this (e.g. Codensity and Reflection without Remorse) but it’s out of the scope of this article.

Conclusion

I would argue that each of these approaches (except unsafe ADTs) has its pros and cons and therefore its place in some implementation. If I missed anything or made an error please let me know - I’ll be happy to update the post.

There are many reasons to prefer dynamic linking to static but I’ll not go through them. Sometimes you just want static linking, period. In my case it was to show that Go’s static executables without dependencies are nothing special and other languages can do it just as well - Haskell included. My compiler of choice is GHC and I’m running ArchLinux. More on why this is important later.
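
Reconstructed from the flags discussed next, the invocation would be along the lines of (the file name is a placeholder):

```sh
ghc -O -static -threaded -optl-static Main.hs
```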

-O is for optimizations, -static instructs GHC to do static compilation, -threaded includes pthread and -optl-static pushes -static flag to ld.

But it didn’t work. Instead I got a bunch of errors from ld telling me I’m missing librt and libgmp. Running locate librt turned up results as well as locate libgmp. I was flabbergasted.

Then I tried running the same thing on Ubuntu 12.04 LTS and it worked. The resulting binary also ran on my Arch without problems. Now I was just sad. I tried searching online for my problem but apparently my google-fu is insufficient. I also tried setting gold as the preferred linker, but to no avail.

Few weeks later

Today I was playing around with C and when I got something working I decided to link it statically so I could send it off to a colleague who doesn’t have all these obscure libraries installed. And I ran into a similar problem. Now it couldn’t find libgc - a library I was using that worked like a charm with dynamic linking.

Apparently the problem didn’t lie in GHC but in my linker. Time to put on my Sherlock Holmes hat and investigate.

Turns out I’m a bloody ignorant idiot. There are dynamic libraries (with the .so extension) and there are static libraries (with the .a extension). I remember knowing this once. And I had all the dynamic libraries installed, but not the static ones. This was the root of my problems with GHC and now with GCC.

More researching turned up that Arch shies away from providing static libraries in order to encourage dynamic linking. If you want static objects you’ll have to build them from source.

Solution

I built libgmp and then also libc in order to get librt out. It didn’t take that long. But for your convenience, here are the resulting files if you want them: libgmp.a, librt.a

I dumped those into /usr/local/lib because I didn’t want to pollute my global libraries. Now I just need to convince GHC to use them. Easy. Just set LD_LIBRARY_PATH to that path.