Ganesh Sittampalam
http://hsenag.livejournal.com/
Ganesh Sittampalam - LiveJournal.com
Sun, 23 Mar 2008 11:46:47 GMT

Restricted monads in Haskell
http://hsenag.livejournal.com/11803.html
Sun, 23 Mar 2008 11:46:47 GMT
<br />I was playing around with restricted monads and came up with the following. It seems really simple, so I was wondering if it's either already known or obvious?<br /><br />The <i>restricted monad</i> problem is well-known in Haskell. We have some type constructor <font face="Courier New">Foo</font> and some restriction <font face="Courier New">Restr</font>, such that <font face="Courier New">Foo</font> is a monad, but only for contained types that are members of <font face="Courier New">Restr</font>. We can't make <font face="Courier New">Foo</font> an instance of <font face="Courier New">Monad</font>, because in the normal <font face="Courier New">Monad</font> class the types of <font face="Courier New">return</font> and <font face="Courier New">(&gt;&gt;=)</font> are fully polymorphic in the contained type. This in turn blocks us from using <font face="Courier New">do</font>-notation with our "monad". We can get round this using <font face="Courier New">NoImplicitPrelude</font> in GHC, but that's rather messy and means that normal <font face="Courier New">Monad</font>s don't work properly in that module.<br /><br />For concreteness, suppose that <font face="Courier New">Restr</font> is actually <font face="Courier New">Ord</font>, but we could use anything. We'll parameterise over the actual "monad" type, so we don't need to decide on that yet, but I have the usual <font face="Courier New">Set</font> example in mind.<br /><br />First, let's define a restricted monad class:<br /><blockquote><pre>
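{-# LANGUAGE GADTs #-}
import Control.Monad (MonadPlus(..), guard)
import Data.Set (Set)
import qualified Data.Set as Set
class OrdMonad m where
    ordReturn :: Ord a =&gt; a -&gt; m a
    ordBind :: (Ord a, Ord b) =&gt; m a -&gt; (a -&gt; m b) -&gt; m b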
</pre></blockquote><br />Just to keep things concrete, obviously <font face="Courier New">Set</font> is a member of this:<br /><blockquote><pre>
instance OrdMonad Set where
    ordReturn = Set.singleton
    -- Set.foldr has the same argument order as the older Set.fold
    s `ordBind` f = Set.foldr (\v ret -&gt; f v `Set.union` ret) Set.empty s
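
-- A small usage sketch (exampleSet is a hypothetical name, not from the post),
-- assuming the Set instance above: binding applies the function to every
-- element and unions the resulting sets.
exampleSet :: Set Int
exampleSet = Set.fromList [1, 2, 3] `ordBind` (\x -&gt; Set.fromList [x, x * 10])
-- exampleSet == Set.fromList [1, 2, 3, 10, 20, 30]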
</pre></blockquote><br />Now, how can I make a monad from this? Let's start by defining a new type constructor, GADT-style. We intend to apply this type constructor to our <font face="Courier New">OrdMonad</font> instance.<br /><blockquote><pre>
data AsMonad m a where
</pre></blockquote><br />Now we need some data constructors. Firstly we want to be able to embed "proper" <font face="Courier New">OrdMonad</font>s. Here we'll need the full power of the <font face="Courier New">GADTs</font> extension, i.e. restricted return types:<br /><blockquote><pre>
    Embed :: (OrdMonad m, Ord a) =&gt; m a -&gt; AsMonad m a
</pre></blockquote><br />OK so far, but what we're really after is a way to implement <font face="Courier New">return</font> and <font face="Courier New">(&gt;&gt;=)</font>. Well, let's take the easy way out:<br /><blockquote><pre>
    Return :: OrdMonad m =&gt; a -&gt; AsMonad m a
    Bind :: OrdMonad m =&gt; AsMonad m a -&gt; (a -&gt; AsMonad m b) -&gt; AsMonad m b
</pre></blockquote><br />Now we can implement <font face="Courier New">Monad</font> trivially (I'll ignore <font face="Courier New">fail</font>, but it's not hard to add):<br /><blockquote><pre>
instance OrdMonad m =&gt; Monad (AsMonad m) where
    return = Return
    (&gt;&gt;=) = Bind
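
-- Since GHC 7.10, Monad has Applicative (and hence Functor) as a superclass,
-- so a current compiler also wants the two instances below. This is only a
-- sketch: it is not part of the original post, and it assumes ap and liftM
-- are imported from Control.Monad.
instance OrdMonad m =&gt; Functor (AsMonad m) where
    fmap = liftM
instance OrdMonad m =&gt; Applicative (AsMonad m) where
    pure = Return
    (&lt;*&gt;) = ap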
</pre></blockquote><br />That was a nice bit of sleight-of-hand, but did it actually help? We've just delayed the problem till later.<br /><br />Well, actually it does help. "Later", what we'll want to do is get back to our <font face="Courier New">m a</font> type from <font face="Courier New">AsMonad m a</font>. But at this point we can restrict <font face="Courier New">a</font> to being in <font face="Courier New">Ord</font>. What we want is a function <font face="Courier New">unEmbed</font>:<br /><blockquote><pre>
unEmbed :: Ord a =&gt; AsMonad m a -&gt; m a
</pre></blockquote><br />The <font face="Courier New">Embed</font> case of <font face="Courier New">unEmbed</font> is easy:<br /><blockquote><pre>
unEmbed (Embed m) = m
</pre></blockquote><br />Since we've restricted <font face="Courier New">a</font>, the <font face="Courier New">Return</font> case is easy too:<br /><blockquote><pre>
unEmbed (Return v) = ordReturn v
</pre></blockquote><br />Now for <font face="Courier New">Bind</font>. Let's split that up into cases based on what the left-hand argument is. Yes, I know this seems like delaying the inevitable, that's how it felt to me too!<br /><br />If the left-hand argument is <font face="Courier New">Embed</font>, then both <font face="Courier New">a</font> and <font face="Courier New">b</font> are in <font face="Courier New">Ord</font>. So we can call <font face="Courier New">unEmbed</font> recursively and use <font face="Courier New">ordBind</font>:<br /><blockquote><pre>
unEmbed (Bind (Embed m) f) = m `ordBind` (unEmbed . f)
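</pre></blockquote><br />For <font face="Courier New">Return</font>, the left-identity monad law (<font face="Courier New">return v &gt;&gt;= f</font> is the same as <font face="Courier New">f v</font>) applies:<br /><blockquote><pre>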
unEmbed (Bind (Return v) f) = unEmbed (f v)
</pre></blockquote><br />Now for the <font face="Courier New">Bind</font> case. My initial assumption when I was writing this code was that I'd be trapped in a loop, only able to break out the left argument of the inner <font face="Courier New">Bind</font> into yet more cases. Then I realised that actually we can just bring the monad laws to bear again, this time associativity, to re-nest the binds to the right:<br /><blockquote><pre>
unEmbed (Bind (Bind m f) g) = unEmbed (Bind m (\x -&gt; Bind (f x) g))
</pre></blockquote><br /><br />And, well, that's it. We can use <font face="Courier New">do</font>-notation on the <font face="Courier New">AsMonad</font> type, and move freely between that and the base type using <font face="Courier New">Embed</font> and <font face="Courier New">unEmbed</font>.<br /><font face="Courier New">MonadPlus</font> is a simple addition along the same lines:<br /><blockquote><pre>
class OrdMonad m =&gt; OrdMonadPlus m where
    ordMZero :: Ord a =&gt; m a
    ordMPlus :: Ord a =&gt; m a -&gt; m a -&gt; m a
instance OrdMonadPlus Set where
    ordMZero = Set.empty
    ordMPlus = Set.union
data AsMonad m a where
    (...)
    MZero :: OrdMonadPlus m =&gt; AsMonad m a
    MPlus :: OrdMonadPlus m =&gt; AsMonad m a -&gt; AsMonad m a -&gt; AsMonad m a
instance OrdMonadPlus m =&gt; MonadPlus (AsMonad m) where
    mzero = MZero
    mplus = MPlus
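
-- Since GHC 7.10, MonadPlus has Alternative as a superclass, so a current
-- compiler also wants the instance below. A sketch only, not part of the
-- original post; it assumes "import Control.Applicative (Alternative(..))"
-- and an Applicative instance for AsMonad m.
instance OrdMonadPlus m =&gt; Alternative (AsMonad m) where
    empty = MZero
    (&lt;|&gt;) = MPlus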
unEmbed :: Ord a =&gt; AsMonad m a -&gt; m a
(...)
unEmbed MZero = ordMZero
unEmbed (MPlus m1 m2) = ordMPlus (unEmbed m1) (unEmbed m2)
unEmbed (Bind MZero f) = unEmbed MZero
unEmbed (Bind (MPlus m1 m2) f) = unEmbed (MPlus (Bind m1 f) (Bind m2 f))
</pre></blockquote><br />Here's some test code:<br /><blockquote><pre>
newtype Wrap a = Wrap { unWrap :: a } -- not an Ord even if a is
test1 = unEmbed $ do x &lt;- Embed $ Set.fromList [6, 2, 3]
                     do y &lt;- return (Wrap x)
                        z &lt;- Embed $ Set.fromList [1..2]
                        guard (unWrap y &lt; 5)
                        return (unWrap y + z)
                      `mplus`
                      return 10
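
-- A hand-worked check of the test above (checkTest1 is a hypothetical name,
-- not from the post). x ranges over {2, 3, 6}: x = 2 contributes {3, 4},
-- x = 3 contributes {4, 5}, x = 6 fails the guard, and every x also
-- contributes 10 through the mplus branch.
checkTest1 :: Bool
checkTest1 = test1 == Set.fromList [3, 4, 5, 10]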
</pre></blockquote><br /><br />One annoyance is that we can't parameterise over typeclasses (at least not nicely), so we can't make <font face="Courier New">AsMonad</font> fully general; instead we need one for each restriction.<br /><br />Finally, if we are willing and able to add extra constructors to an existing type, I think it should be possible to directly make that type into a <font face="Courier New">Monad</font> using the same approach.<br /><br />The closest thing I've seen to this before is something like this: <a href='http://www.haskell.org/pipermail/haskell-cafe/2007-January/021086.html' rel='nofollow'>http://www.haskell.org/pipermail/haskell-cafe/2007-January/021086.html</a>. It's the same sort of approach, but I don't think it generalises to arbitrary restricted monads in the same way as this.<br /><a name='cutid1-end'></a>

patch-based versus tree-based merging
http://hsenag.livejournal.com/9648.html
Sun, 24 Sep 2006 18:53:47 GMT
I don't normally post about deeply technical things I've been working on, but I've been thinking about this for a few days now and wanted somewhere public to record my conclusions.<br /><br />Almost uniquely amongst distributed version control systems (<a href="http://dev.libresource.org/home/doc/so6-user-manual" rel="nofollow">SO6</a> being the exception, but I'm not aware of it being in wide use), darcs is <i>patch-based</i> rather than <i>tree-based</i>. But what does this mean in practice?<br /><br />To get a better handle on this, we need to look at what happens during a merge, the key thing that a version control system needs to do.<br /><br />All of the tree-based systems are based around three-way merges. Given two repositories A and B to be merged, they pick a common ancestor (more on how later), diff each of them against the ancestor, then adjust one diff for any offset changes implied by the other diff, and apply it to the other repository. (Things become a bit more complicated in the case of merging directory operations such as file moves and renames, but there's nothing conceptually hard and I haven't investigated the details of how this is handled. It's not really important.)<br /><br />So, how is the ancestor chosen? Clearly it should be some repository state that did actually exist in the histories of A and B, or it would make no sense to use as a basis for the merge. It should also contain all the changes that have already been merged between A and B, because otherwise we will either get a spurious conflict, or identical changes have to be silently merged, which can cause problems in the case where we really want a conflict from identical changes.<br /><br />Unfortunately, it is not necessarily the case that a repository state will exist with such a property. <a href='http://revctrl.org/CrissCrossMerge' rel='nofollow'>http://revctrl.org/CrissCrossMerge</a> gives some examples of when this can occur. 
It causes problems for most of the tree-based revision control systems.<br /><br />In fact, the correct solution is to <i>make up</i> an appropriate ancestor. It should contain precisely the intersection of the sets of changes in A and B. One way to construct it is to merge all the possible LCAs (an LCA of A and B is an ancestor of both A and B that is not itself an ancestor of another common ancestor of A and B). This is what <a href="http://git.or.cz/" rel="nofollow">git</a> does. However, I believe its solution goes wrong in the presence of conflicts. Suppose X, Y and Z are three such LCAs, and that they all conflict with each other. Git merges them in non-deterministic order, leaving the conflict markers in the merge results. The order in which they are merged will determine the order of conflict markers. This means that the contents of the base tree used for the merge are non-deterministic, which will make the merge results themselves non-deterministic. I haven't yet tried to actually construct an example of git going wrong, though. Another flaw with git is that if the conflict between two trees that need to be merged is at the directory level (e.g. between two different renamings of the same file), it just gives up.<br /><br />On the other hand, one of the fundamental properties of darcs is that any repository with the same set of patches must behave the same. When darcs does a merge, it implicitly constructs the correct ancestor as described above (the precise mechanics are different, but the effect is the same). Because of this fundamental property, it is guaranteed that merges are reproducible; merge the same two sets of patches and you'll get the same result. Darcs handles the conflict scenario described above by completely ignoring the effects of conflicting patches when merging two repositories. 
(This strategy has its own problems, but that's a different topic entirely.)<br /><br />From a usability point of view, the fact that every merge in darcs is repeatable means that it doesn't need to track them as separate commits.<br /><a name='cutid1-end'></a><br /><br />I'm happy to discuss this further with anyone interested, either on #revctrl or #darcs on freenode, or in the comments of this post. However, they are just my current tentative conclusions after some investigation, so please don't take them as gospel truth or flame me too hard for being wrong :-)
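<br /><br />To make the LCA definition used above concrete, here is a minimal Haskell sketch. It is only an illustration under assumptions of my own: the <font face="Courier New">History</font> type (each revision mapped to its parents) and the names <font face="Courier New">ancestors</font>, <font face="Courier New">lcas</font> and <font face="Courier New">crissCross</font> are hypothetical, and this is not how git or darcs actually store or search history. A revision counts as an ancestor of itself here.<br /><blockquote><pre>
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical history representation: each revision maps to its parents.
type Rev = String
type History = Map Rev [Rev]

-- All ancestors of a revision, including the revision itself.
ancestors :: History -&gt; Rev -&gt; Set Rev
ancestors h r = go Set.empty [r]
  where
    go seen [] = seen
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise = go (Set.insert x seen) (Map.findWithDefault [] x h ++ xs)

-- Common ancestors of a and b that are not themselves an ancestor of
-- another common ancestor.
lcas :: History -&gt; Rev -&gt; Rev -&gt; Set Rev
lcas h a b = Set.filter (not . dominated) common
  where
    common = ancestors h a `Set.intersection` ancestors h b
    dominated c =
      any (\d -&gt; d /= c &amp;&amp; c `Set.member` ancestors h d) (Set.toList common)

-- A criss-cross history: X and Y both descend from O, and A and B each
-- merge X and Y.
crissCross :: History
crissCross = Map.fromList
  [ ("O", []), ("X", ["O"]), ("Y", ["O"])
  , ("A", ["X", "Y"]), ("B", ["X", "Y"]) ]

-- lcas crissCross "A" "B" == Set.fromList ["X", "Y"]
</pre></blockquote><br />In the criss-cross case there are two LCAs and neither contains the other's changes, which is why no single state that actually existed can serve as the merge base.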