This page shows several examples of how code can be improved. We try to derive general rules from them, though they cannot be applied deterministicly and are a matter of taste. We all know this, please don't add "this is disputable" to each item!

+

This page shows several examples of how code can be improved. We try to derive general rules from them, though they cannot be applied deterministically and are a matter of taste. We all know this, please don't add "this is disputable" to each item!

Instead, you can now add "this is disputable" on [[/Discussion]] and change this page only when some sort of consensus is reached.

Instead, you can now add "this is disputable" on [[/Discussion]] and change this page only when some sort of consensus is reached.

Line 9:

Line 9:

===Don't reinvent the wheel===

===Don't reinvent the wheel===

−

The standard libraries are full of useful functions, possibly too full. If you rewrite an existing function, the reader wonders what the difference to the standard function is. But if you use a standard function, the reader may learn something new and useful.

+

The standard libraries are full of useful, well-tuned functions. If you rewrite an existing library function, the reader of your code might spend a minute trying to figure out why you've done that. But if you use a standard function, the reader will either immediately understand what you've done, or can learn something new.

−

If you have problems finding an appropriate list function, try this guide:

+

−

http://www.cs.chalmers.se/Cs/Grundutb/Kurser/d1pt/d1pta/ListDoc/

+

===Avoid explicit recursion===

===Avoid explicit recursion===

Line 36:

Line 34:

and the reader knows that the complete list is processed and that each output element depends only on the corresponding input element.

and the reader knows that the complete list is processed and that each output element depends only on the corresponding input element.

−

If you don't find appropriate functions in the standard library, extract a general function. This helps you and others understand the program. Haskell is '''very''' good at factoring out parts of the code.

+

If you don't find appropriate functions in the standard library, extract a general function.

−

If you find it very general, put it in a separate module and re-use it. It may appear in the standard libraries later, or you may later find that it is already there.

+

This helps you and others understand the program.

+

Thanks to [[higher order function]]s Haskell gives you '''very''' many opportunities to factor out parts of the code.

+

If you find the function very general, put it in a separate module and re-use it. It may appear in the standard libraries later, or you may later find that it is already there in an even more general way.

Decomposing a problem this way also has the advantage that you can debug more easily. If the last implementation of <hask>raise</hask> does not show the expected behaviour, you can inspect <hask>map</hask> (I hope it is correct :-) ) and the invoked instance of <hask>(+)</hask> separately.

Decomposing a problem this way also has the advantage that you can debug more easily. If the last implementation of <hask>raise</hask> does not show the expected behaviour, you can inspect <hask>map</hask> (I hope it is correct :-) ) and the invoked instance of <hask>(+)</hask> separately.

−

''Could this be stated more generally? It seems to me this is a special case of the general principle of separating concerns: iteration over a collection vs operating on elements of a collection should apply. If you can write the loop over a data structure (list, tree, whatever) once and debug it, then you don't need to duplicate that code over and over (at least in haskell), so your code can follow the principle of Wiki:OnceAndOnlyOnce ; Wiki:OnceAndOnlyOnce is a lot harder in languages that don't provide a certain level of functional programming support (i.e. Java requires copy and paste programming, the delegate C# syntax is clumsy but workable - using it is almost Wiki:GoldPlating).''

+

''This is a special case of the general principle of [[separation of concerns|separating concerns]]. If you can write the loop over a data structure once and debug it, then there's no need to duplicate that code.''

−

Another example:

+

Another example: the function <hask>count</hask> counts the number of elements which fulfill a certain property, i.e. the elements for which the predicate <hask>p</hask> is <hask>True</hask>.

−

The function <hask>count</hask> counts the number of elements

+

−

which fulfill a certain property,

+

−

i.e. the elements for which the predicate <hask>p</hask> is <hask>True</hask>.

+

I found the following code (but convoluted in a more specific function) in a Haskell program

I found the following code (but convoluted in a more specific function) in a Haskell program

Line 63:

Line 60:

</haskell>

</haskell>

.

.

−

===Only introduce identifiers you need===

===Only introduce identifiers you need===

Line 151:

Line 147:

? It is no longer necessary to compute the length of <hask>l</hask> again and again. The code is easier to read and it covers all special cases, including <hask>tuples (-1) [1,2,3]</hask>!

? It is no longer necessary to compute the length of <hask>l</hask> again and again. The code is easier to read and it covers all special cases, including <hask>tuples (-1) [1,2,3]</hask>!

−

''Eliminating the <hask>length</hask> test can worsen performance dramatically in some cases, like <hask>tuples 24 [1..25]</hask>. We could also use <hask>null (drop (r-1) l)</hask> instead of <hask>length l < r</hask>, which works for infinite lists. See also [[Things to avoid#Don't ask for the length of a list, if you don't need it|below]].''

+

''Eliminating the <hask>length</hask> test can worsen performance dramatically in some cases, like <hask>tuples 24 [1..25]</hask>. We could also use <hask>null (drop (r-1) l)</hask> instead of <hask>length l < r</hask>, which works for infinite lists. See also [[#Don't ask for the length of a list, if you don't need it|below]].''

You can even save one direction of recursion

You can even save one direction of recursion

Line 212:

Line 208:

which is both clearer and re-usable.

which is both clearer and re-usable.

Actually, starting with GHC-6.6 you do not need to define <hask>comparing</hask>, since it is already in module <hask>Data.Ord</hask>.

Actually, starting with GHC-6.6 you do not need to define <hask>comparing</hask>, since it is already in module <hask>Data.Ord</hask>.

In the particular example you would write <hask>any isSpace</hask>, of course.)

+

The same way

+

<haskell>

+

allPrintable (a:as)

+

| isSpace a = False

+

| otherwise = allPrintable as

+

</haskell>

+

and

+

<haskell>

+

allPrintable (a:as) = if isSpace a then False else allPrintable as

+

</haskell>

+

can be shortened to

+

<haskell>

+

allPrintable (a:as) = not (isSpace a) && allPrintable as

+

</haskell>

+

(but <hask>all (not . isSpace)</hask> is even better in this particular example.)

Line 384:

Line 411:

which implements a factorial function. This example, like a lot of uses of guards, has a number of problems.

which implements a factorial function. This example, like a lot of uses of guards, has a number of problems.

−

The first problem is that it's nearly impossible for the compiler to check if guards like this are exhaustive, as the guard conditions may be arbitrarily complex (GHC will warn you if you use the <code>-Wall</code> option). To avoid this problem and potential bugs through non exhaustive patterns you should use an <hask>otherwise</hask> guard, that will match for all remaining cases:

+

The first problem is that it's nearly impossible for the compiler to check whether guards like this are exhaustive, as the guard conditions may be arbitrarily complex (GHC will warn you if you use the <code>-Wall</code> option). To avoid this problem and potential bugs through non exhaustive patterns you should use an <hask>otherwise</hask> guard, that will match for all remaining cases:

<haskell>

<haskell>

Line 403:

Line 430:

else n * fac (n-1)

else n * fac (n-1)

</haskell>

</haskell>

−

Note that <hask>if</hask> has its own set of problems, for example in connection with the layout rule or that nested <hask>if</hask>s are difficult to read. See ["Case"] how to avoid nested <hask>if</hask>s.

+

Note that <hask>if</hask> has its own set of [[If-then-else|problems]], for example in connection with the layout rule or that nested <hask>if</hask>s are difficult to read. See [[Case]] how to avoid nested <hask>if</hask>s.

But in this special case, the same can be done even more easily with pattern matching:

But in this special case, the same can be done even more easily with pattern matching:

Line 421:

Line 448:

This may also be more efficient as <hask>product</hask> might be optimized by the library-writer... In GHC, when compiling with optimizations turned on, this version runs in O(1) stack-space, whereas the previous versions run in O(n) stack-space.

This may also be more efficient as <hask>product</hask> might be optimized by the library-writer... In GHC, when compiling with optimizations turned on, this version runs in O(1) stack-space, whereas the previous versions run in O(n) stack-space.

−

Note however, that there is a difference between this version and the previous ones: When given a negative number, the previous versions do not terminate (until StackOverflow-time), while the last implemenation returns 1.

+

Note however, that there is a difference between this version and the previous ones: When given a negative number, the previous versions do not terminate (until StackOverflow-time), while the last implementation returns 1.

Line 435:

Line 462:

or compare the following example using the advanced [[pattern guard]]s

or compare the following example using the advanced [[pattern guard]]s

However, they are often criticised for hiding computational complexity and producing ambiguities, see [[/Discussion]] for details. They are subsumed by the more general [[Views]] proposal, which has unfortunately never been implemented despite being around for quite some time now.

+

However, they are often criticized for hiding computational complexity and producing ambiguities, see [[/Discussion]] for details. They are subsumed by the more general [[Views]] proposal, which has unfortunately never been implemented despite being around for quite some time now.

+

The <hask>n+k</hask> patterns are not in the [[Haskell 2010]] standard.

==Efficiency and infinity==

==Efficiency and infinity==

Line 581:

Line 608:

−

===Choose the right fold===

+

===Choose the appropriate fold===

−

See [[Stack overflow]] for advice on which fold is appropriate for your situation.

+

See "[[Stack overflow]]" or "[[Foldr Foldl Foldl']]" for advice on which fold is appropriate for your situation.

Line 609:

Line 636:

If you really need random access, as in the Fourier Transform,

If you really need random access, as in the Fourier Transform,

−

you should switch to [http://haskell.org/haskellwiki/Arrays Arrays].

+

you should switch to [[Arrays]].

Line 627:

Line 654:

====Lists are not finite maps====

====Lists are not finite maps====

−

Similarly, lists are not finite maps, as mentioned in [[efficiency hints]].

+

Similarly, lists are not finite maps, as mentioned in [[Efficiency_hints#Data.Sequence_vs._lists | efficiency hints]].

−

+

===Reduce type class constraints===

===Reduce type class constraints===

Line 691:

Line 717:

+

===Avoid redundancy in data types===

+

+

I often see data types with redundant fields, e.g.

+

<haskell>

+

data XML =

+

Element Position Name [Attribute] [XML]

+

| Comment Position String

+

| Text Position String

+

</haskell>

+

+

I suggest to factor out the common field <hask>Position</hask>

+

since this lets you handle the text position the same way for all XML parts.

+

<haskell>

+

data XML = XML Position Part

+

+

data Part =

+

Element Name [Attribute] [XML]

+

| Comment String

+

| Text String

+

</haskell>

==Miscellaneous==

==Miscellaneous==

Line 702:

Line 748:

that is you don't have to specify an order of execution

that is you don't have to specify an order of execution

and you don't have to worry about what computations are actually necessary.

and you don't have to worry about what computations are actually necessary.

−

You can easily benefit from lazy evaluation

+

Useful techniques are described in [[Avoiding IO]].

−

if you process data purely functionally

+

−

and output it by a short IO interaction.

+

−

+

−

<haskell>

+

−

-- import Control.Monad (replicateM_)

+

−

replicateM_ 10 (putStr "foo")

+

−

</haskell>

+

−

is certainly worse than

+

−

<haskell>

+

−

putStr (concat $ replicate 10 "foo")

+

−

</haskell>

+

−

+

−

Similarly,

+

−

<haskell>

+

−

do

+

−

h <- openFile "foo" WriteMode

+

−

replicateM_ 10 (hPutStr h "bar")

+

−

hClose h

+

−

</haskell>

+

−

can be shortened to

+

−

<haskell>

+

−

writeFile "foo" (concat $ replicate 10 "bar")

+

−

</haskell>

+

−

which also ensures proper closing of the handle <hask>h</hask>

+

−

in case of failure.

+

−

+

−

A function which computes a random value

+

−

with respect to a custom distribution

+

−

(<hask>distInv</hask> is the inverse of the distribution function)

+

−

can be defined via IO

+

−

<haskell>

+

−

randomDist :: (Random a, Num a) => (a -> a) -> IO a

+

−

randomDist distInv = liftM distInv (randomRIO (0,1))

+

−

</haskell>

+

−

but [[Humor/Erlkönig|there is no need to do so]].

+

−

You don't need the state of the whole world

+

−

just for remembering the state of a random number generator.

+

−

What about

+

−

<haskell>

+

−

randomDist :: (RandomGen g, Random a, Num a) => (a -> a) -> State g a

+

−

randomDist distInv = liftM distInv (State (randomR (0,1)))

+

−

</haskell>

+

−

?

+

===Forget about quot and rem===

===Forget about quot and rem===

−

They complicate handling of negative dividends.

+

They complicate handling of negative dividends. <hask>div</hask> and <hask>mod</hask> are almost always the better choice. If <hask>b > 0</hask> then it always holds

−

<hask>div</hask> and <hask>mod</hask> are almost always the better choice.

+

−

If <hask>b > 0</hask> then it always holds

+

<haskell>

<haskell>

a == b * div a b + mod a b

a == b * div a b + mod a b

Line 758:

Line 759:

</haskell>

</haskell>

The first equation is true also for <hask>quot</hask> and <hask>rem</hask>,

The first equation is true also for <hask>quot</hask> and <hask>rem</hask>,

−

but the two others are true only for <hask>mod</hask>, but not for <hask>rem</hask>.

+

but the two others are true only for <hask>mod</hask>, but not for <hask>rem</hask>. That is, <hask>mod a b</hask> always wraps <hask>a</hask> to an element from <hask>[0..(b-1)]</hask>, whereas the sign of <hask>rem a b</hask> depends on the sign of <hask>a</hask>.

−

That is, <hask>mod a b</hask> always wraps <hask>a</hask> to an element from <hask>[0..(b-1)]</hask>,

+

−

whereas the sign of <hask>rem a b</hask> depends on the sign of <hask>a</hask>.

+

−

This seems to be more an issue of experience rather than one of a superior reason.

+

This seems to be more an issue of experience rather than one of a superior reason. You might argue, that the sign of the dividend is more important for you, than that of the divisor. However, I have never seen such an application, but many uses of <hask>quot</hask> and <hask>rem</hask> where <hask>div</hask> and <hask>mod</hask> were clearly superior.

−

You might argue, that the sign of the dividend is more important for you, than that of the divisor.

+

−

However, I have never seen such an application,

+

−

but many uses of <hask>quot</hask> and <hask>rem</hask> where <hask>div</hask> and <hask>mod</hask> were clearly superior.

This page shows several examples of how code can be improved. We try to derive general rules from them, though they cannot be applied deterministically and are a matter of taste. We all know this, please don't add "this is disputable" to each item!

Instead, you can now add "this is disputable" on /Discussion and change this page only when some sort of consensus is reached.

The standard libraries are full of useful, well-tuned functions. If you rewrite an existing library function, the reader of your code might spend a minute trying to figure out why you've done that. But if you use a standard function, the reader will either immediately understand what you've done, or can learn something new.

because it is hard for the reader to find out
how much of the list is processed and
on which values the elements of the output list depend.
Just write

raise x ys =map(x+) ys

or even

raise x =map(x+)

and the reader knows that the complete list is processed and that each output element depends only on the corresponding input element.

If you don't find appropriate functions in the standard library, extract a general function.
This helps you and others understand the program.
Thanks to higher order functions Haskell gives you very many opportunities to factor out parts of the code.
If you find the function very general, put it in a separate module and re-use it. It may appear in the standard libraries later, or you may later find that it is already there in an even more general way.

Decomposing a problem this way also has the advantage that you can debug more easily. If the last implementation of

raise

does not show the expected behaviour, you can inspect

map

(I hope it is correct :-) ) and the invoked instance of

(+)

separately.

This is a special case of the general principle of separating concerns. If you can write the loop over a data structure once and debug it, then there's no need to duplicate that code.

Another example: the function

count

counts the number of elements which fulfill a certain property, i.e. the elements for which the predicate

p

is

True

.

I found the following code (but convoluted in a more specific function) in a Haskell program

Don't forget that zero is a natural number. Recursive definitions become more complicated if the recursion anchor is not chosen properly. For example the function

tupel

presented in DMV-Mitteilungen 2004/12-3, Jürgen Bokowski: Haskell, ein gutes Werkzeug der Diskreten Mathematik (Haskell, a good tool for discrete mathematics). This is also a good example of how to avoid guards.

As a rule of thumb, once your expression becomes too long to easily be point-freed, it probably deserves a name anyway.
Lambdas are occasionally appropriate however, e.g. for control structures in monadic code (in this example, a control-structure "foreach2" which most languages don't even support.):

People who employ syntactic sugar extensively
argue that it makes their code more readable.
The following sections show several examples
where less syntactic sugar is more readable.

It is argued that a special notation is often
more intuitive than a purely functional expression.
But the term "intuitive notation" is always a matter of habit.
You can also develop an intuition for analytic expressions
that don't match your habits at the first glance.
So why not making a habit of less sugar sometimes?

which implements a factorial function. This example, like a lot of uses of guards, has a number of problems.

The first problem is that it's nearly impossible for the compiler to check whether guards like this are exhaustive, as the guard conditions may be arbitrarily complex (GHC will warn you if you use the -Wall option). To avoid this problem and potential bugs through non exhaustive patterns you should use an

Another reason to prefer this one is its greater readability for humans and optimizability for compilers. Though it may not matter much in a simple case like this, when seeing an

otherwise

it's immediately clear that it's used whenever the previous guard fails, which isn't true if the "negation of the previous test" is spelled out. The same applies to the compiler: It probably will be able to optimize an

otherwise

(which is a synonym for

True

) away but cannot do that for most expressions.
This can be done with even less sugar using

if

,

-- Less sugar (though the verbosity of if-then-else can also be considered as sugar :-)
fac ::Integer->Integer
fac n =if n ==0then1else n * fac (n-1)

Note that

if

has its own set of problems, for example in connection with the layout rule or that nested

might be optimized by the library-writer... In GHC, when compiling with optimizations turned on, this version runs in O(1) stack-space, whereas the previous versions run in O(n) stack-space.

Note however, that there is a difference between this version and the previous ones: When given a negative number, the previous versions do not terminate (until StackOverflow-time), while the last implementation returns 1.

n+k

patterns

In order to allow pattern matching against numerical types, Haskell 98 provides so-called n+k patterns, as in

take::Int->[a]->[a]take(n+1)(x:xs)= x: take n xs
take__=[]

However, they are often criticized for hiding computational complexity and producing ambiguities, see /Discussion for details. They are subsumed by the more general Views proposal, which has unfortunately never been implemented despite being around for quite some time now.

A rule of thumb is:
If a function makes sense for an infinite data structure but the implementation at hand fails for an infinite amount of data, then the implementation is probably also inefficient for finite data.

It's not good to use the IO Monad everywhere,
much of the data processing can be done without IO interaction.
You should separate data processing and IO
because pure data processing can be done purely functionally,
that is you don't have to specify an order of execution
and you don't have to worry about what computations are actually necessary.
Useful techniques are described in Avoiding IO.

.
This seems to be more an issue of experience rather than one of a superior reason. You might argue, that the sign of the dividend is more important for you, than that of the divisor. However, I have never seen such an application, but many uses of

quot

and

rem

where

div

and

mod

were clearly superior.

Examples:

Conversion from a continuously counted tone pitch to the pitch class, like C, D, E etc.: