Opinions and Programming

Stuff about programming,programming style, maintainability, testability.Dedicated to my coworkers and friends. Everyone is welcome to leave comments or disagree with me. This blog does not represent views or opinions of my employer.

Saturday, September 3, 2016

Let me start with this statement: I have learned ways to write and structure code I did not ever thought possible. Ways which my OO trained mind has hard time accepting as real or even believable. Yet, in many ways, these ways are more correct and real than the ways I knew before.

This reading is intended for OO developers. This post will talk about some amazing code that is possible in FP. It will present high level intuitions and concepts behind that code. It will skip the steep theoretical background. When you see monad (or whatever) and do not know what it is, simply think it is ‘a computation structure that allow me to do that’ and move on. The focus is on ‘what is possible’, not on details of ‘how it's done’.

If you are an experienced Haskeller and would like to comment or add to my presentation you are welcome to do so as well.

It all started innocent. I just wanted to check if I can implement, in Haskell, a particularly complex piece of imperative logic I wrote for my work. So I did just that.

This logic was about graphs (as in wikipedia). At work I had to create a lot of graphs to test various special cases in the software. So, I decided to check how graph construction would work in Haskell. If I can build graphs, I should be able to adjust them too....

I was always interested in polymorphism. What it is, what it can do, what it can be. So what exactly is a graph? Can I define it using Haskell type constraints? So I worked on that.

Now I have these polymorphic Haskell graph type classes, I have a type class for readonly graphs, one for buildable graphs, one for adjustable graphs with inheritance going between them. I also have 5 or six specific graph instance types that `implement` my type classes, like adjacency map (edges-by-vertices) type, adjacency matrix type, simple list of edges and vertices types and so on. I start to play with graph creation.

Polymorphic data.Polymorphic what? So I created a polymorphic diamond (wikipedia) graph data structure. Which instance is it? It is all of them. It is polymorphic so it can be whichever you like! You want to look at adjacency matrix just select AdjacencyMatrix type, you want to look at is as EdgesByVertices map just select that type, and so on. So how weird that is for an OO programmer? Polymorphic data structure!

But it gets better than that. How about polymorphic graph distance calculation which can either return integer distance or shortest path depending on which type I select?

To demystify this a bit, Haskell type class polymorphism is opposite to the OO inheritance. OO is based on exists quantifier, Haskell on forall quantifier. That allows Haskell typechecker to enforce that polymorphic code does only things that apply to all instances. For example, this OO code is not truly polymorphic:

Graph createDiamond(){def res=new AdjacencyMap(); return res;}

because that code can manipulate AdjacencyMap, which is a specific instance. Did I muddy the waters even more? Just think about Haskell polymorphism as an upside-down version of OO (whatever that means) and let’s move on.

Now I am thinking: I have all these implementations of my graph type class (or -sses as there is Graph, DiGraph, EdgeSemantics, AdjacencyIndex and so on...) are they correct? How do I test these?

And, I thought, I wrote automated tests before.What does it mean for a graph to be correct or not correct? There must be some graphish things that make graphs ... well graphs. Call them graph laws or graph properties. I need to define them.

I came up with a set of graphish properties to verify my implementing instance types. How do I actually test these? How about I generate lots of random graphs and check that each satisfies the property in question. How hard would that be? Easy! Property based testing appears to be the most popular way to test stuff in Haskell. The library is called QuickCheck. What an interesting way to test! I have always believed that good testing should have some random aspect to it. And so I did it and got addicted a bit more.

Do-it-yourself polymorphism. But are type classes the only way to do polymorphism in Haskell? Do I even need Haskell to do polymorphism for me? Can I just do it myself? So what is polymorphism again? How about:

Ability to write one program and use it (consume it) in different ways

Or, write different programs and consume them the same way

How can I do that without interfaces, type classes, language inheritance? Here is an idea: I write my code as bunch of specialized DSLs (Domain Specific Languages) and write several interpreters to run these. DSL programs are abstract syntax trees and nothing more, interpreters are computations that actually do something (or many things) with these trees. This way I decouple production (DSL) from consumption (Interpreter) and things should be good. I will have a do-it-myself polymorphism, strong decoupling of concerns, and more.

Does the very idea sound weird to you? Actually, this approach is now a recognizable pattern for structuring code in functional programming. I may do DSL - interpreter decoupling simply because of code clarity, maintainability and composability benefits. Composa what?

By composable DSLs, I mean DSLs that do not know of each other existence and of the polyglot DSL that uses them. Similarly, interpreters need to be agnostic of other interpreters and compose into a polyglot interpreter.

Good to pause here. Think how easy would it be to do this in OO language. Try to visualize a syntax tree of a polyglot program. It will look like a multicolor ‘zebra tree’ with complete mishmash of instructions from various DSLs. How any interpreter could ever make sense of it? But, they compose just fine...

I will present several different ways this can be done, all with computation structure totally foreign to OO. And all of this in a type system that is strong, second to none strong, no duck typing.

DSLs for free: Each DSL language has its own type and inhabitants of that type are simply programs in that DSL. Free is a type, and is (among other things) a language generator.

Creating a DSL is a simple as typing this:

type MyDSL = Free MyDslInstructions

Well, there is also a completely mechanical conversion of instructions into the new DSL type. The cost is minimal and will probably will get automated with Template Haskell at some point.

I can now start writing programs in that DSL and Haskell type checker will verify that they are correct.

This does not look real! That Free must be some large chunk of code! Actually it is 1 line 45 characters long. So what gives? It is all about finding right computation structure to do the job. I will try to provide some intuitions.

Intuition: Start with a set of, preferably orthogonal, abstract instructions for a language. That is the MyDslInstructions type, it need to be a sum type (more on this later). Free type definition is recursive. Free explodes MyDslInstructions type to a type that includes all possible sequences of instructions. Programs are sequences of instructions and, bingo, we have what we want.

I also need to write an interpreter for my new DSL, but in this post let me assume this has been done already. So how could DSL composition work?

Computing programs or computing on programs?I have one DSL that specializes in a graph traversal and another one that knows how to build a tree. How about I use both to create a spanning (wikipedia) tree for the traversal? Each traversal step can be mapped to a simple program that adds that step to a tree. Then, all the simple DSL-Tree programs can be reduced into one big program. That program is returned as result and interpreted separately. It builds the tree.

I just have done map-reduce and my reduction step was done on programs! How cool! And I am getting addicted just a little bit more.

Stacking DSLs one on top of the other (like an onion) is another way DSLs can work together. Onion analogy also describes how interpretation would work, it peels layer after layer, working on one DSL at the time. That approach involves a very clever trick:

Intuition: Language of type A can host an arbitrary language of type B by providing an instruction that wraps state B (internal computation state of that language) in a 'bubble' that has type A. This way A has consistent state and can host B instructions. Clever!

This trick is not specific to DSLs and is widely used in Haskell. (keyword: monad transformer).

Actually as sophisticated as this is, there are more interesting ways to compose DSLs. But, before I can explain what they are, I need to take a sidetour and introduce 2 other unbelievable concepts:

Sidetour: Adding and multiplying Types. Imagine that we could define operators that work, hmm, on types! We could do crazy stuff like this (assume that the ‘ingredients’ Word4, Name, Email,… Get, Put, Up etc are all types):

:+: represents a type sum, also known as coproduct (type inhabitants inhabit one of the ingredient types). Product and coproduct are dual concepts. If you do not know what dual means, just think about it on intuitive level as ‘exact opposite’.

Even more interesting is the case of parameterized types, it would be nice to have something like this:

type Collection a = (List :+: Set) a

another words, Collection Int is either List Int or Set Int but is not Set String.

This sci-fi is reality. Type operators are more a syntax sugar, but the real eureka moment is that types can be product or coproduct and that you can do full fledged algebra on types. So there is a type representing 0 and type representing 1 and the usual laws like a :*: (b :+: c) == (a :*: b) :+: (a :*: c) (with some relaxed definition of == of course). Algebras are a good thing, algebras do not have bugs.

FYI, all or almost all of the types in traditional OO are product. Java class instances have all the class methods, not one of its methods. Some coproduct behavior can be modeled with inheritance but that approach is largely missing the boat. And, incidentally, OperatorTypes is Haskell language extension so you can write a very similar code to the pseudo-code examples I just wrote.

I hope you can see that it is hard to stop learning this stuff. But how is this related to the Free DSL story? It turns out that Free plays nice with coproduct types. Free ‘preserves’ coproduct. We will use this fact in a bit.

Sidetour: Extensible Effects. Probably, the most touted advantage of pure FP is that it provides type safety over programs that can do criminal activity: the side-effects. Haskell wraps effects (IO, variable mutations, etc) in a monad type. That type discloses the crime and the mark of shame cannot be erased. Unfortunately these monad types are a bit coarse grained. In particular, IO monad allows for any side effect there is.

Sidenote: I have heard of only 2 commercial languages that can provide this level of type safety: Haskell and Purescript (which is very similar to Haskell). But, I know nothing about OCaml, so maybe 3?.

With DLS-Interpreter pattern effect type safety really starts to shine. Effectfull type signatures distribute nicely between DSL and Interpreter. Here is how type signature of a program that uses a FileSystem DSL and a REST DSL and returns a String may look like:

myDslProgram :: Free (FileSystem :+: REST) String

Types are amazing! There is much more to extensible effects than what I have described, let’s get back to composing DSLs:.

DSL Injections, Polymorphic DSLs, and Interpreters are folds. This idea is conceptually simple to explain on a very high level, but there is a bit depth to it and it took me most of this summer to get it. As before, I have:

type MyDSL = Free MyDslInstructions

If I can map MyDslInstructions into bigger sum BigLanguageInstructions, it should be intuitive and expected that sequences of MyDslInstructions (programs) become sequences of mapped instructions (programs) in the bigger language. Thus, I can ‘inject’ MyDSL into any DSL that is an ‘extension’ of my language (can talk my instructions set). In particular I should be able to inject MyDSL into:

type BiggerDSL = Free (MyDslInstructions :+: SomeOtherDsl)

And the same is true about FreeSomeOtherDsl language. This means that DSLs compose and instructions (adjusted with ‘inject’ function) can be used in the polyglot composition. It turns out that the power of Haskell upside-down polymorphic type classes allows me to specify ‘injection’ without even knowing BiggerDSL! And I can define MyDSL instructions so they can be simply be used ‘as is’ in polyglot programs.

Interpreters are generalized folds. If you worked with folds before (e.g. map-reduce), this maybe one of these simultaneous aha and wow moments. Program is a sequence of instructions that need to be folded to produce result. It turns out that this can be also done polymorphically allowing interpreters to run in the context of a larger, polyglot program.

Sidenote (Edited): A good intuition for DSL ‘injection’ is to use analogy with Sets. Sets can be added (sum) and multiplied (cartesian product) and so can Types. Sets have a partial order (inclusion) defined by injections between sets, well, so have Types! How is that for polymorphism!?

Sidenote: There is a good amount of depth here as this solves Expression Problem in Haskell and is a testimony to FP expressiveness(wikipedia and Data types a la carte).

But this is not the end of it. Now the story gets really unreal.

Program-interpreter duality. DSLs add, Interpreters multiply! This amazing program simplicity is achieved by pairing dual concepts (types). Product types are dual to coproduct types. Free type has its dual: the Cofree type. Each of the DSL abstract instruction can be paired with corresponding interpreter that has dual type signature. By exploring all of these dualities DSLs combine almost without any code! Interpreters are simply co-programs.

Earlier I have created a DSL using this line of code:

type MyDSL = Free MyDslInstructions

For this to work properly MyDslInstructions type needs to be a coproduct type.

There is a dual version of the whole Free story. In nutshell we get this:

type MyDSL = Free MyDslInstructions

type MyInterpreter = Cofree MyInterpretations

MyInterpretations contains one abstract interpreter per abstract instruction, and as you are probably guessing it needs to be a product type. In addition, each of the abstract ‘interpretations’ needs to have signature dual to that of corresponding abstract instruction.

What this all amounts to is that languages add and interpreters multiply. That makes languages and interpreters very composable:

type HighLevelDSL

= Free (abstrInstructions1 :+: abstrInstructions2 :+: ..)

type HighLevelInterpreter

= Cofree (abstrInterpretations1 :*: abstrInterpetations2 :*: ..)

This was on type level but on the implementation level things are equally super easy.

If you would like to see more details, you can find examples in the Dave Laing blog linked below or in my github graph library, also linked.

That does sound to me like the best thing since the sliced bread. It allows for writing very composable code and for a very nice separation of concerns.

Limitations: There is one which, to me, makes things even more interesting: If I want my DSLs to be monadic (see Effects section of this blog) my interpreters will be comonadic and very un-effectful.

Basically, as a software designer, I have some decisions to make how do I compose my DSLs. Hey that’s what I am paid for as a developer! Wait, I am paid to work on mutating instance variables or something as criminal as Hibernate. Scratch that idea.

Vertical DSLs. With the DSL-Interpreter pattern, I have some choices to make. Do I code a fat DSL and a thin Interpreter or a thin DSL and a fat Interpreter? My OO mind shouts: fat interpreter, fat interpreter ... But there is often a better way, as described here: A Modern Architecture for FP

In nutshell, one quite straightforward way DSLs could work together is by implementing higher level DSLs in terms of lower level DSLs.

Intuition: Basically, to implement higher level DSL_HL in terms of DSL1, … DSLn, I need to implement every instruction in my DSL_HL as a polyglot program using DSL1, … DSLN. There is quite a bit of structure to this approach as well but I will spare you the theory.

So maybe the answer will be that nothing should ever get fat. There will just be more vertical DSL layers. This way we could get clear code separation that could accomplish the, so far hopeless, quest for separation of concerns (wikipedia).

So here is the sci-fi for you again: Think of coding your next app as a layered set of DSLs. Your app functionality, all business logic, everything, is decomposed into thin layers or single purpose DSLs.

I hope I did not lose you. The rest of the post should be easier to read.

Unreal FP reality. If the things I have described in my post look unreal to you, it is probably because your programming background matches mine.

Polymorphic data: On a daily basis we write code that uses LSP (Liskov substitution) so results of polymorphism that is based on type constraints and 'forall' quantifiers will appear unintuitive and unreal.

Free DSLs: As OO programmers we work almost only with product types; coproduct types are very much foreign to us. Monadic computing is totally foreign to us even though it is often a better match to real life problems we are trying to model and solve. The structure we use in our code is comonadic but we do we even know that? We use some free types like Lists but we think of them as 'arrays' and totally block out different ways of thinking: like recursive data types. Y-combinator means a company to us.

Types and Effects: As OO programmers, we have no need to learn Type Theory. As for the 'Trinity of Computer Science' (Proof Theory, Type Theory, Category Theory) we typically know none.Our types lack generality and, thus, are weak. They also do not provide much for code transparency as we can hide almost any implementation under any type signature. The concept of types that actually restrict programs (at the extreme end, think about a type that cannot be inhabited by any program) feels totally foreign. So does any computing on types like: type algebra, type recursion, or functors. Basically, our types just sit there and look pretty.

There is no magic here, we just lack intuition and conceptual understanding of how the code can be structured. The unreal becomes real after you figure out the structure that makes it work. What we are missing out on are concepts not Haskell.

Convictions I have abandoned this summer: I used to say stuff like this and I hear other people talk like this today:

"There is no such thing as a free lunch, the code needs to be somewhere to do these things"

The idea is to program with concepts, not with brute force! Placing computation in a correct computational structure will cut lots of lines of code.

Haskell has very different approach to generalization with very different end goals. Generalization in Haskell applies to large chunks of code as well as to all small sections of each line. Generalization yields a stronger typed code, often uncovers structure of the computation, and opens up the code reuse. It is a different ball game. KISS no longer applies.

"Programming is a branch of computer science"

I may have overdone this one a bit :) Did you know that there is a branch of math devoted to studying, well polymorphism and concepts that generalize computations? It is called Category Theory. In this post, I have purposefully silenced evidence of Category Theory use. Programming using category theory concepts is really polymorphic, across different domains of science polymorphic! Most importantly these concepts really work.

"Anything can be implemented, the question only is how much time is going to take"

Your compiler must be very forgiving. Try working with GHC ;-).

This is a manifestation of imperative ‘here are the instructions’ thinking. OO design provides a recipe to do anything. This expressiveness is a problem. It yields brute force solutions and not maintainable code. As imperative programmer I am jack of all trades and master of none. Imperative is sometimes needed. But I refuse for this to be all there is.

The End: Thank you for staying with me for so long and reading my blog. This post has 8 pages already, it covered just one path into FP, there are many other paths and unreal things to explore. Even the coverage of DSL-Interpreter pattern is far from complete (how about DSL language optimizations?).

There is an amazing FP programming world out there. The best way to embrace it is to start playing with a true FP language like Haskell. Nobody will ever learn monads by reading ‘monads in C#’ or ‘monads in Java’ articles. You will just get a headache. Get inspired to start playing with FP using FP! It is worth noting that Haskell has inspired a lot of Scalaz concepts. That includes Free DSL pattern in Scala. These people are playing with Haskell!

Wiki pages for that project are all literate programs. Literate programs are better than blogger! ... But since my typing is terrible, these maybe not as literate as one would have hoped.

Acknowledgements:

To the generous FP community: thank you for spending time to share and, thus, help me learn! I cannot possibly include here a complete list of excellent sources that should have been acknowledged in this post. I do feel guilty about it. Here is my very abbreviated list:

Data types a la carte original paper by Wouter Swierstra defines many of the presented ideas and describes them in much more detail.

Cofun with cofree comonads is amazing sequence of posts that includes Free-Coffree pattern by Dave Laing. These post also contain a better list of due credits.

Saturday, October 25, 2014

This series was born out of my frustration with Grails. But, instead of making it a comprehensive criticism of the framework, I have decided to focus on a few GORM and Hibernate issues. I had several reasons to do that.

Why GORM/Hibernate focus?
There is quite a few blogs which basically say: Grails is very buggy and then provide few or no details. There are also many blogs saying Grails is great and then provide equal amount of fluff to support their claim. All of this becomes very subjective.

(Section Edited for clarity, Oct 30, 2014)
It is not that hard to demonstrate that this is a very buggy environment. It has been founded on Groovy, and, in my experience, Groovy is and always was is a very buggy language. Here is one curious example (tested with Groovy 2.3.6, other versions I checked behave the same way): 1 as Long == 1 as Integer //true (note, false in Java) 1 as Integer == 1 as Long //true (note, false in Java) [(1 as Integer)].contains((1 as Long)) //false (inconsistent with equals!)(1 as Long) in [(1 as Integer)] //false

That is some scary stuff, right? I think it is scary. This one is not very fixable, but most Groovy bugs will eventually get fixed. Hibernate bugs will be with us forever. This series was about bugs other people call functionality and sophisticated design.

Designing a good web framework is not easy. I think the key is: the framework must be intuitive. It should do what developers expect to happen. That is one more reason for my focusing on GORM/Hibernate. It is hard to claim that these 2 are intuitive, but a similar claim becomes much more subjective in other parts of Grails.
Here is one example of a non-intuitive behavior: How does controller forwarding work. The intuitive behavior would be that forwarded method executes in a separate thread. It does not, it executes as part of the calling method. Instead of wasting time on disagreeing that this is a bad design, do this experiment: Create a controller with methods ‘a’ and ‘b’ and have 'a' forward to 'b'. Add a filter which simply prints action name in ‘before’ and ‘afterView’. Here is what you will get with Grails 2.4.3: before a, before b, afterView b, afterView b
(both afterView print the same action name!). Would it not be nicer if we got: before a, afterView a, before b, afterView b
Confusing design + mutating state == bugs. But how can I argue that this is not just an innocent overlook on the part of Grails/Spring framework?

One of the most irritating aspects of Grails is that everything is so intermittent, and that is by design. The fail-fast philosophy is totally foreign to this framework. For example, if calling object.save() no longer always saves the object (yes you are reading it right, see GRAILS-11797, GRAILS-11536) then would you not want object.save() to fail if the object is not going to be ever saved? Again, focusing on GROM/Hibernate simplified my job of demonstrating examples with a very intermittent behavior.
The uncanny ability to exacute bad code in Grails goes way beyond what I was able to demonstrate. This is the very scary: How did that ever work before? thing. I suspect something is wrong with Grails/Groovy compilation and a bad code sometimes magically works until some totally not relevant change exposes the problem.I cannot justify this claim. The only think I have is anecdotal evidence.

I have very strong opinions about what are the main causes of OOP bugs. That does not mean you would have agreed with me. Without good examples, this blog would have been labeled as a one more guy that thinks that 'FP is the new silver bullet'. Focusing on GORM allowed me to pinpoint the problems in a way that is hard to dispute. (And yet, I still got the label.)

Is it all about the cache?
All of the GORM problems listed in this series can be attributed in some way to Hibernate session/1st level cache. You can argue that having cache is beneficial and that some problems are unavoidable with any cache implementation.
Ideally, caching should behave as if it was not there: application using a cache should work exactly the same way if the cache was removed. However, synchronization of ORM cache and DB state is a very hard problem and achieving ‘opaque’ implementation may be hard or even impossible.

Dear GORM/Hibernate: If you can’t implement cache which behaves ‘like it is not there’ then don’t design your API like the cache ‘is not there’. Make the cache very explicit and optional (Identity Map?). Invalidate cache immediately, or at least, provide a way for the application to learn as soon as you know that cached data is stale. Design invalidating (you like to call it session.clear()) your cache in a way that does not make half of objects used by the application useless. Remember, you are just a cache, the data is still there! The data is what is important, not you. If you want to call yourself a cache, stop being so bossy! :)

In my blog, I have just scratched the surface. Some examples of 'bug generating' design that could use more discussion include: defaulted and recommend 'flush:false' in save() (maybe a moot point now), or what happens when Hibernate session closes suddenly and unexpectedly (like, when transactional code fails).
I need to stop somewhere and this post seems a good place and time to stop.

Sneaking-in FP and other things that interest me
I decided that complaining about Grails not working right is much less powerful than pointing out why it does not work right. I think analyzing and criticizing bad programs is a great way to advance programming skills. With each installment I tried to sneak-in some concept that explained why bad is bad: properties, shared state/side-effects, fail-fast, unit testability, even a bit of combinators. These are all common sense things that explain design flaws.

One thing I could not fit into my posts was types. I decided that this will be too foreign concept in the context of Java and Groovy. Types are very powerful and I regret not finding a good place for them in this series.

Is FP the ‘new’ silver bullet?
I was asked this question and it makes sense that I try to answer it.
FP is not new, FP predates OOP, Haskell is older than Java, combinatory logic is older than Turing machines, lambda calculus is about the same age. The silver bullet is and probably always was the ability to logically reason on the code.

In this series, I tried to emphasize the importance of logic. Programs should be logically simple and ‘mappable’ to logic. Programming and Logic are very related on a theoretical level (google: Curry-Howard). This 3 are called the trinity of CS: Type Theory, Category Theory and Proof Theory (google: Curry-Howard-Lambek).

Programs I write using Groovy and Grails may look straightforward and shorter than Java and Spring but from the point of view of logic these are still just spaghetti threads of programming instructions. Add to it a total disregard for side-effects and this exercise becomes equivalent to building a house of cards on a foundation that is shaking.

Quiz Question: Recalling Logic 101, here is a logical ‘formula’: (a^b)=>c ⇔ a=>(b=>c)
Do you know/can you figure out what that corresponds to in programming? Answer at the end of this post.

I used to love Grails
I started this blog site with a series about 'Imperative curlies'. My idea at that time was that I can count the number of curly braces ({}) in my code and use that number as a measure of how good my code is. The fewer 'curlies', the better the code. The idea was to break away from coding and thinking using imperative sequences of instructions (for loops, if statements all use 'curlies'). I remember it worked very well for me. If you look at these old posts, you will see that there was a time when I really liked Grails.

Is all OOP bad?
I think that is a complex question. Good OOP is about things like decoupling, separation of concerns, eliminating shared state, meaningful polymorphism, etc. These things may achieve some of the same goals FP is fighting for. The concept of a shared session state (Hibernate session) is not very OO. Hibernate StatelessSession interface which does not extend Session is not a great example of OO polymorphism. Ability to decouple is mostly gone due to Hibernate non-localized side-effects. Hibernate is simply not a good OOP.

What any OOP will always lack is this: a clear and simple correspondence to logic. This is what makes FP unique.

Parting thought:Answer to the Quiz Question: It is currying. To see it, compare these 2 lines: (a^b)=>c ⇔ a=>(b=>c) (a,b)->c ⇔ a->(b->c) (2 argument function) (function returning a function)
First line is the logical formula. I have changed arrow-like symbols '=>' to look slightly different '->'. I have replaced '^' with ',' and ended up in FP! This process is a mini-Category Theory in action. Cool, is it not?

There is no helping it, Groovy and Grails are OOP not FP. It is still important to be able to think outside of that box. Otherwise we will start convincing ourselves of things like ‘static definitions are always bad’, ‘unit testing is not about finding bugs’, ‘using refresh() resolves stale object problems in Hibernate session’, or some other nonsense.

Thinking in C++, Thinking in Java: it is worth trying to stop it, even if you program in these languages.

Grails is and will be a very popular and buggy framework. We can only blame ourselves for that. Writing this series was a big effort for me. My biggest hope is that it made some of my readers stop for a moment with a 'hmm'.

The End (for now).

(I ended up republishing this blog due to some weird formatting issues - if I created a chaos in your RRS/Atom feed - sorry!)

Saturday, October 18, 2014

Repeatable finder (concurrency issue described in part 2) is what started/motivated this series. I had hoped that that this issue will draw some reaction from the community. It did not. Why? Tallying up all the answers/responses from the last 2 moths amounts to: 6, none of them useful or even correct. What does that mean?

In this series I tried to sneak in some things that interest me like FP and logical reasoning of code correctness. I will use repeatable finder problem as a way to sneak in a bit more of this stuff later in this post.Denial isn't just a river in Egypt?
This has been a twilight-zone.
To refresh you memory: if more than one query is executed in a single Hibernate session and the result sets intersect then the second query returns a weird combination of old and new data. That can break the logic in your code, for example:Users.findAllByNickName('bob')

can return records with nickName != 'bob'. Other things can go wrong too: Maybe you have used a DB unique key to define equals()? Or maybe you have used a DB unique key as a key in a Map? Any of this could go very wrong.

At first, I thought that the issue must be well know and I am missing some way of handling it. This, unfortunately, is not the case. Very recently, I came across this blog from 2009: orm-sucks-hibernate-sucks-even-more"... take a look how even a silly CRUD application would suffer, once you've got "not-very-recent" object from the session"
that quote points to a (now non-existing) page on the hibernate website. Did we know more in 2009 that we know now? If we did know, why have we allowed for this issue to stay unresolved? Well, this is all speculation.

I tried my best to do 2 things: make the community aware and persuade Hibernate to fix it. I have failed miserably on both accounts. Here are the results of my efforts (as of Oct 17, 2014, tallied after a bit over 2 months since I started my crusade):

Hibernate ticket was the weirdest experience. It got rejected very fast (not a bug) with a comment to just use refresh(). After pointing out that this workaround is a total nonsense, I was sent to read some completely not relevant documentation about concurrency. After that, my (and Tim's) comments have been ignored.

What can I conclude from these 3 facts?:

nobody seems to know how to resolve or even work-around this issue

experts provide advice that is incorrect

there is no interest in solving, discussing or even acknowledging it as a problem

I do not know, but probably nothing good. I think it is interesting to try to puzzle out the few responses that the problem did generate. I will try to do that here.

The replies I got from the expects fall into 2 categories. The first category are answers like this:

It is any ORM issue

Any database application will have an issue like this

It is true that the issue can be resolved with DB locking. In particular, I could prevent repeatable finder by having all HTTP requests wrapped in long transactions and configuring higher (repeatable read) isolation level. Indeed, it is a big framework design failure, if we need to resort to things like this.
It is NOT true that any ORM and any database application will have this problem. The most likely explanation for this type of response is that developers do not think about side-effects. They see Hibernate query and think of a SELECT statement only. If I see a problem, it must be from the SELECT, where else would it come from? This is consistent with the point I tried to make in my previous posts.

The second category are answers that suggest using refresh() or discard() to fix the problem:
'To fix your problem'

add refresh() to your code

or: add discard()/evict() to your code

My first reaction was: Grrr, my second: Hmm. If I could only continue this conversation I am sure it would go like this:Me: Where do I add these? Expert's Reply: Add them where you have that problem.

If you have been following various Hibernate discussion forums, you must have noticed that the same type of advice (either to add refresh() or to add evict()) shows up very frequently. This advice is never right.

Grails and Hibernate experts: I am very disappointed in you.

Add refresh()... add evict()... Thinking in Hibernate.(Here is where I sneak-in some interesting stuff.)
This is how we typically reason about our code: the problem is on line 57 because variable xyzis ... and then on line 89 we do that..., and then on line 127 we have an if statement that goes like that...
We reason about our code by examining chains of programming instructions.

This is called imperative thinking and imperative programming. If you read my previous posts you may assume that I consider such programs not logical, they are logical, only the logic is very cumbersome and complex.
A well designed OO program is where lines 57, 89 and 127 are all in the same class and the chains of instructions we need to examine are relatively short. In a procedural program lines 57, 89, and 127 can be anywhere and chains are long. Badly designed OO programs behave like procedural programs.

Repeatable finder is a great example where imperative reasoning fails. The problem is not something between lines 57 and 89 or something on line 115. The problem is (or can be) anywhere.
Answer 'add refresh()' or 'add discard()' is a very imperative thinking: it assumes I can add it on line 89. (I can only conclude that this expert advice is not to sprinkle refresh() all over my code just for fun ... and because my code will run too fast without it.)

So what is the alternative? The idea is to think about a block of code in a way that can prove certain behavior of that block. If we know that a code block 'A' exhibits the same behavior no matter where it is placed or how is used, then we no longer need to think about lines 57, 89, and 127 or about chains of computing instructions.

This is called declarative thinking and programming. Logical reasoning is now simplified, I no longer need to follow chains of programming steps to reason about the correctness.

Declarative thinking works great, except, if I cannot trust any property, even that:Users.findAllByNickName('bob').every{it.nickName == 'bob'}
then I am stuck.
That may sound like a limitation of declarative programming: I can still keep going using imperative approach. That is true, and we all 'keep going'. I did not stop programming my project and no, I did not add refresh() all over my code. That is why our applications are so buggy: we ignore logical problems unless we can pin them to line 127.

Side Note: The fun starts when I start combining my declarative code blocks into bigger blocks. Code needs to be logically composable. I want a bigger block (composed of smaller blocks) to have properties too. Some like to call it programming with combinators.

Conclusions:

I would like to suggest this as a new rule of thumb: the answer to use Hibernate/GORM refresh() or evict()/discard() is wrong regardless of the question.
(with exception of Functional Tests - which may need to refresh some records used in asserts). Please comment below if you find a counter example to this rule.

I am not claiming that I know the solution to Repeatable Finder. Maintaining Hibernate cache synchronized with the DB is hard, maybe impossible. One way of dealing with hard problems is: make them somebody else's. If GORM/Hibernate just told me when the query is lying (returns stale data even if it has new data) or allowed me to request/configure the query to refresh all records... That would go a long way.

It looks like the community has decided to not acknowledge Repeatable Finder as a problem. There is really no good solution for it and acknowledging it would be admitting to that fact. This issue is likely to remain unsolved and ignored. More complex Grails apps are doomed to work incorrectly under heavier concurrent use.

I have added a label to my posts (which does not work so do not click on it - and that convinces me that blogger must be using Hibernate): 'Stop thinking in C++ and Java'. I think we need to stop thinking imperative or at least stop thinking only in imperative terms.

Next post: I need to do one more to wrap-up. I will be finishing this series next week.

I have a busy period ahead of me. I started going over a set of courses published online by University of Oregon (Oregon Programming Languages Summer School) and that will be many hours of not very easy listening and learning. I also have to start preparing/training for a ski camp in early November (I live in CO and skiing has started here already). No, this will not be a SKI calculus camp;) - but then, believe it or not, technical skiing is (or should be) a fun intellectual activity too.

About Me

I am very interested in functional and declarative programming.
My day job is very imperative so this page is my way to vent about what I find cool and fascinating and about what I find not so cool.
Learning Haskell was a big thing for me. I was always into learning something new, but nothing else can come even close to Haskell learning experience.
My 'bring Groovy to FP' project:
http://github.com/rpeszek/fpiglet