Why use the Stream API to parse our CSV?

The Stream API is a handy abstraction for working with aggregated data. This becomes particularly handy when we need to perform multiple actions, such as transforming the content, applying some filters and perhaps grouping the rows by a property. With the Stream API we are able to register many actions that we want to perform on each row in the CSV file, and to do so at a decent level of abstraction. We want the framework to handle the low-level stuff, such as reading and looping over the data, while we stay in control of what we want to achieve.

The Stream API is a perfect fit for the task I want to solve today. I have a CSV file with a lot of persons, presented in the example below (simplified). The first task is to read all the “lines” of persons and turn them into a list of persons, List<Person>.

This list can be tremendously long and I want to fetch the first 50 adults (age > 17). Luckily the BufferedReader in Java 8 has been upgraded to provide me with the Stream abstraction. All I need to do is call the .lines() method on BufferedReader. (The Stream abstraction is where Java 8 keeps all the new functional sweetness, such as map, filter, max, min, sum, etc.)
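A minimal sketch of how this could look. The Person class, the "name,age" line layout (no header, no quoting) and the sample data are assumptions made for illustration; a StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

public class CsvAdults {

    // Hypothetical Person type, assumed for this sketch.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Assumed line layout: "name,age" without header, quoting or escaping.
    static Person parse(String line) {
        String[] fields = line.split(",");
        return new Person(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }

    // lines() is lazy, so reading stops once 'max' adults have been found.
    static List<Person> firstAdults(BufferedReader reader, int max) {
        return reader.lines()
                .map(CsvAdults::parse)
                .filter(p -> p.getAge() > 17)
                .limit(max)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        String csv = "Anna,34\nBen,12\nCarl,18\nDina,17\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            firstAdults(reader, 50).forEach(p -> System.out.println(p.getName()));
        }
    }
}
```

For a real file, the StringReader could be swapped for a reader obtained from java.nio.file.Files.newBufferedReader.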

This problem is really easy to solve using the new Stream API. The key to the solution is the joining collector, provided as a ready-to-use Collector. The joining collector uses StringBuilder under the hood to build up the resulting String. What would the solution look like with imperative-style for loops? How many garbage variables would you need?
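A minimal sketch of how the joining collector can be combined with filter, map and sorted. The Person class, the sample data and the ", " separator are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JoinNames {

    // Hypothetical Person type, assumed for this sketch.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Keeps the adults, sorts their names and joins them into one String.
    static String sortedAdultNames(List<Person> persons) {
        return persons.stream()
                .filter(p -> p.getAge() > 17)
                .map(Person::getName)
                .sorted()
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Carl", 18), new Person("Anna", 34), new Person("Ben", 12));
        System.out.println(sortedAdultNames(persons)); // prints "Anna, Carl"
    }
}
```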

The example touches on multiple new concepts introduced in the Java 8 Stream API, such as filter, sorted, map and collect. Later I will write about the Stream API in more depth.

In this blog post I will briefly introduce lambdas, which will be included as a new language feature in Java 8. I recently wrote a short introduction to the functional programming support coming in Java 8. In this post I want to focus on lambda expressions, what they actually are and why they are awesome.

The motivation behind lambda expressions is to provide a nice and simple syntax for passing functionality as an argument to another method, such as what to do when someone clicks a button. Pre JDK8 we used anonymous inner classes to do that, which typically implemented a functional interface (more details below). The problem we faced with that approach is that the syntax was verbose and unclear. It was really hard to write and read. Lambda expressions let you express instances of single-method classes more compactly [1]. We can think of lambda expressions as a way to define anonymous methods.

Take lambdas for a spin

Let’s start simple. We have a List of numbers and we want to print all of the numbers using the new forEach method, which accepts a Consumer as its argument. Without lambdas we can achieve this by implementing a Consumer:
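A sketch of what that might look like, followed by the same call written as a lambda with an explicitly typed parameter. The class name and the sample numbers are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class ConsumerDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);

        // Pre-lambda style: an anonymous inner class implementing Consumer.
        numbers.forEach(new Consumer<Integer>() {
            @Override
            public void accept(Integer value) {
                System.out.println(value);
            }
        });

        // The same behaviour as a lambda with an explicitly typed parameter.
        numbers.forEach((Integer value) -> System.out.println(value));
    }
}
```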

This is way better! But there is still some noise, in my eyes. Why do I have to tell the compiler the type of value? Can value possibly be anything other than Integer?

The answer is: no, it must be an Integer. And it turns out that the Java 8 compiler can help us out by working out the type for us, so we don’t have to. Hurray! This concept is known as type inference. Let’s have a look:
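A sketch of the lambda with the parameter type left out, assuming the same list of numbers as before:

```java
import java.util.Arrays;
import java.util.List;

public class InferredTypeDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);

        // The compiler infers that 'value' is an Integer from List<Integer>.
        numbers.forEach(value -> System.out.println(value));
    }
}
```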

Wow, this is starting to look like something readable. We even got rid of the parentheses around the value parameter.

It’s important to notice that the forEach method still accepts a Consumer as input, and that it is the compiler that takes the provided lambda expression and converts it into a valid Consumer.

Method Reference

Even though we ended up with a simple and easy-to-read lambda expression, there is still something bothering me. We have created a function which takes the input argument and just calls another function with the same argument as input.

Can’t we just use the println function instead? The answer is a method reference, and here is an example:
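A sketch, again assuming a small list of numbers:

```java
import java.util.Arrays;
import java.util.List;

public class MethodRefDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);

        // Equivalent to: numbers.forEach(value -> System.out.println(value));
        numbers.forEach(System.out::println);
    }
}
```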

We see that we use the special ‘::’ notation, which allows us to borrow methods from elsewhere. The result in this example is that the forEach method will call the println method of System.out for each element in the list.

(Side note: It is also possible to refer to a constructor using new: User::new.)

Multiple blocks

Can we execute multiple lines of code in a lambda expression? Yes of course, just add some curly brackets:
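A sketch of a lambda with a block body. The statements inside the block are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class BlockLambdaDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);

        // Curly brackets turn the lambda body into a block of statements.
        numbers.forEach(value -> {
            int squared = value * value;
            System.out.println(value + " squared is " + squared);
        });
    }
}
```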

Lexical Scoping & effectively final

A lambda expression closes over the scope of its definition (lexical scoping). From within a lambda expression we can only access local variables that are final or effectively final in the enclosing scope. Effectively final means that Java 8 has relaxed the requirement to use the final keyword, but the variable still must not change if we want to access it inside a lambda expression. If the compiler detects that the variable is mutated, inside or outside of the lambda expression, it will complain.
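A small sketch of the rule. The method and variable names are assumptions; the commented-out line shows the kind of change that would make the compiler complain.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class EffectivelyFinalDemo {

    static List<Integer> scale(List<Integer> numbers, int factor) {
        // 'factor' is never reassigned, so it is effectively final
        // and can be captured by the lambda without the final keyword.
        // factor = factor + 1; // uncommenting this breaks the lambda below
        return numbers.stream()
                .map(n -> n * factor)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(scale(Arrays.asList(1, 2, 3), 2)); // prints [2, 4, 6]
    }
}
```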

More examples

Functional interfaces

We can use lambdas with methods which take a functional interface as an argument. This section briefly introduces what a functional interface is.
The only requirement for a functional interface is that it has exactly one abstract (unimplemented) method. It can have zero or more default methods. In the example below I have shown a snippet from the Predicate interface, part of JDK8. The “FunctionalInterface” annotation is optional, but when present it makes the compiler check that the interface has exactly one unimplemented method.
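To illustrate the shape of such an interface, here is a sketch with a hypothetical Check interface. It is loosely modelled on Predicate and is not the JDK source.

```java
public class CheckDemo {

    // Hypothetical interface, loosely modelled on java.util.function.Predicate.
    @FunctionalInterface
    interface Check<T> {
        // The single abstract method; this is what a lambda implements.
        boolean test(T value);

        // Default methods do not count towards the one-abstract-method rule.
        default Check<T> negate() {
            return value -> !test(value);
        }
    }

    public static void main(String[] args) {
        Check<String> isEmpty = s -> s.isEmpty();
        System.out.println(isEmpty.test(""));          // prints true
        System.out.println(isEmpty.negate().test("")); // prints false
    }
}
```

Adding a second abstract method to Check would make the compiler reject the annotation.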

The predicate is used to check whether an input argument satisfies our requirements. It has one abstract method, “test”, which should be implemented to verify the requirement. This functional interface also comes with the default methods and, or, and negate. The first two are used to combine multiple predicates and the last is used to invert a predicate. This allows us to reuse and build on top of existing predicates.
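A short sketch of combining predicates with and, or and negate. The age thresholds and names are arbitrary and only for illustration.

```java
import java.util.function.Predicate;

public class PredicateDemo {

    static final Predicate<Integer> IS_ADULT = age -> age > 17;
    static final Predicate<Integer> IS_SENIOR = age -> age >= 67; // arbitrary threshold

    // Built on top of the two predicates above.
    static final Predicate<Integer> WORKING_AGE = IS_ADULT.and(IS_SENIOR.negate());
    static final Predicate<Integer> CHILD_OR_SENIOR = IS_ADULT.negate().or(IS_SENIOR);

    public static void main(String[] args) {
        System.out.println(WORKING_AGE.test(30));     // prints true
        System.out.println(WORKING_AGE.test(70));     // prints false
        System.out.println(CHILD_OR_SENIOR.test(10)); // prints true
    }
}
```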

Other common functional interfaces found in the JDK8 java.util.function package include:

Consumer<T> – takes an input and performs an operation on it. Will cause side effects!

Supplier<T> – a kind of factory. Will return a new instance or an existing instance.

Predicate<T> – Checks if argument satisfies our requirements

Function<T, R> – Used to transform an argument from type T to type R.

BinaryOperator<T> – takes two T’s as arguments and returns one T as output
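As a rough illustration, each of the interfaces listed above can be assigned from a lambda or a method reference. The names and behaviours below are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class FunctionalInterfacesDemo {

    static final Consumer<String> PRINTER = s -> System.out.println(s);  // side effect
    static final Supplier<List<String>> LIST_FACTORY = ArrayList::new;   // new instance
    static final Predicate<String> IS_BLANK = s -> s.trim().isEmpty();   // check
    static final Function<String, Integer> LENGTH = String::length;      // String -> Integer
    static final BinaryOperator<Integer> ADD = (a, b) -> a + b;          // two in, one out

    public static void main(String[] args) {
        List<String> words = LIST_FACTORY.get();
        words.add("lambda");
        PRINTER.accept(words.get(0));                 // prints lambda
        System.out.println(IS_BLANK.test("  "));      // prints true
        System.out.println(LENGTH.apply("lambda"));   // prints 6
        System.out.println(ADD.apply(2, 3));          // prints 5
    }
}
```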

Summary

Lambdas are awesome and are the cornerstone of the introduction of functional programming in Java 8. It’s a clever way to introduce functional programming in Java, making lambdas super simple to write. Letting lambdas be defined via functional interfaces (already heavily used, e.g. event listeners) allows existing code to be forward compatible with lambdas. Clever!

You might think that lambdas are just a pretty syntax for creating anonymous inner classes under the hood, so that lambda capture just becomes constructor invocation. This is (thankfully) not the case. It would lead to performance issues (one class per lambda expression). Instead the language team uses the fifth bytecode method invocation mode, introduced in Java 7, called invokedynamic. I want to do a special post on lambdas under the hood later.

In September I attended JavaOne 2013 in San Francisco. Oracle was showing off Java 8, scheduled for GA in Q1 2014. The feature coming in Java 8 which excited me the most was the functional part, introduced with Project Lambda.

All the other major platforms, such as C#, have had this for years, and finally Java is growing up and will introduce functional programming in Java 8. In previous versions of Java we have been so used to imperative-style programming that it is hard to even realize that there are alternatives. It has worked fine, but it is very low level and the syntax is extremely verbose compared to the alternatives. With the new functional features we are now able to express what we want to achieve more concisely and not worry so much about how to actually do it. Java 8 enables us to use old-school, rock-solid OO design and combine it with functional patterns. Combined, we will be able to achieve more with less, meaning fewer bugs and more value delivered. This is a big change for Java, even bigger than generics introduced a few years back.

In this post I will present a few simple examples of how you can utilize functions in Java 8.

Setup

As a basis for each example I will have a list of persons, as shown in the snippet below.
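A sketch of such a setup. The Person class with its name and age fields is an assumption based on the getters used in the examples, and the sample data is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PersonSetup {

    // Hypothetical Person class; fields inferred from getName() and getAge().
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }

        @Override
        public String toString() {
            return name + " (" + age + ")";
        }
    }

    static List<Person> samplePersons() {
        // Wrapped in an ArrayList so that elements can be removed later.
        return new ArrayList<>(Arrays.asList(
                new Person("Anna", 34),
                new Person("Ben", 12),
                new Person("Carl", 21)));
    }

    public static void main(String[] args) {
        List<Person> persons = samplePersons();
        System.out.println(persons); // prints [Anna (34), Ben (12), Carl (21)]
    }
}
```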

Passing functions in Java 8

Say you want to print the name of each person in the list. How do we do that in Java today (pre Java 8)? EASY! We loop the list and for each item in the list we print the name. We even use the enhanced for loop. Pretty simple, right?

for (Person p : persons) {
    System.out.println(p.getName());
}

This is referred to as imperative code style. There are mainly two problems with this example:

We have to introduce a temporary variable (p)

We have to know HOW to iterate a list (the for loop)

Not only do we express WHAT we want to do, we also have to express HOW to do it, iterating all the elements and introducing a mutable element. In Java 8, we now have a forEach method on collections, which allows us to pass a function. The underlying framework will take care of how to loop each element. We will need to pass a Consumer, which performs an operation on each element:

persons.forEach(p -> System.out.println(p.getName()));

Remove elements

The Collection interface also makes it super simple to remove elements. We just use a lambda expression, a predicate, to express which elements we want removed. How would you implement this pre Java 8?

persons.removeIf(p -> p.getAge() > 20);

* We could also use the syntax “(Person p) -> p.getAge() > 20” to specify the type. This is optional, as the type is automatically inferred by the compiler.

WARNING. I generally do not feel it is good practice to use this function, as it mutates the actual list. In my opinion it would be better if it returned a new list/view, without the elements matched by the predicate.
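One way to get a new list instead, sketched with a stream and filter. Plain Integer ages stand in for persons here, and the method name is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class NonMutatingRemove {

    // Returns a new list; the input list is left untouched.
    static List<Integer> withoutAgesAbove(List<Integer> ages, int limit) {
        return ages.stream()
                .filter(age -> age <= limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = new ArrayList<>(Arrays.asList(34, 12, 21, 20));

        List<Integer> kept = withoutAgesAbove(ages, 20);
        System.out.println(kept); // prints [12, 20]
        System.out.println(ages); // prints [34, 12, 21, 20]

        // For comparison: removeIf changes the list it is called on.
        ages.removeIf(age -> age > 20);
        System.out.println(ages); // prints [12, 20]
    }
}
```

Note that removeIf needs a modifiable list; calling it directly on the fixed-size list returned by Arrays.asList throws UnsupportedOperationException when an element has to be removed.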

Method references

In Java 8 we will also be able to borrow functions from other classes using the “::” notation:
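A sketch of borrowing a getter with “::”. The Person class and the sample data are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PersonMethodRefs {

    // Hypothetical Person class, assumed for this sketch.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Person::getName is shorthand for p -> p.getName().
    static List<String> names(List<Person> persons) {
        return persons.stream()
                .map(Person::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(new Person("Anna", 34), new Person("Ben", 12));
        names(persons).forEach(System.out::println);
    }
}
```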

Today I got myself thinking: what are the main benefits of system testing? We all think testing our code is important, but why? In this blog post I have collected a few points which, in my opinion, make writing good test cases worth it. And of course, we are talking about automated tests.

Documentation – Good tests form excellent documentation on how to use and understand the code. How should you instantiate a class? How should you call that service? What are the limit values? The list here can be very long…

Improved code quality – I like to believe that good tests provide at least some level of code quality. The fact that the coder bothered to write tests is an indication that he actually tried to do some quality work. Of course, if the tests suck, they are not worth the bytes used to store them.

Verification of requirements – Without tests, how can you be sure that the code actually solves the issues it was supposed to solve? How do you know we are building the right thing?

Safer to refactor – With many high-quality tests I feel much better about refactoring code. In my opinion refactoring is extremely important to ensure that we constantly improve the design of our solution. I can’t imagine how to do this without tests (at least if I must refactor other people’s code).

Instant feedback – The best feedback we get is from our users. Tests, though, have the strength of giving us instant feedback while we develop. This instant feedback is important because it allows us to detect bugs earlier. The earlier we catch a bug, the less costly it is to fix.

Limit values – Some application states are hard to reach with manual testing. With tests we can just mock the services involved and instruct them to return those hard-to-reach corner-case values.

The cost (time required) of fixing bugs rises if we discover them late in the development phase. The price is highest if the bug is discovered after the functionality is released. I usually visualize this in my head as the figure illustrated below. In my experience the cost of fixing a bug discovered in production is significantly higher than if we manage to catch it during development.

I guess there are plenty more benefits of automated system testing. Please leave me a comment to let me know what I left out.

If you enjoyed this blog post, you would probably also like the one I did about TDD by example. TDD is a great way to make sure you are testing your code.

The goal of TDD is not writing the tests first; it’s a design process where you iterate over the design as you develop new code. Instead of doing a full upfront design, you design little by little as you need more functionality. Writing tests first is just a tool to force you to focus on a small part of the code and how that part should work, and to improve the design all the time.

Generally, doing TDD at least guarantees tests and testability. It does not provide efficiency or quality in itself (I have not found any documented evidence of this). But it does provide some minimum level of quality and encourages the developers to think about the problem in a modularized way. It also makes sure that developers write tests. We all know that writing tests after committing the code is hard because of all those excuses: late Friday night, pressure, the sprint is ending and we just need some other functionality done, etc.

Another big benefit of TDD is that it helps eliminate the waste created by developers implementing stuff that might be useful. No code can be written before there is a test case requiring that functionality.

will make it easier to refactor your code, because of the high test coverage

makes you iterate and improve the design throughout the whole development process

high test-coverage provides excellent documentation of your code

Shortcomings of TDD

A higher number of tests cannot guarantee higher code quality; it can only provide you with a minimum level of quality in the resulting product.

can be time-consuming, especially in the beginning

can be hard, especially dealing with frameworks which put constraints on your code

done wrong: can make it hard to change the code. This is because there are so many tests everywhere that verify every little part of your code all the time

can be hard to prove that it actually is more cost effective, especially in the beginning

can make you less productive, especially if you follow a very strict TDD model where you only do the smallest change possible to satisfy a failing test. When I do TDD, I often feel that I would be able to solve a larger part of the problem at once.

TDD in action

Now, after providing some background, let’s start doing TDD, iteration by iteration. I will show you all the steps required.

Problem description

The task is to implement a factorial method in Java. The definition of factorial is:
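For reference, the standard definition for a non-negative integer n can be written as follows, together with the equivalent recursive form:

```latex
n! = \prod_{k=1}^{n} k = n \cdot (n-1) \cdots 2 \cdot 1, \qquad 0! = 1

n! = n \cdot (n-1)! \quad \text{for } n \ge 1
```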

Examples:

0! = 1
1! = 1
2! = 2 x 1 = 2
3! = 3 x 2 x 1 = 6

Limitations: We will limit ourselves to only using the primitive int type in Java. This simplifies our problem, but limits the resulting number to 32 bits (the largest factorial that fits in an int is 12!). In this example we will only go up to 10!.

For me this feels natural. I do this to avoid having a separate test method for every input value we are testing.

Summary

As you saw, we quickly found the recursive solution to this problem. Because the factorial operation is a well-known problem, I would probably have headed straight to a similar solution without a test-first approach.
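A minimal sketch of where such iterations might end up. The class name, the guard against negative input and the plain-Java checks (instead of a test framework) are assumptions. The checks use one table of expected values rather than one test method per input.

```java
public class Factorial {

    // Recursive factorial; results fit in an int for n up to 12.
    static int factorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        return n <= 1 ? 1 : n * factorial(n - 1);
    }

    public static void main(String[] args) {
        // Expected values for 0! through 10!, checked in a single loop.
        int[] expected = {1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800};
        for (int n = 0; n < expected.length; n++) {
            if (factorial(n) != expected[n]) {
                throw new AssertionError("factorial(" + n + ") was " + factorial(n));
            }
        }
        System.out.println("all checks passed");
    }
}
```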

The point of this example is just to show the process and how it is performed. The benefits are generally more “visible” when the task is larger and more complex, where it is harder to see upfront all the challenges that have to be solved.

TDD gives us tested code, with a shorter feedback loop, higher confidence in the code and the hope of better code quality and improved design. At least the developers have been forced to implement with separation of concerns in mind. More tests do not provide quality in themselves, and it all comes back to highly skilled developers.