Best Practices

November 26, 2011

When thinking about threading issues in Java it helps to separate the concerns based on whether the concurrency issue is in the component (the application), or in the container (the application context, the framework, or the application server).

I am a bit hesitant about putting the "pattern" label on this separation of concerns, but I was not certain what other label would be more descriptive. The main point is that if you are a component writer, your concerns are very different from those of a container or framework writer. Separating threading issues along these lines greatly simplifies thinking about them. I make this point several times, because I believe in the value of repetition, and because this note is not about how to design and code for threading - there are hundreds of resources on the web, in articles, and in books that do a far better job. But none of these resources, to my mind, properly separates the concerns in a way that benefits the typical application developer, whose main task is to deliver business functionality, not infrastructure design. I maintain that if application developers master the concurrency design literature, they will be diverted from delivering business functionality; they would be better suited to developing frameworks, designing new containers, or working for companies that build application infrastructure products.

The Container/Component Separation of Concerns

Java threading issues, and designing for multi-threading, pose a different problem for the application programmer than they do for the system programmer. An application programmer is mainly concerned with designing and implementing a business application. He or she has enough to deal with in understanding the business requirements, the domain model, data handling, and presentation handling, in a way that satisfies a business need. Infrastructure issues - and threading is an infrastructure issue - intrude and seem to be "out of scope" for the application developer.

Still, the application developer cannot dismiss threading concerns by relegating them to the infrastructure and declaring, "these are issues taken care of by my framework, I don't need to be concerned with them". He is half right and half wrong in taking this stance. Handling threading is half an infrastructure concern and half an application design concern. But exactly how? Where is the separation of concerns, and how do the responsibilities divide?

This is what I will explore in this article.

To do this we will create a couple of classes to simulate the interplay between the application and the infrastructure. We will call the infrastructure class Container, and the application class Component. You could use names like ApplicationContext and Application, or ServletContainer and Servlet. The model is the same. Some container holds a reference to your application object. It instantiates your application object (one or more instances), and it also creates a thread (or pulls one from a thread pool). Inside the thread's run method, work is delegated to some method (a doWork() method) in your component. The division of labor is that the container is responsible for managing the lifecycle of the thread and for picking the next one to run. The component writer is responsible for the doWork() method of his component and for ensuring that his component "plays nice" with other components running in other threads in the container, i.e., that it is thread safe.

Now how does this division of labor help the application programmer handle the threading issues? It does so by relieving him of the concerns of creating threads, managing thread pools, and allocating requests for work and handing them off to worker threads.

Let's look at code to make all this concrete.

Example 1: Stateless components

File: example1/Container.java

package example1;

// Working with stateless components
// No shared state
// No race conditions, no synchronization issues
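The original listing has not survived in this archive. A minimal sketch consistent with the description (class names follow the article; the method bodies and request strings are my reconstruction, not the original code):

```java
// A container that instantiates a stateless component and hands work
// to it from two threads. Because the component keeps no fields, the
// threads cannot interfere with each other.
public class Container {

    // The component is stateless: all its state lives on each thread's
    // stack, so concurrent invocations are independent.
    static class Component {
        int doWork(String request) {
            int result = request.length();   // purely local computation
            System.out.println(Thread.currentThread().getName()
                    + " processed \"" + request + "\" -> " + result);
            return result;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Component component = new Component();   // one instance, shared by both threads
        Thread t1 = new Thread(() -> component.doWork("request-1"), "Thread 1");
        Thread t2 = new Thread(() -> component.doWork("request-2"), "Thread 2");
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}
```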

Notice how Threads 1 and 2 are running concurrently, but they don't interfere with each other. They don't share state and they don't need to be synchronized. The component writer (the application programmer) does not need to do anything in the component to deal with concurrency.

Example 2: Stateful Components with Shared State

The stateful component has a field value that must not be interfered with while the component is "doing work". The component writer (the application programmer) has to make sure any critical section (code that reads or updates the shared variable) is protected from interference from another thread running the same code. This is what is meant by saying the component is "thread safe".

We will examine example 2 without doing anything to guarantee thread safety, and observe what happens to the state of the shared variable.

File: example2/Container.java

Same code as example 1. The application programmer does not have the luxury of re-writing infrastructure code, or tampering with how the container internals run.
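The component listing for example 2 is missing from this archive. A minimal sketch of a deliberately unsafe stateful component, consistent with the discussion (the field and method names are my reconstruction):

```java
// Example 2: a stateful component with shared state and NO protection.
// The read-modify-write on sharedValue is not atomic, so two threads
// can read the same value; the later write clobbers the earlier one
// (a "lost update").
public class Component {
    private int sharedValue = 0;   // shared state, unprotected

    public void businessMethod() {
        int read = sharedValue;    // both threads may read the same value here
        sharedValue = read + 1;    // ... and one increment is lost
    }

    public int getSharedValue() {
        return sharedValue;
    }
}
```

Run under the container with two threads, this exhibits the race described below; single-threaded it behaves correctly, which is exactly why such bugs hide.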

Notice that both components enter the business method with the same value for the shared variable. There is a "lost update" for the second component: it does not see the update made by component 1. Yet over the run of both components, the shared variable goes through all the updates made by all the components! So there is a race condition, and a data corruption problem.

Example 3: Synchronizing the work of the business methods on shared variables
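The listing for example 3 is likewise missing. A minimal sketch, assuming the fix is simply to mark the critical section synchronized (names are my reconstruction):

```java
// Example 3: the same component, with its critical section synchronized.
// synchronized makes the read-modify-write atomic: a second thread blocks
// until the first leaves the method, and the monitor lock's happens-before
// guarantee makes the first thread's update visible to the second.
public class Component {
    private int sharedValue = 0;

    public synchronized void businessMethod() {
        sharedValue = sharedValue + 1;
    }

    public synchronized int getSharedValue() {
        return sharedValue;
    }
}
```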

Notice that now there is no overlap in the execution of businessMethod. Thread 2 has to wait until Thread 1 is done, and is guaranteed to see Thread 1's updates to the shared variable.

We can do more experiments, and study more concurrency and synchronization issues, by continuing with different designs of the component class. This is the main point of this article: the application programmer need not be concerned with the container design, or the creation and management of threads, but does need to know how to make access to critical sections of his components thread safe.

This separation of concerns should make it easier for an application developer to make sense of the literature on concurrency, without feeling that she needs to become an expert in designing and writing the multi-threaded code frameworks that belong in system programming and infrastructure.

There is a lot more to concurrency and synchronization - and there are many good books and articles. My purpose here was not to survey best practices, theory, and patterns - those are well documented in other places. My purpose is only to emphasize that the application developer needs a different focus than the framework and infrastructure developer while studying and designing for concurrency.

One of the best books on Java concurrency (Java Concurrency in Practice) covers the new concurrency features of Java SE 5. It covers many of the classes and interfaces of java.util.concurrent, but it does so in a manner that is very frustrating for the Java application programmer. About half the book is about designs and patterns for the safety of a component; the other half is about the structures useful to a framework or container developer. The book's intended audience, it seems, is the system or infrastructure developer. Most application developers will throw their hands up in frustration, and miss out on benefiting from a great book.

My advice, and this is the whole point of this blog, is to separate the component thread safety design parts from the container threading management parts. In another note, I will present such a division, and show how much more useful the book could have been had the editors chosen to structure it around the component/container separation of concerns pattern that I advocate here.

August 24, 2008

In my last blog entry on this topic I made the following bogus statement:

"We knew the thing was a Stack, meaning, it implemented the abstraction of a Stack abstract data type, by reading the implementation, and slowly figuring out, 'hey, this looks like a stack', and these operations look like operations we normally do on Stacks." I made the statement to explain how I translated some foreign code into English.

The statement is bogus, and someone should have taken me to task for making it (anyone reading these postings?), because you should not have to figure out the semantics of an interface from its implementation. That's pretty circular reasoning. How do you know the implementation is faithful to the "intent" of the interface, if your understanding of the interface comes from your understanding of the implementation?

How do you tell a Stack from a Queue from a Foo? The answer, lamentably, points to a glaring deficiency in the Java interface mechanism: the interface cannot possibly tell you the semantics of the defined operations. In other words, a Java interface is inherently unreadable. Claiming that you've read and understood a Java interface would be a vacuous statement. You have no clue what the intended semantics are.

Aah. That looks much better. Now we know the intent of the programmer. But wait, how did we know that Pila meant Stack? Did we look it up in a Spanish-English dictionary? We could do that, but what if we did not realize that this was Spanish? When you meet "foreign" code, the "foreign" language is usually a domain you're not familiar with. This example is very contrived just to make that point. I hope everyone sees that. Besides, the code could have been as follows:
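The listing that followed is not preserved in this archive. A plausible stand-in (entirely my invention) making the point that the names might not even come from a language you can identify, so no dictionary will rescue you:

```java
// A stack-like shape with names no dictionary will help with.
// The interface alone cannot tell you its semantics.
public interface Xyzzy<E> {
    void plugh(E element);    // push? enqueue? something else?
    E plover();               // pop? peek? who knows
    boolean frobnicate();     // isEmpty? isFull?
}
```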

We knew the thing was a Stack, meaning, it implemented the abstraction of a Stack abstract data type, by reading the implementation, and slowly figuring out, "hey, this looks like a stack", and these operations look like operations we normally do on Stacks. This required that 1) we had a mental model of an abstraction that we know as "Stack", and 2) we recognized the similarity between our mental model and what the code is trying to do. That is the essence of readability: how closely can the developer lay out his mental model in such a way that it is easy to see it in the code? Much easier said than done!

But is that enough to understand the interface? Not really. We need to specify the contracts that the interface is to honor. The preconditions and postconditions for every method.
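In Java, the best available place for those contracts is the documentation. A sketch of what a contract-annotated Stack interface might look like (my illustration; Java itself does not check these):

```java
// A Stack interface whose contract is spelled out as preconditions and
// postconditions in the Javadoc. The compiler cannot enforce them, but
// now a reader (and an implementer) knows the intended semantics.
public interface Stack<E> {
    /**
     * Pushes an element onto the top of the stack.
     * Precondition:  element != null.
     * Postcondition: !isEmpty() and top() == element.
     */
    void push(E element);

    /**
     * Removes and returns the top element.
     * Precondition:  !isEmpty().
     * Postcondition: returns the element most recently pushed and not yet popped.
     */
    E pop();

    /** Precondition: !isEmpty(). Returns the top element without removing it. */
    E top();

    /** Postcondition: true iff the stack holds no elements. */
    boolean isEmpty();
}
```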

August 10, 2008

Is this code readable? If it is not, why not? If it is, tell us what it does. Obviously it is a mixture of English and another language. Knowledge of the other language is not required. Or is it? Can you reason through the code without knowing the other language?

August 03, 2008

If you have smart engineers that know their craft, know industry best practices, and know their business domain, why would you need a framework? If you have looked at this issue, or are now looking at it in search of a framework, could you state, in one word, what is the most important consideration that motivates your desire to have a framework?

You can elaborate further, but that probably requires a good deal of space, so let's try to keep it to the one word, for purposes of canvassing, and maybe a sentence or two in elaboration.

My one word answer: Consistency

My elaboration: I want designs produced by my team, my group, my department, my company to be, above all, consistent, so we can have easier sharing of knowledge, easier transfer of knowledge across teams, easier transfer of skills, and easier ways of evaluating and judging quality. Different people will have different notions of what "goodness" means - what attributes of a design make it superior to other designs. A framework gives us a common world view, a common approach to design and construction, and a common solution to a set of problems. So I value consistency most.

What is your answer?

An acceptable answer, of course, is "None". We don't need a framework. Would be interesting to read the elaboration accompanying that answer, though.

Other contending one word answers: standardization, productivity, reliability, uniformity, quality, predictability, simplicity. You can add to the list as many as you like, but please choose just one as your overriding concern.

An issue that always arises, or is at least present as context, on development projects is the relationship between frameworks and projects. I think the relationship should be just like oxygen to a human being. We need it, we breathe it, we have it; we don't spend too much time debating how much we need it, how to obtain it, or which is the best technology to provide it.

By the same token, a framework should be available to the project team just as oxygen is. And it is just as vital for their continuing health as oxygen is to living things. They should not have to spend any project time debating or selecting frameworks. A framework should be part of their environment, the infrastructure they rest on. The project team should spend every minute of the project time delivering end value to the customer. They need to think through their business requirements, their domain model, and the stories they are required to deliver by when, and have total focus on that task.

A framework, therefore, must be provided by a higher order "provider" than the project's immediate builders. It should be part of the environment, or infrastructure. If you are scientifically inclined, the selection of a framework should have been done through a scientific process, based on broader engineering requirements than the needs of one project. If, on the other hand, you are religiously inclined, a framework should be provided by "God" - whoever the Gods of your environment are.

July 27, 2008

With a couple of AWK scripts, called from an ANT script, you can - in about an hour - perform the first step of reading a large body of code. The "take inventory" step. Here is a pretty quick analysis of the Spring 2.0 source code.

* 1437 Classes and interfaces.
* 178 Unique suffixes.
* Most classes are around 30 lines long. Very few above 100. Longest class is 269 lines.
* Class names are long and descriptive.
* Longest class name is 56 characters: DelegatePerTargetObjectDelegatingIntroductionInterceptor

Reading the Spring class names is very instructive. Extremely expressive names. A great model to follow as a naming standard.

The suffixes that occur more than 5 times, with their frequency counts (studying the suffixes gives you a good idea of the "concepts" in the code):

July 20, 2008

Over the years, when faced with a large body of code, I have approached it as I approach any technical reading task: as an exercise in "concept extraction" - a De Bono simplicity technique. The essence of the technique is to reduce the sheer size of the problem to a manageable size - a variation on the "how do you eat an elephant" theme. In this post I will deal with reading a very large amount of code. Reading a small amount of code (under 10,000 lines) line by line has different challenges, and is best done in an IDE, like Eclipse, using browsing, searching, and cross-referencing facilities, together with some reverse engineering of classes and method executions. But before we get there we have to deal with the problem of size first.

First, an anecdotal background to set the stage for the need for a code reading methodology. The largest code reading task I had was about 2 million lines of "legacy" Java code (about 5 years' worth of development by about 200 developers) that I had to wrap my mind around when I joined a project at a large financial company. The documentation (although I was given three binders of it) was a couple of years behind the code, and I was told by the chief architect (who hired me) that if I read it, I would get the wrong idea about what the software does. I was part of the software architecture group (one chief architect, one system architect, and 15 subsystem architects). When I was hired, I was promised that I would have 2 hours a day with the system architect for a month, and any time I needed from any architect in the group. I was also promised time with each of 15 team leads and 15 project managers (each subsystem had a subsystem architect, a team lead, and a subsystem project manager, as well as a build master). After a month on the job, the total time I was able to get from all of these people combined was 3.5 hours. From the system architect, whose boss hired me, I got exactly half an hour - and that remained the average (half an hour per month) for my entire 19-month stint on that project. So the short version of this story is: you can't rely on "knowledge transfer" from others to get to know the system. My job, with an official title of "Subsystem Architect", was really "Performance Analyst". I was to be the software architects' representative with the performance testing and engineering teams (two distinct teams: one does the testing, the other does performance modeling and capacity planning). The performance test leads, performance testers, and performance engineers looked to me for "application knowledge". Since I was hired in from the outside, and had never worked on this project, I needed to ramp up very rapidly, and reading the source code was the only way to do it.
Documentation was old. People were not available, and were not allowed to be available. I was not a priority to any of them. The urgency of the task was made clear when I was told that I was part of a 13 person "performance triage team" that had a "check-in" meeting every day at 10:30 am. Each and every day for three months! I was the "look to" guy for "application knowledge". I was greener than a broccoli and colder than an ice cube! You do the math on pressure per square inch.

Faced with 2 million lines of code, documentation you were told to avoid because it was misleading, and UML that was 2 years behind (they had a full Rose model), how do you learn the system?

The obvious approach, reading the code line by line using a text editor or an IDE, will not be much help. It rapidly becomes overwhelming and leads to frustration and discouragement, eventually leading to abandoning the effort. Reading line by line comes at a later stage, after the scope of the task has been narrowed.

So here was my general approach, step by painful step:

Step 0. Preliminary: Use the application. Get familiar with what the application does from its user's point of view. Understand the operating concepts. If you can get time with key people, that would be great. In my case I got half an hour with the system architect, and the URL for the app. The architect did a one-page sketch on an 8 1/2 by 11 sheet stating that the system is basically very simple: we have to buy loans and contracts from banks - here is how it flows. The GUI had 110 use cases - but the basic ones are less than a dozen.

Step 1. Inventory the code. What do we have?

- Number of files
- Total lines of code
- Number of classes
- Average size of class (methods per class)
- Number of methods
- Average size of method (lines per method)

I used some tools that were freely available - Understand for Java was one, for its trial period. I ended up just writing simple scripts to do the counting.

Step 2. Get an overview of the source code tree. What is there other than Java files? Config files, metadata files, XML, HTML, ...

Step 3. Understand the build process. What are the build units (build chunks/subsystems/deployment packaging/...)?

Step 4. Inventory the databases

- Number of databases
- Number of distinct schemas
- Tables per schema
- Average size of tables (columns per table)

I usually produce a "distilled schema" with this format: <tablename>(column1, column2, ...), one line per table (only about six or seven columns - the pk plus a few more). Essentially, a table to me is <tablename>(primary key, info). You can put most databases on one or two pages. It is useful to show foreign keys as part of the info. A very large database (several hundred tables) would reduce to a few pages.
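For illustration only (the table and column names here are invented, not from the project described), a distilled schema in that format might look like:

```
CUSTOMER(customer_id, name, region_id -> REGION)
ACCOUNT(account_id, customer_id -> CUSTOMER, balance, status)
LOAN(loan_id, account_id -> ACCOUNT, principal, rate, term)
```

Each line is <tablename>(primary key, info), with foreign keys shown inline as part of the info.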

Step 5. Inventory the main concepts. The database schema is the best source for that. If there is a domain model in UML that would be very nice. Distill the concepts down as much as possible. Most systems narrow down to about a dozen major concepts, while there may be hundreds of tables in the schema, and hundreds of classes in the domain model.

These basic preliminaries give you the big picture. The very big picture. I usually write scripts (mainly in AWK and Korn Shell - even on a PC with MKS toolkit - can't live without it). The scripts filter and summarize the code down to a digestible size.

A most useful script is one that extracts class names and sorts them by suffix. Most systems have class name patterns similar to the following:

I like to use AWK embedded in Korn Shell, but you can write a Java program, embedded in an Ant script, or use your favorite scripting language - Perl, Jython, whatever. If scripting gets too hard, I put the code in a database and analyze it with SQL. On one project I had code for 1000 screens written in an HP 4GL proprietary language that I did not know. The language was pretty declarative and quite elegant. I wrote a simple AWK parser and loaded the code into a relational database whose schema reflected the structure of the 4GL language. Then I wrote a GUI to browse through the code and extract requirements for the new system. The purpose of the project was to port the system from the proprietary language to Forte - an OO 4GL.
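As a sketch of the suffix inventory in Java (the alternative mentioned above; the original AWK scripts are not shown here, and the class and sample names below are mine, for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Inventories simple class names by their trailing camel-case "suffix"
// (e.g. CustomerDAO -> DAO, OrderController -> Controller).
public class SuffixInventory {

    // Last camel-case segment: a capitalized word, or a run of capitals, at the end.
    private static final Pattern SUFFIX = Pattern.compile("([A-Z][a-z0-9]+|[A-Z]+)$");

    static String suffixOf(String className) {
        Matcher m = SUFFIX.matcher(className);
        return m.find() ? m.group(1) : className;
    }

    static Map<String, Integer> count(List<String> classNames) {
        Map<String, Integer> counts = new TreeMap<>();  // sorted by suffix
        for (String name : classNames) {
            counts.merge(suffixOf(name), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "CustomerController", "OrderController",
                "CustomerDAO", "OrderDAO", "CustomerDTO");
        // Prints: Controller 2, DAO 2, DTO 1 (one per line)
        count(names).forEach((suffix, n) -> System.out.println(suffix + " " + n));
    }
}
```

In practice you would feed it the class names harvested from the source tree (one per file) instead of a hard-coded list.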

Once you've sorted the class names by suffix, you can determine the number of distinct types of classes the system has. This list of suffixes is a great help in classifying the major concepts (and patterns) used to implement the system. Since there is a tremendous amount of repetition in the implementation (a major implementation pattern is usually repeated hundreds of times), this summarization alone will reveal most of the system's secrets. In the financial system, there was a basic pattern of BizController, BizHelpers, BOs, DAOs, and DTOs, repeated 110 times, once for each use case. The architecture standards did not allow deviation from the basic pattern. I could have been saved tens of hours if one of the architects had clued me in. But, as explained earlier, no one had time to meet with me. Actually, after I arrived at this pattern that most use cases used, and presented it to the architects, they were in denial! No way, our system is vastly more complex than that! Another side lesson you learn, when you start valuing simplicity, is that the "complexity priesthood" will not like what you say! They have a vested interest in complexity. They are the guardians of complexity. If things were suddenly to become simple, they could be out of a job!

The output of this script - a list of the distinct suffixes, and the supporting detail list of all the classes - becomes "the index" to the system, and a guide to reading it.

Step 6. You can use reverse engineering tools to study the classes. I find the simplest UML the most helpful (just class names, no attributes or methods). For the main use cases, a sequence diagram of the major method can be helpful - if you have a good reverse engineering tool that can reduce the size (which can be overwhelmingly large - some tools will choke on it).

Step 7. Once you have put your arms around the big picture and you understand the "pattern of patterns", so to speak, you can start reading selectively. This is the time to fire up the IDE and start reading. I usually do that one use case at a time - guided by the system's operational concept, or its UI.

Step 8. My main reading goal still remains "concept extraction". What is this trying to do? Why is it doing it this way?

Step 9. One thing that can help is "exploratory testing". Read using the IDE and JUnit tests. Write test cases to explore what the code is doing. This may get complex if the code has external dependencies (accessing external services, writing to databases). You may have to use mock objects to fake any external dependency.

Step 10. Another approach that is useful is "mock refactoring". While reading the code, if it is hard to read, I go ahead and refactor it for readability. The most important refactorings are "extract method", "move method", and "introduce parameter object". Since the body of code is read-only for me, and I won't check it back into source control, I slice it and dice it and refactor it (if it is poorly written).

I think a lot of these steps can be programmed as Eclipse plug-ins. If you know of any, please let me know.

June 29, 2008

To be able to deliver great software, we need to be mindful of software quality.

One of the best expositions on software quality I have found is Chapter 1 of Bertrand Meyer's Object-Oriented Software Construction. Nineteen pages of the best advice on what constitutes quality software. The rest of the book (1200+ pages) is very concrete, detailed, and specific design and construction advice on how to actually achieve quality software. The short version of the advice is to focus on correctness. How do you ensure correctness? Design by Contract - Chapter 11.

What are the quality attributes for software? According to Meyer, they are the following:

Correctness. The ability of software products to perform their exact tasks, as defined by their specifications.

Robustness. The ability of software systems to react appropriately to abnormal conditions.

Extendibility. The ease of adapting software products to changes of specification.

Reusability. The ability of software elements to serve for the construction of many different applications.

Compatibility. The ease of combining software elements with others.

Efficiency. The ability of a software system to place as few demands as possible on hardware resources.

Portability. The ease of transferring software products to various hardware and software environments.

Ease of Use. The ease with which people of various backgrounds and qualifications can learn to use software products and apply them to solve problems. It also covers the ease of installation, operation, and monitoring.

Functionality. The extent of possibilities provided by a system. How much functionality is enough? Do you sacrifice reliability, extendibility, etc. for more functionality?

Timeliness. The ability of a software system to be released when or before its users want it.

Tradeoffs are necessary, as these factors may conflict with one another. Economy often fights with functionality. Efficiency may require perfect adaptation to a particular environment, which is the opposite of portability. Reusability requires solving problems more general than the one initially given. Timeliness may require using rapid techniques whose results may conflict with extendibility. A true software engineering approach implies an effort to state criteria clearly and to make the choices consciously.

Necessary as tradeoffs between quality factors may be, one factor stands out from the rest: correctness. There is never any justification for compromising correctness for the sake of other concerns such as efficiency. If the software does not perform its function, the other attributes are useless. I recall a conversation with a client who insisted that we make the software much faster. I explained to the customer that I could make it extremely fast if it did not have to do all these things that he wanted it to do. That is the definition of a useless quality attribute! Very fast, but doesn't do much.

So I maintain that outrageously great software is software that does what it is supposed to do, in other words, honors its contract with its potential users. A user of a piece of software, most of the time, is another piece of software. Only the UI modules of a software system have humans as the end user.

So to deliver great software, it would pay to know how to make a software module honor its contracts with other software modules (tests are those software modules that represent the user). This leads to the realization that great software boils down to DBC - Design by Contract!
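Java has no native contract support (Meyer's Eiffel does), but the idea can be sketched with explicit runtime checks. The Account class below is purely illustrative - my invention, not from any source discussed here:

```java
// A sketch of Design by Contract in plain Java: preconditions are
// enforced with exceptions (the contract the caller must honor), and
// postconditions are checked with assert (the promise the method makes).
public class Account {
    private long balanceCents;

    public void deposit(long amountCents) {
        // Precondition: the amount must be positive.
        if (amountCents <= 0)
            throw new IllegalArgumentException("amount must be positive");
        balanceCents += amountCents;
    }

    public void withdraw(long amountCents) {
        // Preconditions: positive amount, covered by the balance.
        if (amountCents <= 0)
            throw new IllegalArgumentException("amount must be positive");
        if (amountCents > balanceCents)
            throw new IllegalStateException("insufficient funds");

        long oldBalance = balanceCents;
        balanceCents -= amountCents;

        // Postcondition: balance decreased by exactly the amount withdrawn.
        assert balanceCents == oldBalance - amountCents;
    }

    public long balanceCents() {
        return balanceCents;
    }
}
```

A test of this class is exactly a client exercising the contract, which is the sense in which "tests are those software modules that represent the user".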

Other definitions, of course, abound. Great software is great because:

- It generates great revenue for the company.
- It is a great market success.
- It is great performing. Lightning fast.
- It has a great user interface.
- It is bug free. Never breaks down.

You can deliver great software by other means, like being relentlessly test-driven. Having explicit contracts, checkable at runtime, is complementary, and will get you there faster.

If you don't have a data model, come up with a domain object model from your requirements. Create a database schema from the domain object model. Don't go to step 2 until you have a domain model synchronized with the data model (synchronized does not mean identical!). If you are allowed to refactor the data model, do it. If not, live with it (and curse the DBAs!), but refactor the object model you must. If you have a highly denormalized data model, normalize it using Hibernate components and class-hierarchy mappings. Decent thought work needs to go into this. Can't be blindly automated.

Don't follow mad processes, like RUP, that tangle you up in use-case hell, and an object model that "falls out" of the use-case model through a bunch of VOPCs (View of Participating Classes). That's total hooey. Nail down the domain object model. Look into domain-driven design (Eric Evans).

2.b Define and implement unit and acceptance tests for the data access layer's API. This should be much simpler than the tests for the domain layer, since you are mostly implementing CRUD operations. But make sure at this stage to test business rules and data integrity rules (mainly having to do with correct object creation/deletion and relationship maintenance). You don't need to worry about business logic or business process at this stage. This separation of concerns can save you a great deal of work later and is key to the robustness of the domain layer, which will rest on top of the data access layer.

3. Do steps 3.a and 3.b in parallel

3.a Implement the domain layer. You can proceed one story at a time, or one subject area (a set of collaborating objects) at a time. I prefer the latter, but your project tasking may differ (if planning by story).

3.b Define and implement unit and acceptance tests for domain layer. Now is the time to think about business process and business logic. Use an acceptance testing framework like Fit (Framework for Integrated Tests - Ward Cunningham). You are resting on a solid foundation of a tested data access layer.

Note that in both layers you can follow test-first, or test-last approaches. I prefer test-first. It makes tests drive the development - but that is not a requirement - merely a preference and a sequencing decision. First, or last, make sure you have sufficient unit and acceptance tests implemented.

5. Don't let the GUI distract you. Implement as little of the GUI as you can get away with - preferably none. Once the domain model API is functional, stable, and robust, you can pay more attention to the UI. If the UI is necessary for demonstration to stakeholders, implement the UI in slices on top of the fully tested domain layer. I like to implement a complete text UI, or a domain-specific language, that exercises the domain APIs. The GUI is left to the usability and presentation layer specialists, who will use the text UI to learn and exercise the API before they hook into it. They can have all the creative freedom they need to come up with the most user-friendly design and the best user experience they can, comfortably knowing what the domain layer is capable of giving them.

Note: Many customers will not allow the implementation of a text UI, or a domain-specific language (it seems unnecessary, since the text UI, or the language interpreter, is not made available to the end user), but for those customers who allowed me to use it, spectacular results were achieved in terms of a very rapid schedule and very performant code. In the schedule, I put that layer under "acceptance test framework", and that was acceptable to customer management, who were glad to pay for the high quality and the rapid schedule.

June 08, 2008

At Intel, the largest semiconductor maker in the world, all full-time employees are given training in "constructive confrontation", a hallmark of the company culture. Intel preaches that the only thing worse than too much confrontation is no confrontation at all. So the company teaches employees how to approach people and problems positively, to use evidence and logic, and to attack problems and not people.

Intel's approach is backed by a series of experiments and field studies done at the Kellogg Business School, Wharton Business School, and Stanford showing that destructive conflict is typically "emotional" and "interpersonal". Groups that fight in these ways are less effective at both creative and routine tasks, and their people are constantly upset and demoralized. In contrast, these researchers find that conflict is constructive when people argue over ideas rather than personality or relationship issues - what they call "task", or "intellectual", conflict.