Wednesday, July 15, 2009

A few days back a couple of interns working in my office came up to me and said "It's not working". They were talking about a part of an application they had built. Their application was invoking an API given out by my team and it was throwing an error. They were invoking one of our Web Services through a client side Java API provided by us. Their app was running on Tomcat and that was accessing our product that runs on another Tomcat server. So they have access to both these servers. After we exchanged a few IM messages we solved the issue - but I am sure that if they once again face any issue, they would not immediately ping and ask "Why is it not working?".

After we figured out the issue, I was thinking of sending a note to the interns on the first step they need to do when they face any such issue and I ended up writing this.

Check the Logs. Repeat- Check the Logs.

From what I have seen 99.99% issues get solved the moment you Check the Logs. And Yes, many people don't do that before they report the issue. In the present case, the WS client in the SDK was throwing an exception and it said "An Error occurred with the client". So when they were seeing this, they actually could not make out anything. But that was at the client side. The first thing anyone should do is check the server logs. Server logs in this case would mean the Application logs on the App Server where the service is hosted and the Application server's own logs.

Many times you see that the logs are way too much to deduce anything. They run into pages. And I have seen, this very fact demotivates any newbie who looks at it. In that case, If the issue can be replicated I suggest -

Shutdown the server and client

Backup your logs

Cleanup the logs directory

Restart the servers and reproduce the issue

Now the logs are much lesser and comfortable to read. Also it would be helpful if you can clearly separate the server start-up logs from the subsequent logging that happened during the issue replication. Now read that line by line and most of the times the issue would be solved. If not, read every line of the logs few more times. Yes, Few more times.

Still unable to figure out?

Check the logging level

Every logging framework provides you various logging levels, that you can configure. Most of them have a "debug" logging mode, which when enabled- logs the most granular details. Enable this level of logging on the application and the server. Well written applications log their code flow on to the debug level. Hence with this level of logging set, there is a very high chance that you get to reach the root of the problem.Even if you are not able to figure out the issue even after looking at the debug level logs, you still have helped the person who has to take a look at them. You have reduced the turnaround time. These debug logs will definitely help the API/Service developer who is going to look at your issue. He now has most of the information with him.

Not everything on the Server - There's client too

Most Apps are like this - You write a client program that accesses a service running on a server. The Application that runs on the server would also provide client side API's that would do the task of marshalling your arguments and passing it on the wire through SOAP/RMI etc. A whole lot of things happen between the time you invoke the client API and the time the message is sent on the wire (Goes out of this VM). The same is the case when you get the return message - from the time the message comes through the wire and till the time it's handed over to your client program, in the format it accepts. Typically these are marshalling/unmarshalling tasks carried out by the client side API's. Now the concern is - what happens if something goes wrong here?

Most times we ignore the logging/trace on the client side. Yes, on the client side too you could configure the logging level. The way you do this depends on the logging framework that the API you are accessing uses. Most times you would have to place a logging config properties file in the client classpath. The logging framework on the client API that you use, loads this file and determines the level of logging and the appender/handler to which logs have to be written.So once you have the client side logs, you would be able to figure out what really was sent from the client, what it received and other info related to marshalling/unmarshalling etc. One suggestion that I have here - In case you have access to both the client and server ensure that both of them have the same system time. With this, you could easily track the flow from client to server and the way back with respect to time.

All these really help even if you are debugging an issue on something that you have developed or are developing - Something that can be done before debugging your code by placing breakpoints or even worse - by adding those System.out statements.

Too many threads?

Imagine the case where you have too many threads in your application that are doing their work simultaneously. The log records might be jumbled together. In such a scenario, to be able to uniquely identify and track the flow of one particular thread, you could have an identifier in the log record that contains the thread name.

Logging only failures?

A lot of times when your application is running on production, it would not be a very good idea to log all of the threads on the server side. You would be interested in logging only the failures. In such a case, what you typically do is - Accumulate the log records for each thread and at a later point of time depending on the success or failure of that thread, you decide whether to write its records to the log files or not. Note that these records are in the memory for the whole duration of the operation - and that could be quite expensive.

Beware

Most application servers would have their own logging framework which logs the tasks performed by the server. In certain logging frameworks you need to tell the framework when to start and stop logging on a thread. So when the server assigns a particular thread from a pool to your request it would do the job of telling the framework to start the logging. So If you explicitly create threads on an application server, these threads might not be logging information at all, though your code might have log statements.

Another important thing to note is not all issues that you encounter will get replicated even one more time. A lot of times I have seen that the moment you change the logging level or add a new log statement the issue goes away. Often this the case with issues related to multi-threaded programs, where even one new instruction would change the code dynamics or timing.

Solved. What should I do?

If a log information really helped you to better understand an issue, you then have a very important learning. You too should "Log" in your application. Many times I end up spending more time determining what should be logged and at what levels than the time I take to write the code. After all, logging is an art.