Thoughts on Java, Java EE and web services development

Wednesday, August 11, 2010

I always feel that the mighty IDE is a developer's most precious possession - just as a pen is for a calligrapher. It is a developer's personal choice. But many times it simply doesn't work that way. You might have to switch to a different IDE when you change your employer (or when your company is acquired - not sure if Sun alumni use JDeveloper now - Oracle might say NetBeans is cool but JDeveloper is cooler). I have been fortunate enough to stick with IntelliJ IDEA for more than five years now. Many times I have made forced attempts to switch to Eclipse - but in vain. For pure Java development, the features are pretty much the same - but the difficult part is the new look and feel and the keyboard shortcuts. More than the keyboard shortcuts, it was the appearance that was an issue for me - the syntax coloring and font. The other reason could have been that every time I decided to move to Eclipse, I had some pending work or deadlines. You don't want to change your IDE when you really need to write code.

Here at IBM, for obvious reasons, it's Eclipse everywhere. In fact many components check the .project and .classpath files into source control. Now that I am moving to a new team, I decided this time I would give it a serious try. Luckily this time around, I seem to have settled in a bit with it. My requirement was to make Eclipse *look* like Idea, or rather, make the code in Eclipse look as if it was opened in Idea. Then the keyboard shortcuts.

For the keyboard shortcuts, the IdeaKeyScheme plugin by Santosh works perfectly. On top of this plugin, I have applied the Idea code style, syntax coloring and comment/javadoc format defaults, and added a few other Idea keyboard shortcuts. I have exported those preferences and they can be found here.

After applying these two and writing code for the past week, I am now as comfortable as before on my Idea'l Eclipse.

Idea Preferences download

Friday, November 13, 2009

The iterators of the Collection implementations of the Java runtime throw a ConcurrentModificationException[1] when they detect that another thread has modified the Collection while a thread is iterating over it. Such iterators are generally called fail-fast iterators.

Looking at the implementation, one can see how this has been made possible. The Collection subclass maintains an integer modCount that is incremented on every operation that structurally modifies the collection (like add, remove, clear). The fail-fast iterators also contain an integer field expectedModCount, which is initialized to the modCount when the iterator is created. Later on, during every iteration, the iterator verifies that the expectedModCount is the same as the modCount of the collection it is iterating over. A mismatch means that the collection has been modified during the life cycle of the iterator, and a ConcurrentModificationException is thrown. The same happens during the collection serialization process. The modification count prior to writing to the stream should be the same as the count after writing to the stream.
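The bookkeeping described above can be sketched with a toy list. This is a minimal illustration, not the JDK source: the class TinyList and its method names are made up for this example, and only the modCount/expectedModCount idea is taken from the text. Note that the check also fires when the same thread modifies the list while iterating, which keeps the demo deterministic.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical, simplified list that mimics the modCount bookkeeping.
class TinyList implements Iterable<String> {
    private String[] items = new String[8];
    private int size;
    // Deliberately not volatile, mirroring the field discussed in this post.
    private int modCount;

    void add(String s) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2);
        }
        items[size++] = s;
        modCount++; // every structural modification bumps the counter
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int cursor;
            // Snapshot of modCount taken when the iterator is created.
            private final int expectedModCount = modCount;

            @Override
            public boolean hasNext() {
                return cursor < size;
            }

            @Override
            public String next() {
                // Fail fast if the list changed since the snapshot.
                if (modCount != expectedModCount) {
                    throw new ConcurrentModificationException();
                }
                if (cursor >= size) {
                    throw new NoSuchElementException();
                }
                return items[cursor++];
            }
        };
    }

    // Returns true if modifying the list mid-iteration trips the check.
    static boolean failsFast() {
        TinyList list = new TinyList();
        list.add("a");
        list.add("b");
        try {
            for (String s : list) {
                list.add(s + "!"); // structural change while iterating
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fails fast: " + failsFast()); // prints "fails fast: true"
    }
}
```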

But is this behavior guaranteed? Are these iterators always fail-fast? I used to think so, until I saw the declaration of the modCount field in the Collection implementations.

The modCount field is not marked as volatile. This means that while one thread makes structural changes to the collection and increments the modCount field, another thread that performs the iteration or serialization might not see the incremented value. It might end up comparing against the stale modCount that is visible to it. This might cause the fail-fast iterators to behave differently and fail at a later point. The field is non-volatile in Apache Harmony too.

A bug[2] in Sun Java explains this. It says the modCount was intentionally made non-volatile. Making it volatile would have added contention at every collection modification operation because of the memory fencing that’s required.

Also the Javadoc clearly says “fail-fast behavior cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification”.

This behavior would be the same even on the synchronized wrappers of these collections.

Wednesday, July 15, 2009

A few days back a couple of interns working in my office came up to me and said "It's not working". They were talking about a part of an application they had built. Their application was invoking an API given out by my team and it was throwing an error. They were invoking one of our Web Services through a client-side Java API provided by us. Their app was running on Tomcat and was accessing our product, which runs on another Tomcat server. So they had access to both these servers. After we exchanged a few IMs we solved the issue - but I am sure that if they face an issue again, they will not immediately ping and ask "Why is it not working?".

After we figured out the issue, I was thinking of sending a note to the interns on the first step they need to do when they face any such issue and I ended up writing this.

Check the Logs. Repeat - Check the Logs.

From what I have seen, 99.99% of issues get solved the moment you Check the Logs. And yes, many people don't do that before they report the issue. In the present case, the WS client in the SDK was throwing an exception and it said "An Error occurred with the client". Seeing only this, they could not make anything out. But that was on the client side. The first thing anyone should do is check the server logs. Server logs in this case would mean the application logs on the app server where the service is hosted and the application server's own logs.

Many times the logs are far too large to deduce anything from. They run into pages. And I have seen that this very fact demotivates any newbie who looks at them. In that case, if the issue can be replicated, I suggest -

Shut down the server and client

Back up your logs

Clean up the logs directory

Restart the servers and reproduce the issue

Now the logs are much smaller and more comfortable to read. It also helps if you can clearly separate the server start-up logs from the logging that happened while reproducing the issue. Now read them line by line, and most of the time the issue will be solved. If not, read every line of the logs a few more times. Yes, a few more times.

Still unable to figure out?

Check the logging level

Every logging framework provides various logging levels that you can configure. Most of them have a "debug" level which, when enabled, logs the most granular details. Enable this level of logging on the application and the server. Well-written applications log their code flow at the debug level. Hence with this level of logging set, there is a very high chance that you will reach the root of the problem. Even if you are not able to figure out the issue after looking at the debug-level logs, you have still helped the person who has to take a look at them. You have reduced the turnaround time. These debug logs will definitely help the API/Service developer who is going to look at your issue, who now has most of the information needed.
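To make the effect of the level concrete, here is a small sketch using java.util.logging, where FINE plays the role of "debug". The logger names, the messages and the CollectingHandler class are assumptions made up for this example; your framework and application will differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class LogLevelDemo {
    // Collects messages in memory so the effect of the level is visible.
    static class CollectingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                messages.add(record.getLevel() + ": " + record.getMessage());
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    // Logs one INFO and one FINE message with the logger set to the given
    // level, and returns the messages that actually got through.
    static List<String> run(Level level) {
        Logger logger = Logger.getLogger("demo.service." + level.getName());
        logger.setUseParentHandlers(false); // keep the console quiet
        CollectingHandler handler = new CollectingHandler();
        handler.setLevel(Level.ALL);
        logger.addHandler(handler);
        logger.setLevel(level);

        logger.info("request received");
        logger.fine("parsed 3 arguments from the payload");

        logger.removeHandler(handler);
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println(run(Level.INFO)); // only the INFO message
        System.out.println(run(Level.FINE)); // INFO and FINE messages
    }
}
```

With the level at INFO the granular message is dropped; at FINE both messages are recorded, which is the extra detail you want when chasing an issue.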

Not everything is on the Server - There's the client too

Most apps are like this - you write a client program that accesses a service running on a server. The application that runs on the server would also provide client-side APIs that do the task of marshalling your arguments and passing them over the wire through SOAP/RMI etc. A whole lot of things happen between the time you invoke the client API and the time the message is sent on the wire (goes out of this VM). The same is the case when you get the return message - from the time the message comes in over the wire till the time it's handed over to your client program in the format it accepts. Typically these are marshalling/unmarshalling tasks carried out by the client-side APIs. Now the concern is - what happens if something goes wrong here?

Most times we ignore the logging/trace on the client side. Yes, on the client side too you can configure the logging level. The way you do this depends on the logging framework used by the API you are accessing. Most times you would have to place a logging config properties file in the client classpath. The logging framework in the client API loads this file and determines the level of logging and the appender/handler to which logs have to be written. Once you have the client-side logs, you will be able to figure out what really was sent from the client, what it received, and other info related to marshalling/unmarshalling etc. One suggestion that I have here - in case you have access to both the client and the server, ensure that both of them have the same system time. With this, you can easily track the flow from client to server and back with respect to time.
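As a rough illustration of such a properties file, here is a sketch that assumes the client API logs through java.util.logging under the made-up logger name com.example.client. With java.util.logging the file is normally named through the java.util.logging.config.file system property; the same properties are loaded from a string here only to keep the example self-contained. Other frameworks use different file names and keys.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

class ClientLoggingConfig {
    // Contents you would normally keep in a logging.properties file.
    // "com.example.client" is a hypothetical logger name for the client API.
    static final String CONFIG =
        "handlers=java.util.logging.ConsoleHandler\n"
        + ".level=INFO\n"
        + "java.util.logging.ConsoleHandler.level=ALL\n"
        + "com.example.client.level=FINE\n";

    // Applies the configuration (this replaces the JVM-wide logging setup)
    // and returns the level the client logger ended up with.
    static Level configuredClientLevel() {
        try {
            LogManager.getLogManager().readConfiguration(
                new ByteArrayInputStream(CONFIG.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Logger clientLogger = Logger.getLogger("com.example.client");
        return clientLogger.getLevel();
    }

    public static void main(String[] args) {
        System.out.println("client logger level: " + configuredClientLevel());
        // This FINE message now reaches the console handler.
        Logger.getLogger("com.example.client").fine("marshalled request payload");
    }
}
```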

All these really help even if you are debugging an issue on something that you have developed or are developing - Something that can be done before debugging your code by placing breakpoints or even worse - by adding those System.out statements.

Too many threads?

Imagine the case where you have too many threads in your application that are doing their work simultaneously. The log records might be jumbled together. In such a scenario, to be able to uniquely identify and track the flow of one particular thread, you could have an identifier in the log record that contains the thread name.
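One way to get the thread name into every record, sketched with java.util.logging: a Formatter that prefixes the name of the logging thread. The class names, thread names and message are invented for this example, and the approach assumes the handler formats the record on the thread that logged it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class ThreadTaggedLogging {
    // Prefixes each message with the name of the thread that logged it.
    // This relies on the handler below formatting synchronously, on the
    // logging thread itself.
    static class ThreadNameFormatter extends Formatter {
        @Override
        public String format(LogRecord record) {
            return "[" + Thread.currentThread().getName() + "] "
                + record.getLevel() + " " + formatMessage(record);
        }
    }

    // Keeps formatted lines in memory instead of writing a file.
    static class ListHandler extends Handler {
        final List<String> lines = new CopyOnWriteArrayList<>();

        @Override
        public void publish(LogRecord record) {
            lines.add(getFormatter().format(record));
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    // Two threads log the same message; the prefix tells them apart.
    static List<String> run() {
        Logger logger = Logger.getLogger("demo.threads");
        logger.setUseParentHandlers(false);
        ListHandler handler = new ListHandler();
        handler.setFormatter(new ThreadNameFormatter());
        logger.addHandler(handler);

        Thread a = new Thread(() -> logger.info("charging card"), "order-1");
        Thread b = new Thread(() -> logger.info("charging card"), "order-2");
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        logger.removeHandler(handler);
        return handler.lines;
    }

    public static void main(String[] args) {
        // The order of the two lines can vary from run to run.
        run().forEach(System.out::println);
    }
}
```

Filtering the log on "[order-1]" then gives you the flow of just that thread.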

Logging only failures?

A lot of times when your application is running in production, it would not be a very good idea to log all of the threads on the server side. You would be interested in logging only the failures. In such a case, what you typically do is accumulate the log records for each thread, and at a later point, depending on the success or failure of that thread, decide whether to write its records to the log files or not. Note that these records stay in memory for the whole duration of the operation - and that could be quite expensive.
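The accumulate-then-decide idea can be sketched as below. Everything here is hypothetical: the class FailureOnlyLog, its methods and the in-memory SINK standing in for the log file are made up for illustration, and a real application would plug this into its logging framework.

```java
import java.util.ArrayList;
import java.util.List;

class FailureOnlyLog {
    // Records buffered for the operation running on the current thread.
    private static final ThreadLocal<List<String>> BUFFER =
        ThreadLocal.withInitial(ArrayList::new);

    // Stand-in for the real log file.
    private static final List<String> SINK = new ArrayList<>();

    static void log(String message) {
        BUFFER.get().add(message);
    }

    // Call once at the end of each operation: the buffered records are
    // written out only when the operation failed.
    static void finish(boolean succeeded) {
        List<String> records = BUFFER.get();
        if (!succeeded) {
            synchronized (SINK) {
                SINK.addAll(records);
            }
        }
        BUFFER.remove(); // always release the buffered records
    }

    // Runs one successful and one failed operation and returns what
    // ended up in the sink.
    static List<String> demo() {
        synchronized (SINK) {
            SINK.clear();
        }
        log("op1 start");
        log("op1 done");
        finish(true);

        log("op2 start");
        log("op2 failed: timeout");
        finish(false);

        synchronized (SINK) {
            return new ArrayList<>(SINK);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [op2 start, op2 failed: timeout]
    }
}
```

The buffer lives for the whole operation, which is exactly the memory cost mentioned above; java.util.logging.MemoryHandler offers a related built-in buffer, though it is not per-thread.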

Beware

Most application servers have their own logging framework which logs the tasks performed by the server. In certain logging frameworks you need to tell the framework when to start and stop logging on a thread. So when the server assigns a particular thread from a pool to your request, it does the job of telling the framework to start the logging. So if you explicitly create threads on an application server, these threads might not be logging information at all, though your code might have log statements.

Another important thing to note is that not all issues you encounter can be replicated even one more time. A lot of times I have seen that the moment you change the logging level or add a new log statement, the issue goes away. Often this is the case with issues related to multi-threaded programs, where even one new instruction would change the code dynamics or timing.

Solved. What should I do?

If a log entry really helped you to better understand an issue, you have learned something very important. You too should "Log" in your application. Many times I end up spending more time determining what should be logged, and at what levels, than the time I take to write the code. After all, logging is an art.

Thursday, May 21, 2009

These days I often use the routing and "Get Directions" features of Google Maps on my Windows Mobile. And yes, it works in India too, to a great extent. One feature that always surprises me is "My Location" - auto-detecting my geographic location - simply because my mobile does not have a GPS receiver.

So how does Google find out my location to a precision of 500 metres (sometimes 5 km when I go for a drive outside the city)? When I used to talk to people about this, the unanimous answer was "Google finds it from your mobile signal". I too was under the assumption that Google Maps, through your handset, asks your mobile phone operator for the geographical co-ordinates of the tower from which it's catching the signal. Simple, huh? But no.

On reading I found out that the Mobile operator does not say anything about the location coordinates of your cell tower or your device. Google does a trick here. Each mobile coverage cellular area (remember the Hexagon shaped cell) has a number attached to it (Some kind of serial number to uniquely identify that hexagon shaped cell). Google then uses a database that maps this cell number to geographical coordinates.

But how does Google get the coordinates that fall under that cell?

To do that, Google uses other GPS-enabled devices that are active in your cell area. It collects data like geographical location and service cell ID from these anonymous devices. At the end, what Google has is an expanding database mapping each service cell ID to the set of coordinates that fall under that cell. That's why "My Location" on a non-GPS device is a bigger circle, and Google says it's your location approximated to within some number of metres. It also explains why Google does not show it at all at some remote places when I travel on the highway outside the city. Looks like nobody has gone there and activated their GPS receiver and "helped" Google :-D
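Purely as an illustration of the idea, and not a description of how Google actually implements it, such a cell ID database could look like the sketch below. The class, the cell IDs and the plain averaging of coordinates are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a "cell ID -> coordinates" database.
class CellLocationSketch {
    // Cell ID -> GPS fixes ({lat, lon}) reported from inside that cell.
    private final Map<String, List<double[]>> fixesByCell = new HashMap<>();

    // Called when a GPS-enabled device reports where it is and which
    // cell it is currently served by.
    void report(String cellId, double lat, double lon) {
        fixesByCell.computeIfAbsent(cellId, k -> new ArrayList<>())
                   .add(new double[] {lat, lon});
    }

    // Estimates a location for a device without GPS from its cell ID.
    // Returns the plain average of the known fixes (reasonable only for
    // small areas), or null if no GPS device has reported from the cell.
    double[] estimate(String cellId) {
        List<double[]> fixes = fixesByCell.get(cellId);
        if (fixes == null) {
            return null;
        }
        double lat = 0;
        double lon = 0;
        for (double[] fix : fixes) {
            lat += fix[0];
            lon += fix[1];
        }
        return new double[] {lat / fixes.size(), lon / fixes.size()};
    }

    public static void main(String[] args) {
        CellLocationSketch db = new CellLocationSketch();
        db.report("cell-42", 12.0, 77.0);
        db.report("cell-42", 13.0, 78.0);
        double[] here = db.estimate("cell-42");
        System.out.println(here[0] + ", " + here[1]);  // prints 12.5, 77.5
        System.out.println(db.estimate("cell-remote")); // prints null
    }
}
```

The null result for an unknown cell corresponds to the remote highway stretches where no location shows up at all.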

Monday, February 16, 2009

Every day as I use Google Talk, I wonder "Why are they not coming up with a new version? It's been almost two years now and every month they add something new to the web based talk." Be it invisible mode, conference chat, video chat or even emoticons - the web gadget has it all.

Whenever I read about a new feature in the web version, I make up my mind to start using the talk gadget or Gmail chat. I log in through the gadget, start chatting with people, do my work in parallel, and at some point when I do an alt+tab to bring the browser into focus, I see my friends irritated - I have not seen their messages for the past 30 minutes :-( Yes, those notifications are what I miss the most in the web version. If you minimize the browser while a different tab is in focus, you will not see the blinking Google Talk tab when someone messages you. For me, when it comes to an instant messenger, the usability is much higher when it sits on the desktop.

And the need to use Google Talk - all my friends have moved to it. Otherwise I still prefer Yahoo! Messenger. Thank God I am not a Linux user - they don't even have the desktop version on Linux. Sigh.

After seeing Google Chrome, my doubts that Google will ever release a new desktop GTalk version have gone up steeply. The new home page, the new tab page, the very idea of application shortcuts, the browser task manager etc. look a lot like Google's attempt to make the browser the new home of the PC.

Does Google consider the browser to be the next desktop? Sometimes it looks like it; but remember - Microsoft sells and will continue to.

as key leads to the entry being garbage collected. Huh! Tried all combinations. No clarity. Last resort - pinged Rajiv. And so went the conversation -

Me: If you put ("abc" + "def").intern(); as key, it doesnt get GC'd but if you put (new String("abc") + "def").intern() it gets GC'd
Rajiv: Decompile and see if "abc"+"def" is being converted to "abcdef" by javac
Me: Yes it is. So?
[This could be the clue. Am still thinking.. tick tick tick]

Rajiv: Check if "abcdef"== (new String("abc") + "def").intern()
Me: It is... printed the identity hashcodes.
Rajiv: In the class you have both "abcdef" and (new String("abc") + "def").intern() and still (new String("abc") + "def").intern() gets gc'ed?
Me: God! Then it doesn't get gc'd.
[Now Rajiv cracks it -]

Rajiv: "I think intern is weak map and constant pool has a strong ref"
Me: ohh!
Me: In that case (new String("abc") ).intern(); should get GC'd right? But we saw it doesn't. The maya happens only when someString is '+'d to (new String("abc")) and then the resultant String is interned.
Me: Just (new String("abc")).intern() doesnt get GC'd.
Rajiv: When you say (new String("abc")).intern() there is a string "abc" in constant pool.
Me: Yes "abc" in constant pool would be the literal we created and passed as argument to the String constructor.
Rajiv: (new String("abc")).intern() returns that string. So wont get gc'ed
Me: Oh yeah. Got it!
Me: So only when you do a "+" you get a String which is not there in constant pool and hence it gets GC'd ...
Rajiv: ya right.

I had earlier thought of the intern pool and the constant pool as being the same. But Rajiv's prediction of intern being a weak map and the constant pool holding a strong ref looks quite convincing. Oo la.. That solved our mystery. Thanks Rajiv :-)
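The identity checks from the conversation can be reproduced with a few lines. This is only a sketch of the reference comparisons (the class name InternDemo is made up); it does not demonstrate the garbage collection behaviour itself, which depends on the JVM.

```java
class InternDemo {
    // Returns the results of the reference comparisons discussed above.
    static boolean[] check() {
        String literal = "abcdef";
        // javac folds this constant expression into the literal "abcdef".
        String folded = "abc" + "def";
        // Built at run time, so it is a separate object until interned.
        String built = new String("abc") + "def";

        return new boolean[] {
            literal == folded,                   // true: same constant
            literal == built,                    // false: distinct heap object
            literal == built.intern(),           // true: intern() returns the pooled String
            new String("abc").intern() == "abc"  // true: "abc" is already a literal
        };
    }

    public static void main(String[] args) {
        for (boolean b : check()) {
            System.out.println(b); // prints true, false, true, true
        }
    }
}
```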

        while (true) {
            System.gc();
            /**
             * Verify Full GC with the -verbose:gc option
             * We expect the map to be emptied as the strong references to
             * all the keys are discarded.
             */
            System.out.println("map.size(); = " + map.size() + " " + map);
        }
    }
}

What do we expect the size of the map to be after full GC? I initially thought it should be empty. But it turned out to be 2.

Look at the way the four Strings are initialized. Two of them are defined using the 'new' operator, whereas the other two are defined as literals. The Strings defined using the 'new' operator would be allocated in the Java heap, but the Strings defined as literals would be in the literal pool. The Strings allocated in the literal pool (Perm Space) would never be garbage collected. This would mean that String 'str2' and 'str3' would always be strongly referenced and the corresponding entry would never be removed from the WeakHashMap.

So next time you create a 'new String()', put it as a key in a WeakHashMap, and later intern() the String, beware - Your key will always be strongly referenced. [Invoking intern() method on a String will add your String to the literal pool if some other String equal to this String does not exist in the pool]
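A cut-down sketch of the literal-versus-new distinction follows. The class and key names are invented here and this is not the original test class. System.gc() is only a hint, so the sketch prints the size without promising a number; the one thing it can rely on is that the entry keyed by a literal stays, because the running class still refers to that literal.

```java
import java.util.Map;
import java.util.WeakHashMap;

class LiteralKeyDemo {
    // Puts one heap-allocated key and one literal key into a WeakHashMap,
    // asks for a GC, and reports whether the literal-keyed entry survived.
    static boolean literalKeySurvivesGc() {
        Map<String, String> map = new WeakHashMap<>();
        // After this statement only the map refers to the key object.
        map.put(new String("heap-key"), "v1");
        // The literal is also referenced by this class's constants.
        map.put("literal-key", "v2");

        for (int i = 0; i < 3; i++) {
            System.gc(); // a hint; the JVM may or may not collect
        }
        System.out.println("size after GC: " + map.size());
        return map.containsKey("literal-key");
    }

    public static void main(String[] args) {
        System.out.println("literal entry still present: " + literalKeySurvivesGc());
    }
}
```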

        //Discard the strong reference to the key
        name = null;
        while (true) {
            System.gc();
            /**
             * Verify Full GC with the -verbose:gc option
             * Since there is no strong reference to the key, it is assumed that
             * the entry has been removed from the WeakHashMap
             */
            System.out.println(cache.size());
        }
    }

Now when testMethod() is run, what do you expect the output to be? Since the strong reference to the key is discarded, we assume that the entry would be removed from the map, and the map would be empty after a full GC. But that does not happen.

Let us look at the put operation on the WeakHashMap.

cache.put(name, new ComplexDO("1", name));

Here the value ComplexDO was holding the key name. This would mean that the value always strongly refers to the key, and hence the key would never be garbage collected. The entry would always remain in the map.

This is what WeakHashMap API says - "The value objects in a WeakHashMap are held by ordinary strong references. Thus care should be taken to ensure that value objects do not strongly refer to their own keys, either directly or indirectly, since that will prevent the keys from being discarded."
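Here is a compact sketch of the pitfall and one possible way around it. StrongValue is a made-up stand-in for ComplexDO, and WeakValue is a hypothetical alternative that refers to its key only through a WeakReference. Since System.gc() is just a hint, only the strong case has a guaranteed outcome.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

class ValueHoldsKeyDemo {
    // Stand-in for ComplexDO: keeps a strong reference to the key.
    static class StrongValue {
        final String id;
        final String name;

        StrongValue(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Hypothetical alternative: refers to the key only weakly.
    static class WeakValue {
        final String id;
        final WeakReference<String> name;

        WeakValue(String id, String name) {
            this.id = id;
            this.name = new WeakReference<>(name);
        }
    }

    private static void gcHint() {
        for (int i = 0; i < 3; i++) {
            System.gc();
        }
    }

    // The map holds the value strongly and the value holds the key
    // strongly, so the entry can never be cleared: always returns 1.
    static int sizeWithStrongValue() {
        Map<String, StrongValue> cache = new WeakHashMap<>();
        String name = new String("key");
        cache.put(name, new StrongValue("1", name));
        name = null; // discard our own strong reference
        gcHint();
        return cache.size();
    }

    // No strong path to the key remains, so the entry may be cleared
    // once the collector runs; the result depends on the JVM.
    static int sizeWithWeakValue() {
        Map<String, WeakValue> cache = new WeakHashMap<>();
        String name = new String("key");
        cache.put(name, new WeakValue("1", name));
        name = null;
        gcHint();
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("strong value: " + sizeWithStrongValue());
        System.out.println("weak value:   " + sizeWithWeakValue());
    }
}
```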

//Since s1.equals(s2) is true and hash is same, the earlier value
//against key s1 ("good") in the map is replaced by the new one. ("ok")

8 s1=null;

9 System.gc(); //Verify Full GC with the -verbose:gc option

10 System.out.println(map.size());
11 }
12 }

What do we expect the output to be? 1? No, not exactly.

Here s1 and s2 are two different objects on the heap. So in line 5, a new (key, value) pair with key s1 is put into the map. Later, when a (key, value) pair with key s2 is put into the map, it checks equals on s1 and s2 and compares their hashcodes. When it finds that equals returns true and the hashcode is the same, it replaces the value of the earlier entry with the new value. But the issue(?) here is that WeakHashMap/HashMap does not replace the earlier key when adding a (key, value) pair whose key duplicates one already in the map. So even after putting an entry with key s2, the WeakHashMap has only one entry, whose key refers to the object referred to by s1 and not s2. Now the object on the heap referred to by s1 has one strong reference (through s1) and one weak reference through the WeakHashMap. Later, when I say s1=null, that object loses its strong reference, and when GC happens the entry is removed from the map.
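The "value replaced, key retained" behaviour can be checked directly by comparing references. This is a small sketch with invented names; both s1 and s2 stay strongly referenced throughout, so garbage collection plays no part in the result.

```java
import java.util.Map;
import java.util.WeakHashMap;

class KeyRetentionDemo {
    // Puts two equal but distinct keys and reports what the map kept.
    static boolean[] check() {
        Map<String, String> map = new WeakHashMap<>();
        String s1 = new String("key");
        String s2 = new String("key"); // equal to s1, but a different object

        map.put(s1, "good");
        map.put(s2, "ok"); // replaces the value, keeps the original key

        String storedKey = map.keySet().iterator().next();
        return new boolean[] {
            map.size() == 1,            // true: still a single entry
            "ok".equals(map.get("key")), // true: the value was replaced
            storedKey == s1,            // true: the key object is still s1
            storedKey == s2             // false: s2 was never stored
        };
    }

    public static void main(String[] args) {
        for (boolean b : check()) {
            System.out.println(b); // prints true, true, true, false
        }
    }
}
```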

So that's how it works.

Also note that WeakHashMap handles duplicate keys the same way HashMap does, and the put API says "If the map previously contained a mapping for this key, the old value is replaced by the specified value."

So just be careful when you use WeakHashMap and your usage scenario is similar to the above.