Monday, December 03, 2007

Babbleknot New Features

The babbleknot.com team has been gulping Diet Rockstar & busy adding new forums and features. The site is currently loading about 4,000 new forums per day, with a goal of 150,000 boards during the beta. We're just over halfway there, with 78,184 forums loaded.

I continue to be amazed at the variety of boards out there, and at the activity that still occurs on them, even with all the popular Web 2.0 destinations. I was a fan of King of Queens, but below is a thumbgraph of an entire board dedicated to Leah Remini. Anything is out there. You just have to look. In the graph below, the "green" lines indicate direct replies in threads, which is a clue to relationships between members.

We'll crank up the search & daily indexing rates in the future. Right now, many graphs are done on demand or at the rate of 2,000 per day in a background job. Babbleknot is running on Amazon's Elastic Compute Cloud, EC2, with plans to add servers for spiders, large graph layout, and other types of large scale analysis as demand increases.

The new features are key milestones that I am happy to finally check off. Here they are in my perceived order of significance.

The biggest new feature, in my opinion, is background generation of graphs. Before, if you clicked a forum link & the data was unavailable, the board was parsed while you waited, hourglass clocking, driving you crazy. Now if the data isn't on hand, the generation is pushed to the background. Most complete in 30 seconds or so, but a large board with lots of content & posts will obviously take longer. The forum names appear in the right sidebar immediately & change to a hyperlink once complete. Sending the generation to the background is good for both usability & the overall performance of the site. Right now the requests are persisted for the current web session, but the thought is we will keep that history for a short time so you can go back a bit.
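A minimal sketch of the push-to-background idea, assuming a fixed thread pool and an in-memory map of finished graphs. `GraphRequests`, its methods, and the `generator` function are hypothetical stand-ins, not babbleknot's actual code. A null return means "show the plain forum name for now"; `isReady` is what the sidebar would poll before turning the name into a link.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical sketch: graph generation runs off the request thread. */
final class GraphRequests {
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    // Finished graphs keyed by forum name (stand-in for the real graph store).
    private final Map<String, String> graphs = new ConcurrentHashMap<>();
    // Forums currently being generated, so repeat clicks don't queue duplicate jobs.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final Function<String, String> generator; // parses the board, builds the graph

    GraphRequests(Function<String, String> generator) {
        this.generator = generator;
    }

    /** Returns the graph if it is on hand; otherwise queues generation and returns null. */
    String request(String forum) {
        String graph = graphs.get(forum);
        if (graph == null && pending.add(forum)) {
            workers.submit(() -> {
                try {
                    graphs.put(forum, generator.apply(forum));
                } finally {
                    pending.remove(forum); // allow a retry if generation failed
                }
            });
        }
        return graph; // null: show the plain name now, link it once ready
    }

    /** True once the background job has stored the graph. */
    boolean isReady(String forum) {
        return graphs.containsKey(forum);
    }

    /** Stops accepting work and waits for queued jobs; returns false on timeout. */
    boolean shutdownAndWait(long timeout, TimeUnit unit) {
        workers.shutdown();
        try {
            return workers.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```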

Finally, image flagging is now in, with very basic functionality. Clicking the red flag next to a content image removes the image from view on subsequent retrieves of that graph. It's possible entire forums should be flagged. To me, flagging isn't necessarily bad. I believe there is a large segment of the population that would probably like to surf only the flagged content & boards. This will eventually become part of the Safe Search feature, and I think I'll brand the flagged-only mode as its opposite, something like Reckless Search.

More updates coming soon. Please post questions and comments if you love or hate babbleknot.com.

Sunday, December 02, 2007

Social Graph Visualization with Babbleknot.com

Babbleknot scans the index pages of over 70,000 message boards to generate thread / topic metrics, such as velocity, mass, and acceleration, to name a few. These metrics are fed as input to a spider that indexes the content and generates graphs of the hot threads.
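The post doesn't define these metrics, so here is one plausible reading, purely as an assumption: velocity as new replies per hour between two scans of an index page, and acceleration as the change in velocity across three consecutive scans (measured between the midpoints of the two intervals). The `Scan` type and both formulas are hypothetical.

```java
/** Hypothetical thread metrics from successive scans of a forum index page. */
final class ThreadMetrics {
    /** One observation of a thread: when it was scanned and its reply count. */
    static final class Scan {
        final long epochSeconds;
        final int replies;

        Scan(long epochSeconds, int replies) {
            this.epochSeconds = epochSeconds;
            this.replies = replies;
        }
    }

    /** Velocity: new replies per hour between two scans. */
    static double velocity(Scan earlier, Scan later) {
        double hours = (later.epochSeconds - earlier.epochSeconds) / 3600.0;
        return (later.replies - earlier.replies) / hours;
    }

    /** Acceleration: change in velocity per hour, measured between interval midpoints. */
    static double acceleration(Scan a, Scan b, Scan c) {
        double v1 = velocity(a, b);
        double v2 = velocity(b, c);
        double hoursBetweenMidpoints = (c.epochSeconds - a.epochSeconds) / 7200.0;
        return (v2 - v1) / hoursBetweenMidpoints;
    }
}
```

With scans at 0 s, 3,600 s and 7,200 s showing 10, 20 and 50 replies, velocity rises from 10 to 30 replies per hour, so the thread is accelerating at 20 replies per hour per hour.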

On the graph above, you see people, threads, and images, with lines representing relationships between objects with a "circular" layout.
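For a "circular" layout, the simplest version places the n nodes evenly around a circle, one angle step of 2π/n apart. This is a generic sketch with made-up names, not the layout engine babbleknot actually uses; a real layout would likely also group nodes by type (people, threads, images) before assigning angles.

```java
/** Generic circular layout: n nodes spaced evenly around a circle. */
final class CircularLayout {
    /** Returns {x, y} for each node, starting at angle 0 and going counterclockwise. */
    static double[][] layout(int n, double centerX, double centerY, double radius) {
        double[][] positions = new double[n][2];
        for (int i = 0; i < n; i++) {
            double angle = 2 * Math.PI * i / n;
            positions[i][0] = centerX + radius * Math.cos(angle);
            positions[i][1] = centerY + radius * Math.sin(angle);
        }
        return positions;
    }
}
```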

Graphs? Why? The graphs provide a top-down view of the content, people and their relationships. You can see tens or hundreds of threads & topics at once, with more coming, hopefully, as we learn to scale. In the future you'll also be able to generate graphs that span boards & forums. The best part of these graphs is that they aren't just static images. You can zoom & pan (screencam), and all the embedded content images are wrapped in a lightbox for easy display and perusal. You have quick access to the thread & user profiles, with links representing post responsibility and direct quoting.

Privacy? Yep, there will be complaints. I expect some pushback. I'll be happy to pull any board that doesn't want to be indexed. We'll be respecting robots.txt soon to make that easy.
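Respecting robots.txt mostly comes down to honoring the `Disallow` prefixes in the `User-agent: *` group. Here is a deliberately simplified sketch: it ignores `Allow` lines, wildcards, and agent-specific groups, treats only the most recent `User-agent` line as the current group, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Simplified robots.txt check: only the "User-agent: *" group's Disallow prefixes. */
final class RobotsRules {
    private final List<String> disallowed = new ArrayList<>();

    RobotsRules(String robotsTxt) {
        boolean inStarGroup = false; // only the most recent User-agent line counts
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw.replaceAll("#.*", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                inStarGroup = value.equals("*");
            } else if (field.equals("disallow") && inStarGroup && !value.isEmpty()) {
                disallowed.add(value); // an empty Disallow means "allow everything"
            }
        }
    }

    /** True if the spider may fetch this path. */
    boolean allowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```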

What else is coming? Identity claiming so you can link your babbleknot.com identity to individual board identities, and delve deeper into your own social graph. Tagging of all kinds is 99% complete & will be on the home page soon. Full content indexing, thread tracking, alerts, content recognition beyond images, more board types, and maybe even posting capability.

Watch this blog for more information and give babbleknot.com a try today.

Tuesday, June 19, 2007

Java Powered Submarine

Friday, February 16, 2007

I picked this up in 1997 at the Java Internet Business Expo. It was held at Javits in NY & I remember McNealy spouting something about the 1st 100 days of Java. I know it was invented earlier... I am just telling you the marketing crap I heard that day.

The settlement is said to have been named for a petticoat lost (and evidently found) at a local dance. The garment had been recycled from an old coffee sack and had retained the stenciled name: Java.

The area was first settled in the late 1840s and early 1850s by settlers from Alabama and Tennessee, but as a community, Java didn’t expand until the 1890s, when prison crews from the Texas State Penitentiary in Rusk came to mine coal to fuel the state-owned iron furnace. A small trading post consisting of a general store and sawmill grew up at the site, and a post office was opened there in 1895.

In 1906, after the Texas State Railroad was constructed from Rusk to Palestine, the Java post office was closed. Within a short time most of the merchants and residents had moved to the newly founded town of Maydelle, on the railroad.

Thursday, August 31, 2006

Java Tip #11 - Beware the Fickle Session ID

In March of 2005 I posted the article Performance 101 - Avoid Work. In that tip I described a process where we store the user's last JDBC connection in a HashMap using the HTTP Session ID as the key. In the last few months, several of our customers have been experiencing random, unexplained connection leaks under heavy load.

We were unable to duplicate this issue in house, even though we executed the same business process against a copy of the customer database. This of course made it very hard to find... we had no choice but to debug a production system. A couple of weeks ago I flew to Minneapolis (nice place, btw) for a week and the hunt was on.

I had been looking at all our fancy connection handling code trying to find the leak, and missed the problem for several days. Finally, after staring at logs for several hours on the day we were scheduled to leave, we noticed something odd. The HTTP Session ID changed for a LIVE session. I had never even considered that a Session ID could change after a user logged in. Here is the scenario that caused the leak.
1) Connection X goes into the map with key 123!456!NONE
2) The Session ID changes to 123!456!7890
3) The next request comes in. There is NO connection in the map under key 123!456!7890, so connection Y is retrieved from the connection pool.
4) Connection X just leaked.

The servlet API doesn't say anything about the ID changing, only that it must be unique. The underlying issue is WebLogic appending proprietary information to the session "cookie" and returning that as the ID when HttpSession.getId() is called. We use the default cookie name, JSESSIONID, and a sample looks like:
JSESSIONID=1E9Xwn7nLYfOsc1oSD7iaWHMXzpHga5cQj!-1587343083!-1587348922

If you experience a session replication failure, the SecondaryServer JVMHash will change to NONE:
JSESSIONID=1E9Xwn7nLYfOsc1oSD7iaWHMXzpHga5cQj!-1587343083!NONE

We actually saw it start with NONE and then get a JVMHash. The theory is that the replication failed, then succeeded at a later time. We also wonder whether, under heavy load, a delay in replication caused the ID to begin with NONE and then eventually get assigned. We didn't have the opportunity to research this, but plan to during our next load test.

Our fix was to use only the first 52 bytes of the value returned by HttpSession.getId() as the key into the HashMap. That length is configurable in weblogic.xml, so use the configured value rather than hard-coding 52.
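A toy model of the leak and the fix, assuming connections are plain strings and the pool is a simple queue. `SessionConnectionCache`, `keyFor`, and `connectionFor` are hypothetical names; `idLength` stands for whatever session ID length weblogic.xml is configured with (52 in the setup described here). With `idLength` longer than the full ID, the scenario above strands connection X; trimmed to the stable prefix (3 characters for the toy IDs 123!456!...), both requests share it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a per-session connection cache keyed by session ID. */
final class SessionConnectionCache {
    private final Deque<String> pool = new ArrayDeque<>(); // stand-in for the JDBC pool
    private final Map<String, String> bySession = new HashMap<>();
    private final int idLength; // must match the session ID length configured in weblogic.xml

    SessionConnectionCache(int idLength, String... connections) {
        this.idLength = idLength;
        for (String c : connections) {
            pool.add(c);
        }
    }

    /** Keeps only the stable prefix, dropping the "!JVMHash!JVMHash" suffix that can change. */
    String keyFor(String sessionId) {
        return sessionId.substring(0, Math.min(idLength, sessionId.length()));
    }

    /** Reuses the connection parked for this session, else borrows one from the pool. */
    String connectionFor(String sessionId) {
        return bySession.computeIfAbsent(keyFor(sessionId), k -> pool.poll());
    }

    /** Number of connections currently parked in the map. */
    int parked() {
        return bySession.size();
    }
}
```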

I'm sure BEA just doesn't know what to do here. There is probably tons of code out there relying on HttpSession.getId() returning the full JSESSIONID cookie, so changing its default behavior would not be a good idea.

Watch out for the fickle session ID! Please let me know if you've experienced this on application servers other than WebLogic. I tested this on 8.1 through 9.2 and saw the same behavior.

Friday, May 26, 2006

Java Tip #10 - Constructor Exceptions are Evil

Suppose I have the following class Foo, and the constructor of Foo throws an exception. This might be due to some kind of security constraint or an invalid state. Suppose we don't want anyone unauthorized to call the doEvil() method.

The moral of this story is to use the final keyword if you want to prevent incomplete instances from being accessible. The presenters of the session also suggested using an "initialized" field that is set as the last line of the constructor. A critical method could validate this field before doing work. I think final is a better solution.
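Here is a sketch of a `Foo` hardened both ways, assuming "final" means declaring the class final. The usual way a half-built object escapes is a subclass whose `finalize()` method stashes `this` somewhere after the constructor has thrown; a final class cannot be subclassed, so that route is closed. The `authorized` flag and the return value of `doEvil()` are made up for illustration.

```java
/** Sketch of a Foo that cannot leak a partially constructed instance. */
final class Foo { // final: no subclass can override finalize() and capture "this"
    private final boolean initialized; // stays false in any object whose constructor threw

    Foo(boolean authorized) {
        if (!authorized) {
            throw new SecurityException("not authorized to create Foo");
        }
        // ... any other setup goes here ...
        initialized = true; // last statement of the constructor
    }

    String doEvil() {
        // The "initialized" check from the session; redundant once the class is
        // final, but it protects non-final classes where subclassing is allowed.
        if (!initialized) {
            throw new IllegalStateException("Foo was not fully constructed");
        }
        return "evil done";
    }
}
```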

Friday, May 12, 2006

I spent some time developing a proof of concept for Tapestry and RSP-UI for others to review. I've forwarded the POC directly to Wolfgang Gehner at Infonoia for possible integration into the sample application.

The zip file contains two workspaces that can be saved to your c:\rsp\workspace folder & imported into Eclipse. The org.apache.tapestry plug-in contains the latest Tapestry jars & dependencies. The org.rsp.sample.usage.tapestry plug-in implements the Tapestry "DirectLink" quick start application. You'll have to add a "link" file to the RSP web app that references the plug-ins. If you don't understand what I just wrote, review the article at Infonoia to get the RSP-UI development environment set up.

A few points:
- The Tapestry servlet path in app.application is set to account for the "platform" node:
  <meta key="org.apache.tapestry.servlet-path" value="/platform/app"/>
- A servlet that extends org.apache.tapestry.ApplicationServlet is used, but it only overrides one method. This helps HiveMind & Tapestry find everything:
  import org.apache.hivemind.ClassResolver;
  import org.apache.hivemind.impl.DefaultClassResolver;
  @Override
  protected ClassResolver createClassResolver() {
      // Resolve classes through this plug-in's class loader
      return new DefaultClassResolver(this.getClass().getClassLoader());
  }
- We are interested in using RSP-UI in our application framework if it makes it as a full Eclipse project.

I will try to answer questions on the POC, but I am not really a Tapestry expert. Enjoy.

Wednesday, April 19, 2006

Salon Newsletter PHP errors

Monday, April 17, 2006

I found this thread on Artima forums discussing where logic should reside in database apps: stored procedures or Java.

We've had this discussion many times at my place of work & I am currently on a project with zero stored procedures. But stored procedures are used heavily in our still-strong legacy app & they helped us achieve acceptable performance at the time. One of the issues I remember from working on the legacy app was the problem that stored procedures caused with caching. When stored procedures update rows without the application server's knowledge, cache consistency becomes a real problem. We worked out a couple of ways to address this: 1) callbacks to the application server and 2) bringing the cache into the Java VM inside the Oracle database. We wrote some "stored procedures" in Java as well.
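A sketch of the first approach, assuming the database can somehow notify the application server when a stored procedure changes a row (the transport is left out). `RowCache` and its `loader` are hypothetical; the point is that the cache only stays correct if every out-of-band write ends in an `invalidate` call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** App-server cache of database rows that an external callback can invalidate. */
final class RowCache<K, V> {
    private final Map<K, V> rows = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // reads the row from the database

    RowCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached row, loading it on first use. */
    V get(K key) {
        return rows.computeIfAbsent(key, loader);
    }

    /** Called by the database callback when a stored procedure changes the row. */
    void invalidate(K key) {
        rows.remove(key);
    }
}
```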

I am in the camp that stored procedures have their place in batch processes that need the performance boost. But you need to plan for them up front & design your architecture accordingly.

Pair that up with the toString() result of your connection and you have a pretty decent way to figure out what isn't getting closed. Oh yeah, use some dummy exception logic to print where each connection was handed out. This is definitely NOT for use in production.
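The "dummy exception" trick works because constructing an exception captures the current stack trace without throwing anything. A debug-only sketch, assuming you can hook the points where your pool hands out and takes back a connection; `CheckoutTracker` and its methods are hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.IdentityHashMap;
import java.util.Map;

/** Debug-only: remembers where each pooled connection was handed out. */
final class CheckoutTracker {
    // Keyed by object identity so a wrapper's equals() can't confuse two connections.
    private final Map<Object, Throwable> open = new IdentityHashMap<>();

    synchronized void checkedOut(Object connection) {
        // Never thrown: creating it just records the caller's stack trace,
        // and the message carries the connection's toString().
        open.put(connection, new Exception("checked out: " + connection));
    }

    synchronized void returned(Object connection) {
        open.remove(connection);
    }

    synchronized int openCount() {
        return open.size();
    }

    /** Stack traces of every connection still out, i.e. the leak suspects. */
    synchronized String report() {
        StringWriter text = new StringWriter();
        PrintWriter out = new PrintWriter(text);
        for (Throwable whereFrom : open.values()) {
            whereFrom.printStackTrace(out);
        }
        out.flush();
        return text.toString();
    }
}
```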

Connection leaks do suck. Oh, metauser isn't an Oracle function, it's one of ours. But it also has a purpose in this connection pool world. The metauser is established when a connection is handed out, based on the user's identity, since Oracle's user function would return the ID of the pool's account. We call metauser & store the value in a session-level package variable. We use metauser in stored procedures (which we have loads of) instead of user. I hear there is a way around this in Oracle, but I'm not sure how to do it. Ideas welcome.

Thursday, May 05, 2005

Business Week has an interview with James Gosling. I like this response.

Q: Many people have tried to take credit for Java's success over the years. At this 10-year anniversary, are there any unsung heroes?

A: One of the things that has always kind of bugged me is everyone talks about me as the guy who created Java. That was true up until about 8 to 10 years ago. I wrote the first thing. But you look at the engineering teams we have today -- we have many really talented people that nobody ever hears of.

Friday, March 11, 2005

Performance 101 - Avoid Work

We have a "performance" team at my day job. I am often unofficially assigned to this team when we are benchmarking a new release. While this is important stuff, I think more needs to be done to train the developers on performance basics.

First off, let me say that I do not consider myself an expert on performance. I do know how to analyze performance data and tune algorithms, but that's down in the details. I think someone experienced in performance begins with good design and then iteratively improves performance. The number one rule I tell people is the fastest way to execute a process is to NOT execute the process. Meaning, you should always be looking for a way to avoid work.

I'm going to explain a bizarre application to you now... we have millions of lines of PowerBuilder code that is still in production at over 100 companies worldwide. We started on Java several years ago, and just over a year ago we cut the cord to the database and started sending all traffic thru the application server using XML over HTTP. Note that we were doing this to build a "merged-client" several years ago when SOAP was still something you washed your butt with, so we have yet to adopt it. This gives us a "pseudo n-tier" architecture that performs well on some things and sucks on others. Any kind of chatty process will suffer due to the extra network hop & XML parsing. Going XML may not have been the best idea in retrospect, but the performance is fine now. It just took a lot of tweaking to get there. The real problem is that extra network hop. If you came from a PowerBuilder or VB environment, you know how C/S apps work. Many of our processes were coded to act on a single row at a time & have performance issues now that the Oracle client is a hop away. Often the user's identity is embedded in the SQL (such as the user variable in Oracle).

Back to performance. When the client opens a transaction, the connection is stored in the HTTP session. This works fine & we use SessionListeners to make sure things get cleaned up. Really haven't had too much trouble with it, even though it is frowned upon as a practice. Now, sometimes the application is just running in a tight loop doing "selects" and not updates (stupid, but that's how it was coded 6 years ago). The design was to return the connection to the pool after each request IF there was no active update transaction. That connection pool time becomes significant when you include having to establish the user's "context" in the database to set their timezone, locale, and identity. The solution was to "avoid the work". I took the ConcurrentHashMap & DelayQueue classes in JDK 1.5 and created a DelayedHashMap. The connection is stored in this DelayedHashMap & returned from there instead of the regular pool. After a configurable time, the object expires and is returned to the pool. So when we're in a tight loop, the constant need to re-establish the user's identity goes away and performance improves significantly.
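The DelayedHashMap source isn't included here, so this is only a sketch of how the two JDK classes can be combined: a ConcurrentHashMap holds the parked values, a DelayQueue orders them by expiry, and `expireDue()` (to be called from a timer or background thread) hands anything that has timed out to a callback that would return it to the real pool. The method names and the callback are assumptions, and the code uses Java 8 syntax for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Sketch: parks values for a limited time, then hands them to an expiry callback. */
final class DelayedHashMap<K, V> {
    /** A parked value plus the moment it expires; ordered by remaining delay. */
    private final class Entry implements Delayed {
        final K key;
        final V value;
        final long expiresAtNanos;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
            this.expiresAtNanos = System.nanoTime() + ttlNanos;
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(expiresAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    private final Map<K, Entry> entries = new ConcurrentHashMap<>();
    private final DelayQueue<Entry> expiry = new DelayQueue<>();
    private final long ttlNanos;
    private final Consumer<V> onExpire; // e.g. return the connection to the real pool

    DelayedHashMap(long ttl, TimeUnit unit, Consumer<V> onExpire) {
        this.ttlNanos = unit.toNanos(ttl);
        this.onExpire = onExpire;
    }

    /** Parks a value; anything already parked under the key is expired immediately. */
    void put(K key, V value) {
        Entry entry = new Entry(key, value);
        Entry previous = entries.put(key, entry);
        if (previous != null) { // whoever removes an entry from the map owns its value
            expiry.remove(previous);
            onExpire.accept(previous.value);
        }
        expiry.add(entry);
    }

    /** Removes and returns the parked value, or null if there is none. */
    V take(K key) {
        Entry entry = entries.remove(key);
        if (entry == null) {
            return null;
        }
        expiry.remove(entry);
        return entry.value;
    }

    /** Hands every expired value to onExpire; call periodically from a background thread. */
    void expireDue() {
        Entry entry;
        while ((entry = expiry.poll()) != null) {
            if (entries.remove(entry.key, entry)) { // skip entries already taken or replaced
                onExpire.accept(entry.value);
            }
        }
    }
}
```

In the tight-loop case, each request would `take` the session's connection and `put` it back when done, so the real pool (and the cost of re-establishing the user's database context) is only touched after the connection has sat idle for the configured time.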

The most effective way to speed up a process is to avoid it altogether.

Sunday, March 06, 2005

MyEclipse has improved

I bought a subscription to MyEclipse over a year ago, but let it expire. Recently I downloaded an update and after using it for a couple of weeks, I must renew my subscription. We use Struts Studio at the office. In fact, I helped select it. But this $30 / year package blows it away in many areas. It goes beyond just doing struts development and gives you nice little tools for everything. If you use multiple appservers, you'll really appreciate the support for just about every application server around. Struts Studio doesn't try to be this comprehensive and seems to only concentrate on the struts aspects. Exadel, the maker of Struts Studio, does have a JSF product now.

Give MyEclipse a try if you are doing web development. I think you'll be pleased. A 30 day trial is available for download.

Tuesday, February 01, 2005

Java Tip #8 - Go to JavaOne

Go to JavaOne. I have attended the San Francisco summer JavaOne for the past few years. The 2004 event was a return to the glory days. The session quality continues to improve and the evening BOFs are great.

Bring your laptop cause wi-fi abounds. In the evening BOFs, beer is acceptable everywhere. I've also seen folks buying beer during the day at the snack bars. That would just make me sleepy... not like you can sit there and have a dozen. I always run into people I know and there are plenty of opportunities to strike up conversation while waiting for a session to start.

Last year the hands-on labs were excellent and covered topics from JSF to customizing Swing to peer-to-peer with Java.

The relaxation areas are the best with video games both old & new available for free.

Of course, you can't beat San Francisco for food & fun. I highly recommend Johnny Foley's for Fish & Chips & a pint of Guinness. Go down to the wharf & eat some crab. It's good at every place I've tried. Also, the cable car museum is extremely cool.

To those that complain about the quality of the sessions, I have this to say. If you don't like the topics, do a presentation on something that interests you. Be part of the solution, instead of part of the problem. Let's see... put up or shut up... if you can't say something nice, don't say anything. Stuff like that.... oh yeah, do no harm.

I don't work for Sun, but I have some visibility into the process & believe they do the very best job possible in selecting talks and think JavaOne is well worth the $$$ spent. Of course, if you give a talk they waive your fee. The call for papers ended on 1/31/05, so start planning for next year.

I'm planning to attend again this year. Once the list of sessions is released, I'll add my picks for 2005.

Monday, January 24, 2005

Java Tip #7 - Use .hotspot_compiler file to stop compilation

The hotspot compiler has known bugs that can cause the compilation thread to go cpu bound. We had seen this behavior occasionally in production. A customer would call & report that no one was logged into the system, but it was at 100% on one cpu. The instance would normally be restarted & things would be fine. We couldn't figure out why & of course the customer was frustrated at having to restart the system. I can't blame 'em. For a time we had no clue what was causing the problem.

Recently we did some load testing at Sun's Market Development Engineering lab (very cool, btw) and really stressed our application. We kept seeing this "java/9" thread go to 100% cpu during the warmup period. It didn't happen every time, but it happened often enough to slow us down. The busy process was visible in prstat using the -L flag to list lightweight processes (LWPs). We used pstack to look at the offending LWP & saw it was the hotspot compiler. We applied the -XX:+PrintCompilation VM flag and found that the compilation was stopping at various times & it always seemed to happen on the same methods. We used the .hotspot_compiler file to exclude those methods and the stuck thread problem was solved.

The .hotspot_compiler file goes in the working directory & has a line for each method to be excluded.
For example:
exclude com.whatever.TheClassName theMethodName

This would prevent com.whatever.TheClassName.theMethodName() from being compiled.

Thursday, January 13, 2005

Java Tip #6 - Don't capitalize first two letters of a bean property name

This is in our Java standards. You should not create a JavaBean property name whose first two letters are both capitals. It can lead to confusing results. We had this happen a few times & finally added it to our standards & enforce it in code reviews. One place we saw problems was in Struts. The form bean properties are used in the JSP page, but the Struts framework has to use the getter() & setter() to interact with the bean. This mapping happens based on the JavaBeans spec & in certain cases can cause a method-not-found error if the developer doesn't name the method just right. The JavaBeans spec provides guidelines on how to map between a property and its associated getter() & setter():

Thus when we extract a property or event name from the middle of an existing Java name, we normally convert the first character to lower case. However to support the occasional use of all upper-case names, we check if the first two characters of the name are both upper case and if so leave it alone. So for example,

“FooBah” becomes “fooBah”
“Z” becomes “z”
“URL” becomes “URL”

We provide a method Introspector.decapitalize which implements this conversion rule.
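`java.beans.Introspector.decapitalize` is a real JDK method, so the rule is easy to check. The bean below is hypothetical: its `getXPos()` getter yields a property named `XPos`, not `xPos`, which is exactly the kind of mismatch that bites a JSP or Struts form referring to `xPos`.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

final class PropertyNames {
    /** Hypothetical form bean: "label" is fine, "XPos" is the trap. */
    public static class PositionForm {
        private String label;
        private int xPos;

        public String getLabel() { return label; }
        public void setLabel(String label) { this.label = label; }

        // getXPos/setXPos define a property named "XPos", because "XPos" starts
        // with two capitals and Introspector.decapitalize leaves it unchanged.
        public int getXPos() { return xPos; }
        public void setXPos(int xPos) { this.xPos = xPos; }
    }

    /** Property names as the JavaBeans Introspector derives them. */
    static List<String> propertyNames(Class<?> beanClass) {
        try {
            // Stop at Object so the inherited getClass() doesn't show up as "class".
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```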

Tuesday, January 11, 2005

Java Tip #5 - Avoid 64KB method limit on JSP

Ok, this is hard to do on purpose with plain old java, but I've seen plenty of JSPs that when compiled had a method that exceeded the 64KB limit. Depending on the VM it may just barf and end with a hotspot error or it might give you some kind of useful message. I find it's just a hotspot error. Tomcat describes this problem in their release notes.

The simplest way to fix this is to break the page into parts so that each compiles separately using a jsp include. Avoiding big, monolithic JSPs is easy if you use Struts with Tiles. Of course, this shows how behind I am now since I'm not using one of the newer frameworks.

This seems to be less of a problem as container vendors have improved. They all generate different code, so the behavior varied across Tomcat, SilverStream, and BEA in our environment. At one time or another, we had JSPs that exhibited the problem on one of the three platforms, but not the other two. I am happy to say we now target only one vendor, BEA.

The amount of code per non-native, non-abstract method is limited to 65536 bytes by the sizes of the indices in the exception_table of the Code attribute (§4.7.3), in the LineNumberTable attribute (§4.7.8), and in the LocalVariableTable attribute (§4.7.9).

Note that using the @include file directive does not solve the problem, since it just inlines the JSP. You need to use a jsp include or tiles or similar.

Java Tip #4 - Endorsed Standards Override Mechanism

An endorsed standard is a Java(TM) API defined through a standards process other than the Java Community Process(SM). These standards might be revised between releases of the Java 2 platform. Sun defined the Endorsed Standards Override Mechanism (ESOM) to allow a developer to provide a newer version of an endorsed standard than those included in the Java 2 platform.

The classes that need to be overridden should be jarred up & placed in one or more directories specified by the java.endorsed.dirs System property. If unspecified, then the default location is

Java Tip #3 - Don't use fields on Struts action classes

This is very specific to the Struts web application framework. Don't use a field on a Struts action class. Struts caches action classes & reuses them to call the execute() method. (perform() on 1.0) This can cause unexpected behavior & performance issues under a load. Add a check for this to your code review process.

Thanks to the anonymous person who pointed out even without the 'static' modifier, this is a bad idea. Unless the field is marked as final, just don't do it.
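A framework-free sketch of why fields are dangerous: one cached instance serves every request, so a field set during one request is still there for the next one (and is shared with concurrent ones). `FieldAction` and `LocalAction` are hypothetical stand-ins for Struts actions, not Struts classes.

```java
/** Stand-ins for a cached action instance that is reused across requests. */
final class SharedActionDemo {
    /** Wrong: per-request state in a field leaks from one request into the next. */
    static final class FieldAction {
        private String error; // shared by all requests and all threads

        String execute(String input) {
            if (input.isEmpty()) {
                error = "input required";
            }
            return error == null ? "success" : "failure: " + error;
        }
    }

    /** Right: per-request state lives in local variables. */
    static final class LocalAction {
        String execute(String input) {
            String error = input.isEmpty() ? "input required" : null;
            return error == null ? "success" : "failure: " + error;
        }
    }
}
```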

Monday, January 10, 2005

Java Tip #2 - Contiguous Switches

When coding a switch statement, use contiguous ranges even if they are dummies. The compiler can optimize switches that use contiguous ranges. If you disassemble the class using DeCafe or similar, you SHOULD see the "tableswitch" bytecode where this switch begins. If you see "lookupswitch", the JVM has to search the sorted list of case keys (a binary search at best) instead of jumping directly to the statement through an index. I've seen nice performance increases using this technique for large switch statements.

Here is an example class with three methods. The method switchTest1 has contiguous values for 1-5. The method switchTest2 has values for 1 and 3-5 and the final switchTest3 has values for 1,3,4, and 128. This is a nonsensical class, but does show the behavior.
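The original class isn't reproduced here, so this is a reconstruction along the lines described. Running `javap -c SwitchTest` on the compiled class shows which instruction each method got. javac chooses between tableswitch and lookupswitch with a space/time cost heuristic, so the exact outcome can vary by compiler version.

```java
/** Three switches with increasingly sparse case labels; inspect with javap -c. */
final class SwitchTest {
    static String switchTest1(int i) { // contiguous 1-5
        switch (i) {
            case 1: return "one";
            case 2: return "two";
            case 3: return "three";
            case 4: return "four";
            case 5: return "five";
            default: return "other";
        }
    }

    static String switchTest2(int i) { // 1 and 3-5: the gap at 2 can map to default
        switch (i) {
            case 1: return "one";
            case 3: return "three";
            case 4: return "four";
            case 5: return "five";
            default: return "other";
        }
    }

    static String switchTest3(int i) { // 1, 3, 4, 128: too sparse for a table
        switch (i) {
            case 1: return "one";
            case 3: return "three";
            case 4: return "four";
            case 128: return "big";
            default: return "other";
        }
    }
}
```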

The results are that:
1) switchTest1 uses the desired "tableswitch".
2) switchTest2 is modified by the compiler to add the missing "2" to the default case, and thus uses the desired "tableswitch" bytecode.
3) switchTest3 is too sparse for the compiler to fill in, so it falls back to the "lookupswitch" bytecode.

Below is a decompile with bytecode annotations from DeCafe of a 1.4.2 built class. Need to test this in 1.5.

Java Tip #1 - Constant Comparison

I'm going to start writing down some of the java tidbits I've picked up over the years. I find I'm starting to forget some of them... so this serves as a permanent reminder for myself and possibly education for some. Many of these will seem trivial and I doubt any are original ideas.

When comparing a constant to a variable with equals(), always put the constant on the left side of the expression. This will prevent NullPointerExceptions and avoid extra comparisons to null.
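A short illustration with made-up method names (not the original greeting examples): the constant-first form needs no null check, because a constant is never null and `equals(null)` simply returns false.

```java
/** Three ways to compare a variable against a constant string. */
final class ConstantFirst {
    static final String HELLO = "hello";

    /** Throws NullPointerException when name is null. */
    static boolean variableFirst(String name) {
        return name.equals(HELLO);
    }

    /** Safe, but needs the explicit comparison to null. */
    static boolean guarded(String name) {
        return name != null && name.equals(HELLO);
    }

    /** Safe with no null check: HELLO is never null and equals(null) is false. */
    static boolean constantFirst(String name) {
        return HELLO.equals(name);
    }
}
```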

In a code review, I typically will make a developer switch to the style in greeting3 if there is a chance of an NPE. If it's already guarded by a comparison to null, I'll point it out but let the code pass review. Some people might actually prefer the explicit comparison to null for readability.