Tapestry Training -- From The Source

Friday, July 30, 2010

Thank you Google Alerts, for pointing out this article on choosing a Java web framework. It's over a year old, but I think the things that make Tapestry special have only gotten stronger in the intervening time.

Wednesday, July 28, 2010

By default, Mac OS X uses a case-insensitive file system, and Git seems to honor that. The problem is that most programming languages, especially Java, are case sensitive. Class "JavaScriptSupport" needs to be in file "JavaScriptSupport.java" and not "JavascriptSupport.java". This is even worse when sharing code via a repository, since some other developers may check out the code on a case-sensitive file system.

I was just renaming some classes, from things like "JavascriptStack" to "JavaScriptStack" (because the language is called "JavaScript" not "Javascript") ... and I was dismayed that Git saw that as an in-place update to a file, not a rename of the file.

Unfortunately, it's not as simple as git config core.ignorecase false to make Git do the right thing. That's an essential part of it, but Git still treats a case-only rename as a modification of the original file, not a rename.

I had to use the trick of one commit renaming JavascriptStack.java --> JSStack.java, then a second commit renaming JSStack.java --> JavaScriptStack.java.
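Spelled out as Git commands, the two-step trick looks like this (using the file names from the example above; JSStack.java is just a throwaway intermediate name):

```shell
# Step 1: rename to a name that differs by more than case, and commit.
git mv JavascriptStack.java JSStack.java
git commit -m "Rename step 1: JavascriptStack.java -> JSStack.java"

# Step 2: rename to the final, correctly-cased name, and commit again.
git mv JSStack.java JavaScriptStack.java
git commit -m "Rename step 2: JSStack.java -> JavaScriptStack.java"
```

Each step is a rename Git can track unambiguously, even on a case-insensitive file system.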

Monday, July 26, 2010

I'm once again partnering with SkillsMatter to teach my full Tapestry workshop. This is an expanded version of the class, which is growing from three days up to four; the additional day will ensure that we have time for all the existing materials, and add a new section on testing using TestNG, Selenium and Groovy. It will also give us more time to explore student-directed ideas, such as security and meta-programming.

The class will be taught at SkillsMatter's offices in London, from October 5th through the 8th.

Wednesday, July 14, 2010

Tapestry applications are inherently stateful: during and between requests, information in Tapestry components (values stored in fields) sticks around. This is a great thing: it lets you program a web application in a sensible way, using stateful objects full of mutable properties, and methods that operate on those properties.

It also has its downside: Tapestry has to maintain a pool of page instances. And in Tapestry, page instances are big: a tree of hundreds or perhaps thousands of interrelated objects. There's the tree of Tapestry structural objects that forms the basic page structure, the component and mixin objects hanging off that tree, the binding objects that connect parameters of components to properties of their containing components, the template objects that represent elements and content from component templates, and many, many more that most Tapestry developers remain unaware of.

This has proven to be a problem with the biggest and busiest sites constructed using Tapestry. Keeping a pool of those objects, checking them in and out, and discarding them when no longer needed drains needed resources, especially heap space.

So that seems like an irreconcilable problem eh? Removing mutable state from pages and components would turn Tapestry into something else entirely. On the other hand, allowing mutable state means that applications, especially big complex applications with many pages, become memory hogs.

I suppose one approach would be to simply create a page instance for the duration of a request, and discard it at the end. However, page construction in Tapestry is very complicated, and although some effort was expended in Tapestry 5.1 to reduce the cost of page construction, that cost is still significant. Additionally, Tapestry is full of small optimizations that improve performance ... assuming a page is reused over time. Throwing away pages is a non-starter.

So we're back to square one ... we can't eliminate mutable state, but (for large applications) we can't live with it either.

One solution would be to require that all those mutable fields be ThreadLocal objects instead, and to change all the logic that accesses them to read and write values through the ThreadLocal. Oh, and to clean up each and every one at the end of the request, so that information doesn't bleed through to the next request. That would be an incredible imposition on Tapestry developers.
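To make that imposition concrete, here's what a page class would look like if developers had to manage the ThreadLocals by hand (the class and field names here are invented for illustration):

```java
// Hypothetical example: a page class where the developer manages
// per-thread state manually, instead of using a plain mutable field.
public class ExamplePage {
    // Instead of a simple "private int itemCount;" field:
    private final ThreadLocal<Integer> itemCount =
        ThreadLocal.withInitial(() -> 0);

    public void addItem() {
        itemCount.set(itemCount.get() + 1);
    }

    public int getItemCount() {
        return itemCount.get();
    }

    // Must be invoked at the end of every request, or state bleeds
    // into the next request handled by the same thread.
    public void cleanup() {
        itemCount.remove();
    }
}
```

Multiply that boilerplate by every mutable field on every page and component, and it's clear why pushing this onto developers is a non-starter.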

Tapestry has already been down this route: the way persistent fields are handled gives the illusion that the page is kept around between requests. You might think that Tapestry serializes the page and stores the whole thing in the HttpSession. In reality, Tapestry is shuffling just the individual persistent field values into and out of the session. To both the end user and the Tapestry developer, it feels like the entire page is live between requests, but it's really a bit of a shell game, providing an equivalent page instance that has the same values in its fields.

What's going on in trunk (Tapestry 5.2 alpha) right now is extrapolating that concept from just persistent fields to all mutable fields. Every access to every mutable field in a Tapestry page is converted, as part of the class transformation process, into an access against a per-thread Map of keys and values. Each field gets a unique identifying key. The Map is discarded at the end of the request.
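A minimal sketch of the idea (my own illustration, not Tapestry's actual implementation): every field read or write becomes a get or put against a per-thread Map, using a unique key per field, with the whole Map thrown away at the end of the request.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the per-thread Map concept: field access becomes a
// get/put against a Map private to the current thread, keyed uniquely
// per component instance and field.
public class PerThreadFieldStorage {
    private static final ThreadLocal<Map<String, Object>> STORAGE =
        ThreadLocal.withInitial(HashMap::new);

    public static Object read(String key, Object defaultValue) {
        Map<String, Object> map = STORAGE.get();
        // containsKey() distinguishes "nothing stored" from "null stored".
        return map.containsKey(key) ? map.get(key) : defaultValue;
    }

    public static void write(String key, Object value) {
        STORAGE.get().put(key, value);
    }

    // Simulates the end-of-request cleanup: the Map is simply discarded.
    public static void endOfRequest() {
        STORAGE.remove();
    }
}
```

A key that combines the component id and the field name (say, "pages/Index:message") keeps two component instances from ever colliding, even on the same page.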

The end result is that a single page instance can be used across multiple threads without any synchronization issues and without any field value conflicts.

This idea was suggested in years past, but the APIs to accomplish it (as well as the necessary meta-programming savvy) just weren't available. However, as a side effect of rewriting and simplifying the class transformation APIs in 5.2, it became very reasonable to do this.

Let's take an important example: the handling of typical, mutable fields. This is the responsibility of the UnclaimedFieldWorker class, part of the Tapestry component class transformation pipeline. UnclaimedFieldWorker finds fields that have not been "claimed" by some other part of the pipeline and converts them to read and write their values in the per-thread Map. A claimed field may store an injected service, asset or component, or be a component parameter.

The transform() method is the lone method of this class, as defined by the ComponentClassTransformWorker interface. It uses a method on ClassTransformation to locate all the unclaimed fields. TransformField is the representation of a field of a component class during the transformation process. As we'll see, it is very easy to intercept access to the field.

Some of those fields are final or static and are just ignored.
A ComponentValueProvider is a callback object: when the component (whatever it is) is first instantiated, the provider will be invoked and the return value stored into a new field. A FieldValueConduit is an object that takes over responsibility for access to a TransformField: internally, all read and write access to the field is passed through the conduit object.

So, what we're saying is: when the component is first created, use the callback to create a conduit, and change any read or write access to the field to pass through the created conduit. If a component is instantiated multiple times (either in different pages, or within the same page), each instance of the component will end up with its own FieldValueConduit.

Fine so far; it comes down to what's inside the createFieldValueConduitProvider() method:

Here we capture the name of the field and its type (expressed as a String). Inside the get() method we determine the initial default value for the field: typically just null, but it may be 0 (for a primitive numeric field) or false (for a primitive boolean field).
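As a hypothetical helper (invented for illustration, not Tapestry's actual code), computing that initial default from the field's type name might look like:

```java
// Hypothetical helper: pick the initial default value for a field based
// on its type name, following the rules described above (null for object
// types, false for boolean, 0 for primitive numeric types).
public class FieldDefaults {
    public static Object defaultForType(String typeName) {
        switch (typeName) {
            case "boolean": return false;
            case "char":    return (char) 0;
            case "byte":    return (byte) 0;
            case "short":   return (short) 0;
            case "int":     return 0;
            case "long":    return 0L;
            case "float":   return 0f;
            case "double":  return 0d;
            default:        return null; // object types and arrays
        }
    }
}
```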

Next we build a unique key used to store and retrieve the field's value inside the per-thread Map. The key includes the complete id of the component and the name of the field: thus two different component instances, in the same page or across different pages, will have their own unique key.

We use the PerthreadManager service to create a PerThreadValue for the field. You can think of a PerThreadValue as a specialized kind of ThreadLocal that automatically cleans itself up at the end of the request.
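A rough sketch of what such an object offers, built on a plain ThreadLocal (an assumed shape based on the description above, not the actual Tapestry source):

```java
// Sketch of a PerThreadValue-like wrapper: a ThreadLocal that also knows
// whether a value has been stored at all, and can be cleaned up per request.
public class SimplePerThreadValue<T> {
    // Sentinel distinguishing "no value stored" from "null stored".
    private static final Object NO_VALUE = new Object();

    private final ThreadLocal<Object> holder =
        ThreadLocal.withInitial(() -> NO_VALUE);

    public boolean exists() {
        return holder.get() != NO_VALUE;
    }

    @SuppressWarnings("unchecked")
    public T get(T defaultValue) {
        Object value = holder.get();
        return value == NO_VALUE ? defaultValue : (T) value;
    }

    public T set(T value) {
        holder.set(value);
        return value;
    }

    // Stands in for the automatic end-of-request cleanup the real
    // PerthreadManager performs.
    public void cleanup() {
        holder.remove();
    }
}
```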

Lastly, we create the conduit object. Let's look at the conduit in more detail:

We use the special InternalComponentResources interface because we'll need to know if the page is loading, or in normal operation (that's coming up). We capture our initial guess at a default value for the field (remember: null, false or 0) but that may change.

Whenever code inside the component reads the field, this method is invoked. It checks whether a value has been stored into the PerThreadValue during this request; if so, the stored value is returned, otherwise the field's default value is returned.

Notice the distinction here between null and no value at all. Just because the field is set to null doesn't mean we should switch over to the default value (assuming the default is not null).
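A sketch of the conduit's read side (an assumed shape, not the actual Tapestry code) that preserves that distinction by tracking whether a value has been stored at all:

```java
// Sketch: fall back to the default only when no value has been stored
// at all this request, so an explicitly stored null comes back as null.
public class FieldConduitSketch {
    private Object storedValue;
    private boolean valueStored; // "null stored" vs. "nothing stored"
    private final Object fieldDefaultValue;

    public FieldConduitSketch(Object fieldDefaultValue) {
        this.fieldDefaultValue = fieldDefaultValue;
    }

    public Object get() {
        return valueStored ? storedValue : fieldDefaultValue;
    }

    public void set(Object newValue) {
        storedValue = newValue;
        valueStored = true;
    }
}
```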

The last hurdle is updates to the field:

public void set(Object newValue)
{
    fieldValue.set(newValue);

    // This catches the case where the instance initializer method sets a value for the field.
    // That value is captured and used when no specific value has been stored.
    if (!resources.isLoaded())
        fieldDefaultValue = newValue;
}

The basic logic is just to stuff the value assigned to the component field into the PerThreadValue object. However, there's one special case: a field initialization (whether it's in the component's constructor, or at the point in the code where the field is first defined) turns into a call to set(). We can differentiate the two cases because that update occurs before the page is marked as fully loaded, rather than in normal use of the page.

And that's it! Now, to be honest, this is much more detail than a typical Tapestry developer ever needs to know. However, it's a good demonstration of how Tapestry's class transformation APIs make Java code fluid: capable of being changed dynamically (under carefully controlled circumstances).

Back to pooling: how is this going to affect performance? That's an open question, and putting together a performance testing environment is another task at the top of my list. My suspicion is that the new overhead will not make a visible difference for small applications (dozens of pages, reasonable number of concurrent users) ... but for high end sites (hundreds of pages, large numbers of concurrent users) the avoidance of pooling and page construction will make a big difference!