Our initial assumption was that the namespace set on the NamespaceManager would persist for the lifetime of the session, since a tenant logs in once and stays the same until logging out. As it turns out, this is not the case. The NamespaceManager setting is scoped to a single request and is stored in a thread-local.

The advantage of a thread-local is that a value stored in it is associated with one particular thread and is not shared. All the threads in the same process can see the same static variables because they live in the same memory space, whereas each thread reads and writes only its own copy of a thread-local value.
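
To make the isolation concrete, here is a minimal sketch using plain `java.lang.ThreadLocal`. The class and method names (`ThreadLocalDemo`, `setTenant`, `readOnWorker`) are made up for illustration and have nothing to do with App Engine itself.

```java
import java.util.concurrent.atomic.AtomicReference;

class ThreadLocalDemo {
    // Each thread that touches this variable gets its own independent value.
    private static final ThreadLocal<String> CURRENT_TENANT = new ThreadLocal<>();

    static void setTenant(String tenant) {
        CURRENT_TENANT.set(tenant);
    }

    static String currentTenant() {
        return CURRENT_TENANT.get();
    }

    /** Sets a tenant on a fresh worker thread and returns what that thread read back. */
    static String readOnWorker(String tenant) {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            setTenant(tenant);          // only affects the worker thread
            seen.set(currentTenant());
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        setTenant("abc");
        System.out.println("worker saw: " + readOnWorker("xyz"));
        // The worker's set() did not touch the main thread's value.
        System.out.println("main still sees: " + currentTenant());
    }
}
```

The worker thread writes its own value without disturbing the value held by the main thread, which is the property that keeps one request's namespace from leaking into another.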

What is the advantage of doing it this way?

Well, the first advantage is that since the namespace is set with each request, I could change it temporarily for one operation and then set it back to the original namespace. I am not bound to a single namespace for the whole session or application scope.
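
A rough sketch of that save, switch and restore pattern follows. To keep it runnable without the App Engine SDK, the nested `Namespaces` class is a stand-in I wrote that mimics the static `get()` and `set(String)` methods of `com.google.appengine.api.NamespaceManager`. The `inNamespace` helper and the "shared" namespace name are hypothetical.

```java
import java.util.function.Supplier;

class NamespaceSwitchSketch {
    // Stand-in for com.google.appengine.api.NamespaceManager (assumption:
    // only the static get()/set(String) pair is mimicked here).
    static final class Namespaces {
        private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

        static String get() {
            return CURRENT.get();
        }

        static void set(String namespace) {
            CURRENT.set(namespace);
        }
    }

    /** Runs work in temporaryNamespace, then restores whatever was set before. */
    static <T> T inNamespace(String temporaryNamespace, Supplier<T> work) {
        String original = Namespaces.get();
        Namespaces.set(temporaryNamespace);
        try {
            return work.get();
        } finally {
            // Restore even if the operation throws.
            Namespaces.set(original);
        }
    }

    public static void main(String[] args) {
        Namespaces.set("abc"); // the tenant's own namespace
        String seenInside = inNamespace("shared", Namespaces::get);
        System.out.println("inside: " + seenInside);
        System.out.println("after: " + Namespaces.get());
    }
}
```

With the real SDK the same shape should apply, replacing `Namespaces` with `NamespaceManager`. The `finally` block matters because the request thread would otherwise carry the temporary namespace into the rest of the request.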

Also, since it is not stored in any static variable, I am assured that there would be no data leakage with one tenant accessing the namespace of another.

Now, what is the pitfall of doing it this way?
For an application like ours where tenants are identified by subdomains, for example abc.bookmyhours.com and xyz.bookmyhours.com, each tenant enters the application with their own subdomain. Now, for the remainder of the session, they continue to be in the same namespace.

Irrespective of this, since we use a filter to set the namespace, we parse each request to get the server name and then set it on the NamespaceManager. This might not sound very heavy and might still be acceptable to do with each request. What made it worse in our case was that whenever a new tenant arrives for the first time, we set up seed data for that namespace. On subsequent calls we see that the data exists and skip the seeding. Nevertheless, every request still makes a call to the datastore to check whether the data exists.

This is not a good design at all, and maybe we need to explore setting up the seed data somewhere else. For some applications, though, this may be a requirement, and that is where the proposed solution comes in handy.

For now, we store the namespace in the user session.

The NamespaceManager.set() call still has to happen with each request since the value is thread-local. However, by using the session you can do away with parsing the request and any other complex logic that you have in the filter.
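
The idea can be sketched roughly as below. This is not our actual filter: to keep it self-contained, a plain `Map` stands in for the `HttpSession` attributes and a `ThreadLocal` stands in for `NamespaceManager`. The attribute name `"namespace"`, the method names, and the `expensiveLookups` counter are all assumptions added for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class SessionNamespaceSketch {
    static final String SESSION_KEY = "namespace"; // hypothetical attribute name
    static int expensiveLookups = 0;               // counts how often the slow path runs

    // Stand-in for NamespaceManager: the value is per thread, hence per request.
    private static final ThreadLocal<String> CURRENT_NAMESPACE = new ThreadLocal<>();

    /** Slow path: derive the tenant from the host, e.g. "abc.bookmyhours.com" gives "abc". */
    static String resolveFromServerName(String serverName) {
        expensiveLookups++;
        // The "does seed data exist?" datastore check would also live here.
        int dot = serverName.indexOf('.');
        return dot < 0 ? serverName : serverName.substring(0, dot);
    }

    /** What the filter would do on every request. */
    static String applyNamespace(Map<String, Object> session, String serverName) {
        String namespace = (String) session.get(SESSION_KEY);
        if (namespace == null) {
            // First request of this session: do the heavy work once and remember it.
            namespace = resolveFromServerName(serverName);
            session.put(SESSION_KEY, namespace);
        }
        CURRENT_NAMESPACE.set(namespace); // still required on every request
        return namespace;
    }

    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<>();
        applyNamespace(session, "abc.bookmyhours.com");
        applyNamespace(session, "abc.bookmyhours.com");
        System.out.println("namespace: " + CURRENT_NAMESPACE.get());
        System.out.println("slow path runs: " + expensiveLookups);
    }
}
```

In a real servlet filter the map lookups would become `session.getAttribute(...)` and `session.setAttribute(...)`, and the last step would be the `NamespaceManager.set()` call.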


About Vikas Hazrati

Vikas is the Founding Partner @ Knoldus which is a group of software industry veterans who have joined hands to add value to the art of software development.

With respect to setting up seed data, the other alternative is to use memcache to determine whether the namespace is already seeded. The solution above (using the HTTP session) will work of course, but it requires an “is data seeded” check against the datastore for every session, whereas checking memcache after the namespace is set requires only one datastore check per memcache expiry.

Maybe I misunderstood, but when would the data be set in memcache? I am assuming that when a new tenant logs in for the first time, we find no entry in memcache, so the seed data is set up and an entry is made in memcache. When the tenant logs in the second time, we check memcache to see whether data for this namespace has been set. If it has, good; otherwise we set up the seed data and also make an entry in memcache. So instead of going to the datastore with the “is data seeded” check, we go to memcache, because a key query on the datastore would be slower than the memcache check. Is that what you imply, or did I miss something?

I’m only raising a question of optimization. Whether an optimization is appropriate is really application specific, but using memcache directly will often result in a more optimal solution. For this case (not considering whether your app is already paying the cost of using Java sessions) I’ll outline the basic algorithms below:

Proposal 1: Memcache option for dealing with seed data:

    if namespace is unset
        set namespace
        if ! memcache get seed data
            // only happens once per namespace or memcache expiry
            check or create seed data from datastore (Datastore TXN)
            set seed data in memcache

Proposal 2: Java session option (like I think the article describes)

    get session (gets from memcache failing over to datastore)
    if namespace is not set
        set namespace
        check or create seed data from datastore (Datastore TXN)
        set session namespace

It’s important to note that Java sessions in App Engine use memcache and the datastore in the background. Datastore transactions are generally much more expensive than memcache operations and are a significant source of application contention in heavily used apps.

Proposal 2 results in one datastore action per HTTP session creation and every time the memcache entry expires. If you have a large number of short lived client accesses, this can result in a datastore request for nearly every client access.

Proposal 1 would only result in a single datastore access once the first client accessed the namespace and once every time the memcache expires.

The only point I tried to make is that proposal 1 might be a better option than proposal 2. Again, it’s not a truism, you really need to understand the application to make sure this will work out the way I described.

Thanks for the detailed response, Gianni. Good to understand that Java sessions in App Engine use memcache and the datastore in the background. We were aware that memcache is going to be faster than the datastore, but your comment on datastore transactions being much more expensive is a sure reason for us to look at memcache for this logic. By the way, we already have memcache on the radar for improving the performance of our migrated app. More on that as we work on it. Best.