It actually works in Azure because each instance has its own indexing background thread and its own index. That said, this is extensible, and one implementation could provide a cached, Blob-based Lucene Folder implementation. Microsoft Research did it a few years ago, and you can still find some articles about that on the Net.

So it sounds like Orchard is using Lucene to create a local index for each web instance? Just wanted to clarify that. So, if I spin up 2 instances (which I am about to launch), Orchard attaches a writable Azure drive (from the local web instance's file system) and creates the full index, once per instance?

Yeah, some form of single shared-index implementation needs to happen.

Lucene has the limitation that only one index writer can have write access to an index at a time, meaning a pool of Lucene indexers will not scale out in Azure.

Azure has the limitation that a CloudDrive can only be mounted as writable by one Web/Worker Role instance at a time; all other instances get read-only access.
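To make the single-writer limitation concrete, here is a minimal sketch (illustrative only, not Lucene's actual code) of how a Lucene-style `write.lock` file enforces one writer per index directory: the first writer atomically creates the lock file, and any second would-be writer fails to acquire it.

```python
import os

class IndexWriteLock:
    """Sketch of Lucene-style single-writer locking (hypothetical names):
    whoever creates write.lock first is the sole writer for that index."""

    def __init__(self, index_dir):
        self.lock_path = os.path.join(index_dir, "write.lock")
        self.fd = None

    def acquire(self):
        try:
            # O_EXCL makes creation atomic: it fails if the file already
            # exists, so exactly one process/instance holds the lock.
            self.fd = os.open(self.lock_path,
                              os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            return True
        except FileExistsError:
            return False

    def release(self):
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.lock_path)
            self.fd = None
```

This is the same shape as the CloudDrive constraint: one writable mount, everyone else read-only, which is why a naive pool of writers cannot scale out.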

This is how I have set Lucene up in Azure in the past, but as you can see it doesn't scale well. I ended up having to split the index into multiple smaller indexes to keep performance up.

I also set up Solr in Azure in a test bed. Though Solr runs only on Tomcat/Apache, it still has the same limitation as Lucene: only one Web/Worker Role instance can have open write access to the underlying Lucene index.

I may take a stab at this, to see if there have been any updates on the Solr/Lucene front in the past 2 years (since I last used them).

They are out of sync, and yet not. When you create some content, the index is updated by a background task, so technically the index is out of sync even on the instance where you created the content. And the delay is the same as for the other nodes: about 30 seconds on average.

Perhaps you can point me to the Azure implementation of this, i.e. which class is handling it.

I understand that a "background task" (IScheduleTask, I'm assuming) updates the Lucene index local to that instance.

But how are the other web role instances notified that a piece of data has been updated, and that their background task, local to that web role instance, needs to reindex that one item?

That's what I see missing here. I don't see how an item can be marked dirty in the DB for all web role instances to pick up and reindex. Maybe if there were a timestamp on each entity indicating it had been modified. But querying all pieces of content and comparing each timestamp against what Lucene has seems quite excessive, and I don't see anything like that going on.

So, how are the other nodes / web role instances notified that a piece of content has been updated?

There is a table to keep track of any content which needs to be indexed, and each instance keeps track of the latest item it indexed. Look at the Orchard.Indexing module; it's the same on premise and for Azure.
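The pattern described above can be sketched roughly as follows (names are made up for illustration, not Orchard's actual API): a shared table records indexing tasks, and each instance keeps its own cursor of the last task it processed, so no cross-instance notification is needed — every node just polls past its own cursor.

```python
# Shared "table": rows of (task_id, content_item_id). In Orchard this
# would be a database table; a list stands in for it here.
index_task_table = []

def record_content_change(content_item_id):
    """Called whenever content is created or updated: append a task row."""
    task_id = len(index_task_table) + 1
    index_task_table.append((task_id, content_item_id))

class InstanceIndexer:
    """One per web role instance; writes only to its own local index."""

    def __init__(self):
        self.last_task_id = 0  # per-instance cursor into the task table
        self.local_index = {}  # stand-in for this instance's Lucene index

    def run_background_task(self):
        """Periodic background task: index every row newer than our
        cursor, then advance the cursor. Each instance does this
        independently, which is why indexes lag by roughly one task
        interval (~30 seconds) but eventually converge."""
        for task_id, item_id in index_task_table:
            if task_id > self.last_task_id:
                self.local_index[item_id] = "indexed"
                self.last_task_id = task_id
```

So the "dirty" marking is the task row itself, and the per-instance cursor is what lets many read-only nodes stay current without any writer coordination.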