I think Vincent answered your question thoroughly. I would also like to know whether Intel will push Cilk as the language of choice for that hardware. How does the scheduler perform for chess? It has been used before by CilkChess, so it should not be bad. But the tree searched there may not have been very selective compared to current heavily pruned trees.

I have implemented a cluster YBW search which I think is on par with cluster-Toga performance-wise. But I didn't test it well, because the cluster I had access to used a Fast Ethernet interconnect (which means slow), and it really did not scale well even on 32 processors. But it works as long as all processors are active.
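For anyone not familiar with YBW (Young Brothers Wait), here is a minimal sketch of the idea in Python. It is not my engine's code: the toy tree `TREE` and the function names `negamax` and `ybw_root` are made up for illustration, the split happens only at the root, and a thread pool stands in for the cluster workers. The eldest brother is searched first to get a bound, and only then are the younger brothers handed out in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

INF = float("inf")

# Hypothetical toy tree: lists are internal nodes, ints are leaf scores
# from the point of view of the side to move at that leaf.
TREE = [[3, [5, -2]], [[-4, 7], 1], [2, [6, -1], 0]]


def negamax(node, alpha, beta):
    """Sequential fail-soft alpha-beta; this is what each worker runs."""
    if isinstance(node, int):
        return node
    best = -INF
    for child in node:
        best = max(best, -negamax(child, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:
            break  # beta cutoff
    return best


def ybw_root(node, pool):
    """Young Brothers Wait applied at the root only."""
    alpha, beta = -INF, INF
    # 1. Search the eldest brother alone to establish a bound.
    best = -negamax(node[0], -beta, -alpha)
    alpha = max(alpha, best)
    # 2. Only now split: the younger brothers are searched in parallel,
    #    all with the window tightened by the eldest brother's score.
    futures = [pool.submit(negamax, child, -beta, -alpha) for child in node[1:]]
    for future in futures:
        best = max(best, -future.result())
    return best


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(ybw_root(TREE, pool))
```

Python threads will not give a real speedup here; the point is only the control flow. In a cluster version the `pool.submit` calls would become messages to other processes, and the workers would also need a way to hear about improved bounds and cutoffs, which this sketch leaves out.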

One thing I did that is unique is to use a combined SMP-cluster search, where the search takes advantage of "fat nodes" (nodes that are 8-core SMP machines) by starting an SMP search on them. It helps with the speedup but could introduce load-balancing inefficiencies, as some workers become extra powerful. It also makes the implementation more complicated, since the simpler alternative is to just use MPI to start a process for every core, that is, to do message passing even when you know it is an SMP machine. MPI actually optimizes communication in that case, but it will not be as good as the SMP algorithm.
_________________
https://sites.google.com/site/dshawul/
https://github.com/dshawul