On 12/08/2010 08:14 AM, Lloyd Brown wrote:
> On 12/6/10 4:38 PM, Ken Nielson wrote:
>> This version of TORQUE can be built to work the same as other versions
>> of TORQUE without the NUMA option. However, we would recommend the use
>> of TORQUE 2.5.3 if you do not require NUMA capability.
> Ken, et al.,
> Can you be more specific on this recommendation? We're looking at
> upgrading from 2.4.x to 2.5.x during our downtime in early January, and
> due to the communication protocol change from 2.x to 3.x, we're
> wondering about upgrading all the way to 3.0.x, to make future rolling
> upgrades easier. We don't have a big Altix or anything, but just a few
> hundred x86_64 Linux servers. Is there a specific concern about 3.0.x
> on non-NUMA clusters? Are there outstanding, known issues?
> Lloyd
I would recommend going to 2.5.x if you do not need the NUMA support.
Like any .0 release, 3.0.0 contains a number of code changes that carry an
inherent risk of problems. The 2.5.x code is more stable and better tested
than the 3.0.x branch for non-NUMA functionality. While all of the
2.5.x capability is in 3.0.x, you will probably find more stability with
2.5.x.
The best way to answer the rest of your questions is to address the
TORQUE road map.
The next release of 2.4-fixes will be 2.4.12. 2.4-fixes will continue to
be the stable branch for TORQUE. By stable we mean that no new features
will be added to 2.4-fixes, only bug fixes.
2.5-fixes is becoming more stable and we would recommend moving to the
latest 2.5 release when you are ready. 2.5-fixes will continue to
receive new features along with bug fixes. We do not call this the
stable branch simply because we may add feature changes. However, the
code base itself has proven to be pretty reliable.
We will be adding GPU support starting with 2.5.4, which we hope to
release this month along with Moab 6. Moab 6 also has GPU support that
will work with TORQUE.
In March we hope to release TORQUE 3.1.
The major thrust of TORQUE 3.1 is scalability. Some of the things we
will be doing to improve scalability are as follows:
* Create a multi-threaded TORQUE
* Use a hierarchical job launch (job radix)
* Improve mom-to-mom and mom-to-server communications to reduce the
  traffic needed to keep the server and moms up to date on the state
  of the cluster
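As a rough illustration of the job radix idea in the list above: instead of
one mom contacting every other node in the job, each node forwards the launch
request to at most `radix` other nodes, which then do the same. The Python
sketch below assumes a simple heap-style tree over the job's node list. The
names `launch_tree` and `depth` and the tree layout are hypothetical and are
not taken from the TORQUE source.

```python
def launch_tree(nodes, radix):
    """Map each node to the nodes it forwards the launch request to.

    Assumed layout (illustration only): nodes[0] is the root, and the
    children of nodes[i] are nodes[i*radix + 1] .. nodes[i*radix + radix].
    """
    if radix < 1:
        raise ValueError("radix must be at least 1")
    tree = {}
    for i, node in enumerate(nodes):
        first = i * radix + 1
        # Slicing past the end of the list yields an empty list for leaves.
        tree[node] = nodes[first:first + radix]
    return tree


def depth(n_nodes, radix):
    """Number of forwarding hops needed to reach the last node in the list."""
    hops = 0
    i = n_nodes - 1
    while i > 0:
        i = (i - 1) // radix  # index of the parent in the assumed layout
        hops += 1
    return hops
```

With seven nodes and a radix of 2, the root contacts only two nodes directly
and every node is reached within two hops, rather than the root contacting
all six other nodes itself.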
Because TORQUE is so chatty when sending updates from the moms to
the server, we may need to change how this is done. That change may make 3.1
INCOMPATIBLE with all previous versions of TORQUE, so moving to 3.0.0 now
may not make a later upgrade to 3.1 any easier.
Trunk currently has changes for multi-threading if anyone wants to check
it out. Any recommendations for improvements or reports of problems are
welcomed and encouraged.
To summarize, unless you need NUMA support, we recommend that you
continue to use, or upgrade to, version 2.5.x. When version 3.1.0 is
released we will start encouraging all users to upgrade as it becomes
more stable.
Let me know if you have more questions.
Ken