HPC CFD Post-processing with SOS and RLSOS

High Performance Computing (HPC) for CFD is common in industries such as aerospace, nuclear, energy production, racing, automotive aerodynamics, and engine design. HPC CFD creates huge datasets, and HPC CFD post-processing can now keep up with that data thanks to EnSight’s RLSOS feature in version 10.1. RLSOS stands for Root Level Server of Servers, and it’s EnSight’s way of handling HPC-sized CFD datasets.

RLSOS works just like SOS, which has been a feature of EnSight for over 10 years. For practical reasons, RLSOS requires CEIShell and CEIStart, both of which are included with EnSight: when you are working with thousands of servers, you can’t manage them one by one. CEIShell is EnSight middleware that helps you set up the cluster for HPC-sized problems. CEIStart gives EnSight its startup configuration and includes a GUI that makes it easier to launch EnSight in the configuration you want.

CEIShell Job Launching
Over the past several months we have developed a job launching mechanism we call ceishell. It will continue to expand, with the goal of making EnSight executables easy to launch and monitor in large, complex environments. CEIShell helps make EnSight more robust.

* It makes launching SOS with an arbitrary number of servers easy, on either a desktop or a remote machine.
* It makes it easy to use all the cluster cores, or a specific subset.
* It makes using EnSight VR easy.

It also makes starting up EnSight safer, because you don’t have to remember all the right command script options, so you avoid a costly mistake caused by a “typo”.

In the image above you can see two default options. You can also add site-specific options to easily launch a configuration you’ve set up before.

In the image above you can see that, after selecting a basic configuration, you are given a detailed GUI where you can make modifications before launching EnSight with the “Launch” button at the bottom. This gives you a chance to review the launch options and make tweaks.

We don’t have scaling tests yet, but we are willing to work with customers to benchmark performance or capacity. SOS and RLSOS are really about capacity: they give you the ability to post-process data that would be too big any other way. It’s not about speeding up post-processing of data that would already fit on a workstation. That would be like thinking that trains can be replaced efficiently by 1000 Mazda Miatas. Sometimes what you need is a train, not a bunch of cars.

Given that SOS is known to work, to some extent, with about 700 EnSight Servers, and that RLSOS is exactly the same code as SOS with just a few tweaks, one could anticipate:

* The RLSOS should support about 700 connected SOSes. Thus, you could *imagine* an aggregate of 490,000 Servers (700 SOSes with 700 Servers each). No one is going to run with that many Servers, but 10,000 Servers might not be out of the question.

* The SOS code tends to iterate sequentially over Servers for each EnSight operation. Using RLSOS to “fatten” the tree gives more concurrency across the aggregate of Servers. Thus, RLSOS with 8 SOSes, each of which has 8 Servers, should perform better than a single SOS with 64 Servers.
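The two points above can be made concrete with a rough back-of-the-envelope sketch. This is not EnSight code and not a benchmark: the function names are hypothetical, and the cost model is an assumption that simply counts how many Servers or SOSes a parent visits one after another, treating the SOSes under an RLSOS as working concurrently.

```python
def aggregate_servers(num_sos: int, servers_per_sos: int) -> int:
    """Total Servers reachable through one RLSOS (hypothetical helper)."""
    return num_sos * servers_per_sos


def sequential_steps_flat(num_servers: int) -> int:
    """Assumed model: a single SOS visits its Servers one after another."""
    return num_servers


def sequential_steps_tree(num_sos: int, servers_per_sos: int) -> int:
    """Assumed model: the RLSOS visits each SOS in turn, then the SOSes
    iterate over their own Servers concurrently with one another."""
    return num_sos + servers_per_sos


if __name__ == "__main__":
    # Theoretical ceiling: about 700 SOSes, each with about 700 Servers.
    print("aggregate Servers:", aggregate_servers(700, 700))

    # The same 64 Servers, arranged flat versus as an 8-by-8 tree.
    print("flat SOS, 64 Servers:", sequential_steps_flat(64))
    print("RLSOS, 8 SOSes x 8 Servers:", sequential_steps_tree(8, 8))
```

Under this simplified model the 8-by-8 tree needs 16 sequential steps against 64 for the flat layout. Real performance depends on the operation, the network, and the data, so treat this as an illustration of why a fatter tree should help rather than as a prediction.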

With EnSight 10.0 and SOS, models of 200M cells are routine and models of 2B cells are possible. With EnSight 10.1 and RLSOS, models of 2B cells are routine and models of 20B cells should be possible.