~ Computational modeling & simulation doesn't have to be boring

It is High Time to Envision a Better HPC Future

Last week I attended a rather large scientific meeting in Knoxville, Tennessee. It was the kickoff meeting for the Exascale Computing Project. This is a relatively huge program ($250 million/year) and the talent present at the meeting was truly astounding, a veritable who’s who in computational science in the United States. This project is the crown jewel of the national strategy to retain (or recapture) pre-eminence in high performance computing. Such a meeting has all the makings of a banquet of inspiration and intellectually thought-provoking discussion, along with incredible energy. Simply meeting all of these great scientists, many of whom also happen to be wonderful friends, only added to the potential. Friends abounded and acquaintances were made or rekindled, but this was the high point of the week. The wealth of inspiration and intellectual discourse possible was quenched by bureaucratic imperatives, leaving the meeting a barren and lifeless launch of a soulless project.

The telltale signs of worry were all present in the lead-up to the meeting: management of work took priority over the work itself, many traditional areas of accomplishment were simply ignored, political concerns swamped technical ones, and most damningly there was no aspirational vision. The meeting did nothing to dampen or dispel these signs, and we see a program spiraling toward outright crisis. Among the issues hampering the project is the degree of project management formality being applied, which is appropriate for benign construction projects and completely inappropriate for HPC success. The demands of the management formality were delivered to the audience much like the wasteful prep work for standardized testing in our public schools. It will almost certainly have the same mediocrity-inducing impact as that same testing regime: the illusion of progress and success where none actually exists. The misapplication of this management formality is likely to provide a merciful deathblow to this wounded mutant of a program. At some point in the next couple of years we will see the project euthanized in a mercy killing.

There can be no progress without head-on confrontation.

― Christopher Hitchens

The depth of the vision problem in high performance computing (HPC) is massive. For a quarter of a billion dollars a year, one might expect an expressive and expansive vision for the future to be at the helm of the project. Instead the vision is a stale and spent version of the same approach taken in HPC for the past quarter of a century. ECP simply has nothing new to offer. The vision of computing for the future is the vision of the past. A quarter of a century ago the stockpile stewardship program came into being in the United States, and the linchpin of the program was HPC. New massively parallel computers would unleash their power and tame our understanding of reality. All that was needed then was some faster computers and reality would submit to the power of computation. Today’s vision is exactly the same, except the power of the computers is 1000 times greater than that of the computers that would unlock the secrets of the universe a quarter of a century ago. Aside from Exascale replacing Petascale in computing power, the vision of 25 years ago is identical to today’s vision. The problem then as now is the incompleteness of the vision and fatal flaws in how it is executed. If one adds a management approach that is seemingly devised by Chinese spies to undermine the program’s productivity and morale, the outcome of ECP seems assured: failure. This wouldn’t be the glorious failure of putting your best foot forward seeking great things, but failure born of incompetence and almost malicious disregard for the talent at the program’s disposal.

The biggest issue with the entire approach to HPC was evident in the room of scientists I sat with last week: the minds and talents of these people are not being engaged. Let’s be completely clear, the room was full of immense talent, with many members of the National Academies present, yet there was no intellectual engagement to speak of. How can we succeed at something so massive and difficult while the voices of those paid to work on the project are silenced? At the same time we are failing to develop an entire generation of scientists with the holistic set of activities needed for successful HPC. The balance of technical activities needed for a healthy, useful HPC capability is simply unsupported and almost actively discouraged. We are effectively hollowing out an entire generation of applied mathematicians, computational engineers and physicists, pushing them to focus more on software engineering than on their primary disciplines. Today someone working in applied mathematics is more likely to focus on object-oriented constructs in C++ than on functional analysis. Moreover the software is acting as a straitjacket for the mathematics, slowly suffocating actual mathematical investigations. We see important applied mathematical work avoided because software interfaces and assumptions are incompatible with it. One of the key aspects of ECP is the drive for everything to be expressed in software, as products and as our raison d’être. We’ve lost the balance in which software is a necessary element in checking the utility of mathematics. We now have software in ascendancy, and mathematics as a mere afterthought. Seeing this unfold with the arrayed talents on display in Knoxville last week felt absolutely and utterly tragic. Key scientific questions that the vitality of scientific computing absolutely hinges upon are left hanging without attention, and progress on them is almost actively discouraged.

When people don’t express themselves, they die one piece at a time.

— Laurie Halse Anderson

At the core of this tragedy is a fatally flawed vision of where we are going as a community. It was flawed 25 years ago, and we have failed to learn from the plainly obvious lessons. The original vision of computer power uber alles is technically and scientifically flawed, but financially viable. This is the core of the dysfunction: we can get a flawed program funded, and that is all we need to go forward. No leadership asserts itself to steer the program toward technical vitality. The flawed vision brings in money, and money is all we need to do things. This gets to the core of so many problems, as money becomes the sole source of legitimacy, correctness and value. We have lost the ability to lead by principle and make hard choices. Instead the baser instincts hold sway, looking only to provide the support for little empires that rule nothing.

First, we should outline the deep flaws in the current HPC push. The ECP program is about one thing: computer hardware. The issue a quarter of a century ago is the same as it is today; the hardware alone does not solve problems or endow us with capability. It is a single element in our overall ability to solve problems. I’ve argued many times that it is far from being the most important element, and may be one of the lesser capabilities to support. The items of greatest importance are the models of reality we solve, followed by the methods used to solve these models. Much of the enabling efficiency of solution is found in innovative algorithms. The key to this discussion is the subtext that these three most important elements in the HPC ecosystem (models, methods and algorithms) are unsupported and minimized in priority by ECP. The focus on hardware arises from two elements: the easier path to funding, and the fandom of hardware among the HPC cognoscenti.

We would be so much better off if the current programs took a decisive break with the past, and looked to move HPC in a different direction. In a deep and abiding way the computer industry has been transformed in the last decade by the power of mobile computing. We have seen cellphones become the dominant factor in the industry. Innovative applications and pervasive connectivity have become the source of value and power. A vision of HPC that resonates with the direction of the broader industry would benefit from the flywheel effect instead of running counter to that direction. Instead of building on this base, the HPC world remains tethered to the mainframe era long gone everywhere else. Moreover HPC remains in this mode even as the laws of physics conspire against it, and efforts suffer from the terrible side effects of how difficult it is to make progress with the outdated approach being taken. The hardware is acting to further tax every effort in HPC, making the already threadbare support untenably shallow.

Instead of focusing on producing another class of outdated lumbering dinosaur mainframes, the HPC effort could leap onto clear industry trends and seek a bold resonant path. A combination of cloud-based resources, coupled with connectivity, could unleash ubiquitous computing and seamless integration with mobile computing forces. The ability to communicate works wonders for combining ideas and pushing innovation ahead, and it would do more to advance science than almost any amount of computing power conceivable. Mobile computing is focused on general-purpose use and is hardly optimized for scientific use, which brings different dynamics. A specific effort to energize science through different computing dynamics could provide boundless progress. Instead of trying something distinct and new, we head back to a mine that has long since yielded its greatest ore.

Progress in science is one of the most fertile engines for advancing the state of humanity. The United States with its wealth and diversity has been a leading light in progress globally. A combination of our political climate and innate limits in the American mindset seems to be conspiring to undo this engine of progress. Looking at the ECP program as a microcosm of the American experience is instructive. The overt control of all activities is suggestive of the pervasive lack of trust in our society. This lack of trust is paired with a deep fear of scandal and more demands for control. Working almost in unison with these twin engines of destruction is the lack of respect for human capital in general, which is only made more tragic when one realizes the magnitude of the talent being wasted. Instead of trust and faith in the arrayed talent of the individuals being funded by the program, we are going to undermine all their efforts with doubt, fear and marginalization. The active role of bullshit in defining success allows the disregard for talent to go unnoticed (think bullshit and alternative facts as brothers).

Progress in science should always be an imperative of the highest order for our research. When progress is obviously constrained and defined with strict boundaries, as we are seeing with HPC, the term malpractice should come to mind. One of the clearest elements of the HPC program is a focus upon management and strict project controls. Yet I see the hallmarks of mismanagement in the failure to engage and harness the talents, capabilities and potential of the human resources available to it. Proper and able management of the people working on the project would harness and channel their efforts productively. Better yet, it would inspire and enable these talented individuals to innovate and discover new things that might power a brighter future for all of us. Instead we see the rule of fear, and limitations governing people’s actions. We see an ever-tightening leash placed around people’s necks, suffocating their ability to perform at their best. This is the core of the unfolding research tragedy that is doubtlessly playing out across a myriad of programs far beyond the small-scale tragedy unfolding with HPC.

We can only see a short distance ahead, but we can see plenty there that needs to be done.