High Performance Computing Center

HPCC Facilities and Equipment

Primary Equipment

The High Performance Computing Center's (HPCC) hardware is housed in three data centers. The main production clusters, Quanah and Hrothgar, along with the central file system servers and a number of smaller systems, are in
the Experimental Sciences Building on the main TTU campus. By far the largest of these
resources is the Quanah cluster.

Certain specialty clusters, including Weland, Realtime2, and Janus, are located at Reese Center, several miles from the main campus. A final set of
resources consists of TTU's portion of the Lonestar 5 cluster, comprising approximately 1600 cores, operated at the Texas Advanced Computing
Center in Austin.

These resources include both generally-available public systems and researcher-owned private
systems. Public nodes are available to any TTU researcher. Private nodes are owned
by individual researchers and administered by the HPCC. All of the generally-available
cluster resources operate under a weighted fair-share queueing system, which ensures that newly submitted jobs compete favorably
for upcoming batch queue slots against long-running job sequences.
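The idea behind fair-share scheduling can be illustrated with a simplified sketch. The formula below is illustrative only: the actual weights and usage-decay model used by the HPCC's scheduler are not specified in this document.

```python
# Simplified illustration of weighted fair-share priority (not the
# HPCC scheduler's actual formula). Each group holds a share weight;
# groups whose recent usage falls below their weighted share get
# higher priority for newly submitted jobs.

def fair_share_priority(weight, recent_usage, total_usage):
    """Priority rises as a group's recent usage falls below its share."""
    if total_usage == 0:
        return weight  # no history yet: rank purely by share weight
    used_fraction = recent_usage / total_usage
    return weight / (used_fraction + 1e-9)  # epsilon avoids divide-by-zero

# Two groups with equal 50% shares; group A has dominated the queue
# lately, so group B's new jobs are ranked ahead of A's.
prio_a = fair_share_priority(0.5, recent_usage=900, total_usage=1000)
prio_b = fair_share_priority(0.5, recent_usage=100, total_usage=1000)
assert prio_b > prio_a
```

This captures the balance described above: a long-running job sequence accumulates usage and its group's priority decays, so fresh submissions from lighter users move toward the front of the queue.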

These generally-available resources, including storage of data for use on the clusters, are provided at no cost to TTU researchers. In addition, the TTU share of Lonestar 5 is available by special request for competitive, project-specific allocations
and serves as a "safety valve" for certain large-scale projects.

Community Clusters

Researchers who need additional computing capacity beyond the generally-available
resources and are considering buying dedicated hardware or storage may wish to talk
with us about the community cluster option. This option allows equipment to be added
and operated at a "guaranteed class of service" level for CPU or storage.
Additions of this nature are subject to space and infrastructure limitations.
Please check with the HPCC staff and management for the current options.

Under this option, researchers may choose to purchase physical compute nodes and/or storage
that are operated as part of the HPCC equipment, and they receive priority access
equal to the purchased resource capacity. The HPCC will house, operate, and maintain
the resources according to a memorandum of understanding, which typically remains in
effect for as long as the equipment is covered by a researcher-funded service contract or remains under warranty.
The warranty period is usually set at the time of purchase
and is typically three to five years, although extensions are possible. Contact us for more details.

Dedicated Clusters/Servers

A dedicated cluster is a standalone cluster that is paid for by a specific TTU faculty
member or research group. Subject to space and infrastructure availability, the HPCC can
house these clusters in its machine rooms and provide system administration support,
UPS power, and cooling. For these clusters, HPCC system administration support
is typically by request, with day-to-day administration provided by the cluster's
owner. Other clusters and dedicated equipment exist on campus for which the HPCC
provides occasional assistance to researchers, by request, on a consulting basis.

Campus

The newest cluster, Quanah, has 467 worker nodes with 36 cores each for a total of 16,812 cores, of which 16,092
are reserved for general use and 720 cores are owned by specific research groups.
To connect with West Texas history, the cluster is named for Quanah Parker, and its internal management node, Charlie, is named for Charles Goodnight. Commissioned in early 2017 and expanded to its current size later that year,
it is based on Dell C6300 enclosures holding four C6320 nodes each. The worker nodes
consist of dual-18-core Broadwell Xeon processors (36 cores per node) with 192 GB
memory per node. The software environment is based on CentOS 7 Linux, controlled by Intel
HPC Orchestrator, and has a fully non-blocking Intel Omni-Path 100 Gbps fabric for
MPI computing. The cluster is operated with a single queue, with jobs sorted according
to projects in order to satisfy the needs of the participating research groups. Benchmarks
show the performance of Quanah to be approximately 485 Teraflops as of late 2017.
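As a rough sanity check on the benchmark figure, the cluster's theoretical peak can be computed from core count, clock rate, and floating-point operations per cycle. The 2.1 GHz base clock and 16 double-precision FLOPs per cycle per core (AVX2 with fused multiply-add) assumed below are typical of Broadwell Xeons but are not stated in this document.

```python
# Back-of-envelope theoretical peak for Quanah.
# Clock rate and FLOPs/cycle are assumed values, not from the source.
cores = 467 * 36          # 467 nodes x 36 cores/node = 16,812 cores
clock_ghz = 2.1           # assumed Broadwell base clock
flops_per_cycle = 16      # AVX2 + FMA, double precision, per core
peak_tflops = cores * clock_ghz * flops_per_cycle / 1000.0
print(f"{peak_tflops:.0f} TFLOPS theoretical peak")  # prints "565 TFLOPS theoretical peak"
```

Under these assumptions the measured ~485 teraflops corresponds to roughly 86% of theoretical peak, which is within the usual range for a Linpack-style benchmark on a well-tuned fabric.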

Hrothgar is an older Dell Linux Cluster currently consisting of 630 total nodes and 8246 total
processing cores, out of which 7408 cores are made available for general use and the
rest are owned by specific research groups. The Hrothgar cluster was initially built
in 2011. Several updates have occurred since then, including the replacement of the
core and leaf switches with QDR Infiniband and updating of approximately 100 of the
nodes to newer Ivy Bridge Xeon systems. Most of the Hrothgar nodes are composed of
two 2.8 GHz 6-core processors with 24 GB memory, and the rest, approximately 20% of
the total, are in denser 20-core and 32-core nodes. Roughly 90% of these nodes are
connected to either 20 Gbps or 40 Gbps Infiniband fabric optimized for parallel computing,
and the remainder are dedicated to serial processing. The Hrothgar parallel nodes
in the current configuration have a total estimated peak rating of approximately 80
teraflops. To honor the first clusters of this type, the Hrothgar cluster and its internal management node Hygd are named for characters in the Beowulf epic poem.

The HPCC has a DataDirect Networks storage system capable of providing up to 2.5 petabytes of storage. This storage space is
configured using Lustre to provide a set of common file systems to the Quanah and
Hrothgar clusters. The file system uses a combination of LNet routers to bridge the
Omni-Path traffic to Infiniband for the Quanah cluster, direct QDR Infiniband to connect
the high-speed compute fabric on Hrothgar, and Gigabit Ethernet to connect to Hrothgar
serial nodes. In addition to the DDN central file system, a number of researchers
and research groups have purchased dedicated servers for long-term data storage,
typically in increments of tens of terabytes, that are also reachable by the same
methods.

Reese

The Janus, Weland and Realtime clusters are located at the off-campus Reese data center,
which also houses some of the serial nodes for the Hrothgar cluster. Janus and Weland
are also named after characters from the Beowulf story.

Janus is a Microsoft Windows HPC cluster with twelve 20-core nodes. This system is used
for a small number of dedicated workloads that depend on specific licensed software
that is not readily available for the Linux clusters. It is not intended to
be a general Windows login system for the university, but rather to serve those specific
workloads that require Windows HPC Server support.

Weland is a Linux cluster with sixteen 8-core nodes; each node contains two Xeon E5540
processors, for a total of 128 cores running at 2.53 GHz, with 16 GB of main memory per node. It
is primarily operated as a TechGrid resource to augment the campus cycle-sharing grid.