Grant Preparation

The following boilerplate describes the equipment, facilities, and other resources available at CCS and may be used for grant preparation:

Facilities, Equipment, and Other Resources

Facilities

CCS systems are colocated at the CenturyLink Data Center hosted at the NAP of the Americas (NOTA or NAP). The NAP datacenter in Miami is a 750,000-square-foot, purpose-built Tier IV facility with N+2 14-megawatt power and cooling infrastructure. The equipment floors start at 32 feet above sea level, and the roof slope, assisted by 18 rooftop drains, was designed to aid in the drainage of floodwater in excess of 100-year-storm intensity. The building was designed to withstand a Category 5 hurricane, with approximately 19 million pounds of concrete roof ballast and 7-inch-thick steel-reinforced concrete exterior panels. The building is also outside the FEMA 500-year designated flood zone. The NAP uses a dry-pipe fire-suppression system to minimize the risk of damage from leaks.

The NAP has a centrally located Command Center staffed 24×7 by security personnel and supported by security sensors. To connect the University of Miami with the NOTA datacenter, UM has invested in a Dense Wavelength Division Multiplexing (DWDM) optical ring for all of its campuses. UM CCS Advanced Computing resources occupy a discrete, secure wavelength on the ring, which provides a distinct 10 Gigabit HPC network to all UM campuses and facilities.

Given the University of Miami’s past experience, including several hurricanes and other natural disasters, we anticipate no service interruptions due to facilities issues. The NAP was designed and constructed for resilient operations. UM has gone through several hurricanes, power outages, and other severe weather crises without any loss of power or connectivity to the NAP. The NAP maintains its own generators with a flywheel power crossover system. This ensures that power is not interrupted when the switch is made to auxiliary power. The NAP maintains a two-week fuel supply (at 100% utilization) and is on the primary list for fuel replacement due to its importance as a data-serving facility.

In addition to hosting the University of Miami’s computing infrastructure, the NAP of the Americas is home to assets of US SOUTHCOM, Amazon, eBay, and several telecommunications companies. The NAP at Miami hosts 97% of the network traffic between the US and Central/South America. The NAP is also the local access point for Florida LambdaRail (FLR), which is gated to Internet2 (I2) to provide full support to the I2 Innovation Platform. The NAP also provides TLD information to the DNS infrastructure and is the local peering point for all networks in the area.

The University of Miami has made the NAP its primary data center, occupying a very significant footprint on the third floor. Currently, all UM CCS resources, clusters, storage, and backup systems run from this facility and serve all major campuses of UM.

Each location is equipped with dual-processor workstations and essential software applications. CCS has three dedicated conference rooms and communication technology to interact with advisors (phone, web, and video conferencing), plus a Visualization Lab with 2D and 3D display walls (located at the Ungar Building).

Advanced Computing

The University of Miami’s supercomputer, Pegasus, is a 350-node Lenovo cluster. Each node has two 8-core Intel Sandy Bridge E5-2670 (2.6 GHz) processors and 32 GB of 1600 MHz RAM (2 GB/core), for a total of over 160 TFlops. Connected with an FDR InfiniBand fabric, Pegasus was purpose-built for the style of data processing performed by biomedical research and analytics. In contrast with traditional supercomputers, where data flows along the slowest available communication network (Ethernet), Pegasus was built on the principle that data needs to be on the fastest fabric possible. By using the low-latency, high-bandwidth IB fabric for data, Pegasus is able to access all three storage tiers (SSD, 15K RPM SAS, 7.2K NL-SAS) at unprecedented speeds.
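
The per-node figures above can be cross-checked with a little arithmetic. The following Python sketch uses only the numbers stated in this section (350 nodes, two 8-core processors and 32 GB of RAM per node); the variable names are illustrative, and the large memory nodes are not included.

```python
# Figures taken from the Pegasus description above.
NODES = 350
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 8
RAM_GB_PER_NODE = 32

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
total_cores = NODES * cores_per_node
ram_gb_per_core = RAM_GB_PER_NODE / cores_per_node
total_ram_gb = NODES * RAM_GB_PER_NODE

print(f"cores per node:  {cores_per_node}")
print(f"total cores:     {total_cores}")
print(f"RAM per core:    {ram_gb_per_core} GB")
print(f"total RAM:       {total_ram_gb} GB")
```

The result of 2 GB per core matches the figure quoted in the text.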

Unlike traditional HPC storage, the 150 TB /scratch filesystem is optimized for small random reads and writes, and can support over 125,000 sustained IOPS and 20 Gb/sec throughput at a 4 KB file size. Composed of over 500 15K RPM SAS disks, /scratch is ideal for the extremely demanding I/O requirements of biomedical workloads.

For instances where even /scratch is not fast enough, Pegasus now has access to over 8 TB of burst buffer space clocked at over 1,000,000 IOPS. This buffer space gives biomedical researchers a good place for large file manipulation and transformation.

Along with the 350 nodes in the general processing queue, all researchers also have access to the 20 large memory nodes in the bigmem queue. With access to the entire suite of software available on Pegasus, the bigmem queue provides large memory access (256 GB) to researchers where parallelization is not an option. With 20 cores each, the bigmem servers provide an SMP-like environment well suited to biomedical research.

As many modern analysis tools require interaction, Pegasus has the unique feature of allowing SSH and graphical (GUI) access to programs through LSF. Tools ranging from MATLAB to KNIME and SAS to R are available to researchers in the interactive queue with full-speed access to /scratch and the W.A.D.E. storage cloud.
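
As an illustration of interactive access through LSF, the sketch below assembles a `bsub` command line in Python. The `-Is` (interactive job with a pseudo-terminal), `-q` (queue), and `-XF` (X11 forwarding) options are standard LSF `bsub` options. The queue name `interactive` is an assumption based on the wording above, and the helper `build_interactive_bsub` is hypothetical; check the CCS documentation for the actual queue names and recommended options.

```python
import shlex


def build_interactive_bsub(program, queue="interactive", x11=False):
    """Return the argument list for an interactive LSF job (illustrative only)."""
    args = ["bsub", "-Is", "-q", queue]   # -Is: interactive job with a pseudo-terminal
    if x11:
        args.append("-XF")                # -XF: forward X11 for graphical programs
    args.extend(shlex.split(program))     # the program and its arguments
    return args


# Text-mode R session in the (assumed) interactive queue.
print(shlex.join(build_interactive_bsub("R --no-save")))

# Graphical program with X11 forwarding.
print(shlex.join(build_interactive_bsub("matlab", x11=True)))
```

The function only builds the command; running it requires a login session on a system where LSF is installed.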

PEGASUS SUPERCOMPUTER

The University of Miami’s supercomputer, Pegasus, is a 350-node Lenovo cluster. Each node has two 8-core Intel Sandy Bridge E5-2670 (2.6 GHz) processors and 32 GB of 1600 MHz RAM (2 GB/core), for a total of over 160 TFlops. Connected with an FDR InfiniBand fabric, Pegasus was purpose-built for the style of data processing performed by biomedical research and analytics.

W.A.D.E. STORAGE CLOUD (Worldwide Advanced Data Environment)

At the heart of the Advanced Computing data services is the W.A.D.E. storage cloud, which currently provides over 6 PB of active data to the University of Miami research community, ranging from small spreadsheets in sports medicine research to multi-terabyte, high-resolution image files and NGS datasets.
An upgrade to the W.A.D.E. storage cloud is coming soon.

The W.A.D.E. storage cloud is composed of four DDN storage clusters running the GPFS filesystem. The combination of IBM’s industrial-strength filesystem and DDN’s high-performance hardware gives researchers at UM the flexibility to process data on Pegasus and share that data with anyone, anywhere.

By utilizing several file service gateways, researchers can share large data sets securely on campus between Mac, Windows, and Linux operating systems. Data can also be presented outside of the University of Miami in several high-performance fashions. In addition to the common protocols of SCP and SFTP, we also provide high-speed parallel access through bbcp and Aspera. You can even share your data using standard web access (httpd) through our integrated web and cloud client service.

All access to W.A.D.E. is provided through UM’s 10 Gb/sec Research Network internally and the UM Science DMZ externally. All Internet traffic flows through either the Science DMZ’s 10 Gb/sec I2 link through Florida LambdaRail or the Research Network’s 1 Gb/sec commercial Internet connection.
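
To give a feel for what these link speeds mean in practice, the following Python sketch estimates idealized transfer times for a 1 TB dataset. It assumes the full nominal line rate is available and ignores protocol overhead, disk speed, and competing traffic, so real transfers will take longer. The function name and the 1 TB example size are illustrative.

```python
def transfer_seconds(size_bytes, link_bits_per_sec):
    """Idealized transfer time: payload bits divided by nominal line rate."""
    return size_bytes * 8 / link_bits_per_sec


ONE_TB = 10**12  # 1 terabyte (decimal), chosen as an example dataset size

links = [
    ("10 Gb/sec I2 link", 10e9),
    ("1 Gb/sec commercial link", 1e9),
]

for label, rate in links:
    minutes = transfer_seconds(ONE_TB, rate) / 60
    print(f"{label}: about {minutes:.1f} minutes for 1 TB")
```

Under these assumptions, the tenfold difference in line rate translates directly into a tenfold difference in transfer time.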

VAULT SECURE STORAGE

The Vault secure storage service is designed to address the ongoing challenge of storing Limited Research Datasets. Built on enterprise-quality hardware with 24×7 support, Vault provides CTSI-approved researchers access to over 150 TB of usable, redundant storage (300 TB raw). All data is encrypted according to U.S. Federal Information Processing Standards (FIPS). At rest, data is encrypted using AES with 128-bit keys. In motion, all transfers are encrypted using FIPS 140-2 compliant AES with 256-bit keys. All data is encrypted and decrypted automatically on access.

Access to the Vault storage service is controlled through several methods, including the latest in multifactor authentication. All users are required to use YubiKey™ 4 hardware USB keys to log on to the Vault secure storage service. Vault also requires IP whitelisting for access through either the on-campus research network or campus-based VPN services.
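
The idea behind IP whitelisting can be sketched with Python's standard `ipaddress` module. This is a conceptual illustration only, not a description of how Vault is implemented. The network ranges below are reserved documentation addresses standing in for the real research network and VPN ranges, which are not given here.

```python
import ipaddress

# Hypothetical allowlist: documentation-only ranges used as placeholders.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),     # stand-in for the on-campus research network
    ipaddress.ip_network("198.51.100.0/24"),  # stand-in for the campus VPN address pool
]


def is_allowed(client_ip):
    """Return True if client_ip falls inside any allowlisted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


print(is_allowed("192.0.2.17"))   # inside the first placeholder range
print(is_allowed("203.0.113.5"))  # outside both placeholder ranges
```

In a scheme like this, the address check is applied in addition to authentication, so a valid hardware key used from a non-listed address is still refused.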

VISUALIZATION LAB

The CCS Visualization Lab is a resource for all University of Miami students and faculty to present graphics-intensive and performance-intensive 2D and 3D simulations. With a direct connection to all CCS resources, the Viz Lab is well suited to high-performance parallel visualization, data exploration, and other advanced 2D and 3D simulations. First-time use of this space requires an orientation with the CCS support team. For more information about orientation and reservations, see the Visualization Lab page.

The Visualization Lab is built around a Cyviz 5×2 display wall with a native resolution of 20 megapixels and a Mechdyne 2×2 passive 3D display wall. On these high-resolution displays, users can present their work at full scale while analyzing details at a granular level.

The Visualization Lab sits directly on the Research Network, which gives it 10 Gb/sec network access to the W.A.D.E. storage cloud, the Pegasus supercomputer, and all other CCS resources. It was built to help interpret real-world scenarios through computational modeling, simulation, analysis, and visualization of natural and synthetic phenomena for dynamic engineering, biomedical, epidemiological, and geophysical applications.

The 2D display is composed of ten 55-inch thin-bezel Planar LCD panels spanning 22 ft, forming an ultra-wide 21-megapixel display with a resolution of 9600×2160.

The 3D display wall supports stereoscopic 3D, for users looking to captivate audiences with something a little more eye-popping or simply looking to add depth to their work. It is composed of four 46-inch ultra-thin LCD Planar panels and supports resolutions up to 5120×2880.
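
The resolution figures for the two walls can be checked with a short calculation. The Python sketch below assumes the panels tile the wall evenly (5 columns by 2 rows for the 2D wall, 2 by 2 for the 3D wall) and that the quoted resolutions are the combined resolutions of all panels; the function name is illustrative.

```python
def wall_stats(cols, rows, width_px, height_px):
    """Derive panel count, per-panel resolution, and megapixels for a tiled wall."""
    return {
        "panels": cols * rows,
        "panel_resolution": (width_px // cols, height_px // rows),
        "megapixels": width_px * height_px / 1e6,
    }


wall_2d = wall_stats(cols=5, rows=2, width_px=9600, height_px=2160)
wall_3d = wall_stats(cols=2, rows=2, width_px=5120, height_px=2880)

for name, stats in [("2D wall", wall_2d), ("3D wall", wall_3d)]:
    w, h = stats["panel_resolution"]
    print(f"{name}: {stats['panels']} panels of {w}x{h}, "
          f"{stats['megapixels']:.1f} megapixels total")
```

Under these assumptions, the 2D wall works out to about 20.7 megapixels, which is consistent with both the "20 megapixel" and "21-megapixel" figures quoted above once rounding is taken into account.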

SPS SECURE PROCESSING SERVICE – beta

Our most secure data processing offering is SPS, currently in beta. SPS is designed for secure access to extremely sensitive data sets, including PHI. In addition to the security protocols used in the Vault data services, SPS requires additional administrative action for the certified placement and/or destruction of data. Advanced Computing staff (all CITI-trained and IRB-approved) act as data managers for projects from several federal agencies, including NSF, NIH, DoL, DoD, and VA.

Once our staff has loaded and secured your data, you can remotely access one of the SPS servers (either Windows or Linux), which has access to the most common data analytics tools, including R, SAS, MATLAB, and Python. Additional tools are available on request. SPS provides 50 TB of highly secure, redundant storage (100 TB raw).

NYX CLOUD – beta

The Nyx Cloud hosting system, currently in beta, lets you launch and configure your own virtual machine servers. It is a private UM/CCS cloud system powered by the OpenStack cloud software, which provides Infrastructure-as-a-Service (IaaS) resource management. It is available to registered CCS users, and resource allocations are Project-based.

Nyx Cloud Virtual machine (VM) instances are grouped into Projects, which reside on dedicated private virtual networks (subnets). Through the dashboard, users can start and customize their own VMs. Nyx VMs can be single- or multiple-CPU servers and can be shut down or restarted as needed.

Several bootable images are available, including configurations such as LAMP and MEAN in CentOS 6 and CentOS 7. Snapshots of Projects can be taken for replication and backup. Floating IP addresses are available for SSH connections to Nyx instances from outside the Project network.

Advanced storage features currently undergoing testing include block and object storage. VM instances can be started with dedicated block storage or attached to existing block storage. Object (distributed) storage, which allows for data access via HTTP, is also available.

HPC Core Expertise

The HPC team has in-depth experience in various scientific research areas with extensive experience in parallelizing or distributing codes written in Fortran, C, Java, Perl, Python and R. The team is active in contributing to Open Source software efforts including: R, Python, the Linux Kernel, Torque, Maui, XFS and GFS. The team also specializes in scheduling software (LSF) to optimize the efficiency of the HPC systems and adapt codes to the CCS environment. The HPC core has expertise in parallelizing code using both MPI and OpenMP depending on the programming paradigm. CCS has contributed several parallelization efforts back to the community in projects such as R, WRF, and HYCOM.

The core specializes in implementing and porting open source codes to the CCS environment and often contributes changes back to the community. CCS currently supports more than 300 applications and optimized libraries on its computing environment. The core personnel are experts in implementing and designing solutions in the three different variants of Unix. CCS also maintains industry research partnerships with IBM, Schrödinger, OpenEye, and DDN.

Software on the Pegasus Cluster

CCS continually updates applications, compilers, system libraries, and related tools. To facilitate this task and to provide a uniform mechanism for accessing different revisions of software, CCS uses the modules utility. At login, modules commands set up a basic environment for the default compilers, tools, and libraries by setting environment variables such as $PATH, $MANPATH, and $LD_LIBRARY_PATH. Available software modules can be viewed on the CCS Portal Software Modules page, including description, version, and update date.
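
Conceptually, loading a module amounts to adjusting path-like environment variables so that a particular software version is found first. The Python sketch below illustrates that idea only; it is not how the modules utility is implemented, and the `/opt/example/...` directories and the `prepend_path` helper are hypothetical.

```python
SEP = ":"  # path separator used on Linux systems


def prepend_path(env, var, directory):
    """Return a copy of env with directory placed first in the path-like variable var."""
    updated = dict(env)
    current = updated.get(var, "")
    updated[var] = directory + (SEP + current if current else "")
    return updated


# A simplified starting environment.
env = {"PATH": "/usr/bin:/bin"}

# What "loading" a hypothetical compiler module might do.
env = prepend_path(env, "PATH", "/opt/example/gcc/bin")
env = prepend_path(env, "LD_LIBRARY_PATH", "/opt/example/gcc/lib64")
env = prepend_path(env, "MANPATH", "/opt/example/gcc/share/man")

for name in ("PATH", "LD_LIBRARY_PATH", "MANPATH"):
    print(f"{name}={env[name]}")
```

Because the new directory is placed first, the shell and the dynamic linker find the selected version before any system default.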

Other Resources

Bioinformatics

CCS’ Computational Biology and Bioinformatics Program (CBBP) was established to conduct research and offer services and training in the management and analysis of biological and medical/health record data. The program’s mission is to spearhead bioinformatics capacity at the University of Miami for all biological and medical applications. This includes data management, data mining, and data analysis capacities. The CBBP aims to achieve this mission through infrastructure, education, and expertise. In particular, CBBP provides an online portal for bioinformatics databases and web tools, and offers a number of data analysis services. CBBP is also leading educational and training initiatives in bioinformatics analysis and supporting these activities with high-impact bioinformatics research.

Computational Biology & Bioinformatics

The team provides data analysis training and expertise at three levels (consulting, preliminary data generation, and full collaboration), based on the time and complexity of the service requested. The analyses are undertaken by skilled analysts and overseen by experienced faculty. The group has been working mostly with microarray data and next generation sequencing data, and the analytical services include, but are not limited to, the following:

copy number variant analysis; in this context, we are testing the few existing algorithms and developing new ones for accurate and unambiguous discovery of copy number variation in the human genome

genome or transcriptome assembly from next generation sequencing data, and its visualization

SNP functionality analysis

other projects include merging or correlating data from various data types for a holistic view of a particular pathway or disease process.

Big Data Analytics & Data Mining

CCS’ Big Data Analytics & Data Mining research group provides advanced data mining expertise and capabilities to further explore high-dimensional data. The following are examples of the expertise areas covered by our team:

Classification, which appears in essentially every subject area that involves collection of data of different types, such as disease diagnosis based on clinical and laboratory data. Methods include regression (linear and logistic), artificial neural nets (ANN), k-nearest neighbors (KNN), support vector machines (SVM), Bayesian networks, decision trees, and others.

Clustering, which is used to partition the input data points into mutually similar groupings, such that data points from different groups are not similar. Methods include k-means, hierarchical clustering, and self-organizing maps (SOM), and are often accompanied by space decomposition methods to offer low-dimensional representations of high-dimensional data spaces. Methods of space decomposition include principal component analysis (PCA), independent component analysis (ICA), multidimensional scaling (MDS), Isomap, and manifold learning. Advanced topics in clustering include multifold clustering, graphical models, and semi-supervised clustering.

Association data mining, which finds frequent combinations of attributes in databases of categorical attributes. The frequent combinations can then be used to predict categorical values.

Analysis of sequential data, which mostly involves biological sequences and includes such diverse topics as extraction of common patterns in genomic sequences for motif discovery, sequence comparison for haplotype analysis, alignment of sequences, and phylogeny reconstruction.

Text mining, particularly in terms of extracting information from published papers, thus transforming documents to vectors of relatively low dimension to enable the use of data mining methods mentioned above.
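
To make the connection between text mining and the classification methods listed above concrete, the following Python sketch turns short documents into word-count vectors and labels a new document with a k-nearest-neighbors vote based on cosine similarity. It is a toy illustration under stated assumptions, not the group's actual pipeline. The documents, labels, and function names are invented for the example, and real work would use far larger vocabularies and dimensionality reduction.

```python
import math
from collections import Counter


def bag_of_words(text, vocabulary):
    """Represent text as a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(query, examples, k=1):
    """Majority label among the k examples most similar to the query vector."""
    ranked = sorted(examples, key=lambda ex: cosine(query, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up training documents with made-up labels.
training = [
    ("gene expression microarray analysis", "genomics"),
    ("sequencing reads genome assembly", "genomics"),
    ("image segmentation of brain scans", "imaging"),
    ("tumor image registration and segmentation", "imaging"),
]

vocabulary = sorted({w for text, _ in training for w in text.lower().split()})
examples = [(bag_of_words(text, vocabulary), label) for text, label in training]

for doc in ["genome sequencing analysis", "brain image segmentation"]:
    label = knn_predict(bag_of_words(doc, vocabulary), examples, k=1)
    print(f"{doc!r} -> {label}")
```

The same vectors could be passed to any of the other methods mentioned above, such as clustering or a support vector machine.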

Visualization

The Visualization program conducts both theoretical and applied research in the general areas of Machine Vision and Learning, and specifically in:

Computer Vision and Image Processing

Machine Learning

Biomedical Image Analysis

Computational Biology and Neuroscience

The goal is to apply expertise in these areas to develop novel, fully automated methods that are robust, accurate, and computationally efficient. The program works towards finding better solutions to existing open problems in the above areas, as well as exploring different scientific fields where our research can provide useful interpretation, quantification, and modeling.