4 Why?
Because we must. The Science Council (Wissenschaftsrat) sets a requirement for requesting funding for future computer systems: a scientific process for allocating the expensive compute resources has to be established, one that guarantees fair handling of all users.

5 Why? (II)
Fair distribution of resources. Main goals:
- correlation between used resources and scientific value
- defined (short...) job starting times
- defined, predictable throughput for researchers
- effective and resource-saving usage patterns
  - would you drive fuel-efficiently if you did not have to pay for the fuel?
  - or if you had no clue how much fuel you had burned?
- and, last but not least, keeping some buddies within bounds

6 The status
- Implemented with projects and queues in the LSF batch system
- JARA-HPC partition (30%): since 2012
- General introduction: Q3/2014
- Up and running now
- Use a project: add a line to your batch file
    #BSUB -P abcd4321
- Check your quota:
    $ r_batch_usage

8 How to file an application for computing time
- Go to https://doc.itc.rwth-aachen.de/display/cc/projektbewirtschaftung
- Decide which type of project you should apply for
- Determine your needs; don't be shy!
  - don't try to be too exact
  - it's better to ask for 30% too much than for 1% too little
  - it's easier to ask for a round sum (compute time, duration...), both for you and for us
- Think about special requirements:
  - overlong compute time? (more than 120 h is not possible)
  - disk storage?
  - one huge project, or maybe multiple subprojects?
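The sizing advice above can be sketched as a small calculation. This is only an illustration: the function name, the default 30% margin taken from the slide's rule of thumb, and the rounding step of 10,000 core-h are assumptions, not part of the application procedure. Only the 120 h limit per job comes from the slide.

```python
import math

MAX_HOURS_PER_JOB = 120  # from the slide: more than 120 h is not possible


def requested_core_hours(cores_per_job, hours_per_job, number_of_jobs,
                         margin=0.30, round_to=10_000):
    """Estimate a round core-hour figure to ask for (hypothetical helper).

    margin and round_to are illustrative assumptions, not official values.
    """
    if hours_per_job > MAX_HOURS_PER_JOB:
        raise ValueError("a single job may not run longer than 120 h")
    estimate = cores_per_job * hours_per_job * number_of_jobs
    padded = estimate * (1 + margin)
    # Round up to the next multiple of round_to to get a round sum.
    return math.ceil(padded / round_to) * round_to


# 100 jobs of 24 h on 48 cores each: 115,200 core-h estimated,
# padded by 30% and rounded up to a round sum.
print(requested_core_hours(48, 24, 100))
```

A job that needs more than 120 h has to be split into several shorter jobs before it fits into such an estimate.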

9 How to file an application for computing time (II)
- Go to https://doc.itc.rwth-aachen.de/display/cc/projektbewirtschaftung
- Fill in the right form
  - use Acrobat X to edit the PDF file: we need to extract the data electronically
  - do not use meaningless values like "normal" or "much", e.g. for memory consumption
  - do not cut corners: we do not know who Mr "See Above" is!
- Send us the electronically readable PDF file
  - do not send us screenshots, JPG, PNG, DOCX, or TXT files
  - do not send us signed and scanned PDFs
- Print the same file, sign it, and fax or mail it to us
  - again: do not send us signed and scanned PDFs
- In the end we need the same document in two versions: signed and legally valid (thus fax or mail), and electronically readable.

10 How to file an application for computing time (III)
- Go to https://doc.itc.rwth-aachen.de/display/cc/projektbewirtschaftung
- Filing an application for an RWTH Standard (M) project?
  - a project description is required (for internal scientific reviews)
  - mention whether your project is a follow-up project, is funded by some organisation, ...
- Filing an application for a JARA-HPC/RWTH-Big (XL/L) project?
  - submission twice a year, following the JARA-HPC procedures

11 How to file an application for computing time (IV)
- Go to https://doc.itc.rwth-aachen.de/display/cc/projektbewirtschaftung
- Application form filled in, e-mailed, printed out, signed, and faxed/mailed? Then wait.
- Typically within a week: a message that both versions of the application form have arrived.
- Some days later, either
  1) a message that the project is ready to use (for small projects), or
  2) a message that the project has been set up with a test quota of 0.01 million core-hours per month and that the scientific review process has started (for larger projects)
- For (2), some weeks (or even months) later: a message that the project is approved and the full remaining quota is granted (often the runtime of the project is adjusted, too, according to the delay)
- Yes, we know: this process is really tedious and lengthy. We are working on improving it, but the scientific review, at least, will remain a delaying factor.

12 What happens if you are over quota?
- Running jobs continue to the end (and still consume core-h!)
- Newly submitted and pending jobs are moved to the low-priority queue
  - they can still start, but if and only if there are free resources not used by normal-priority jobs
  - if started from the low-priority queue, they still consume core-h, so the quota may go well into the red!
  - today there is no hard limit in the low-priority queue; this is very likely to change in the future
- On the 1st of every month, the next month's quota is added
  - if you are then in the black with your quota, new jobs are submitted to the normal-priority queue and pending jobs are moved there
- Technically it makes no difference from which queue a job is started; only the start time differs!
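The over-quota behaviour can be illustrated with a toy model. This is a sketch, not the scheduler's actual logic: the function names and the queue labels "normal" and "low" are made up for illustration, and treating a balance of exactly zero as over quota is an assumption.

```python
def queue_for(balance_core_h):
    """Return the queue that new and pending jobs go to (hypothetical labels)."""
    # Assumption: "in the black" means a strictly positive balance.
    return "normal" if balance_core_h > 0 else "low"


def charge(balance_core_h, cores, hours):
    """Every job consumes core-h, regardless of the queue it started from."""
    return balance_core_h - cores * hours


balance = 100                           # core-h left
balance = charge(balance, 12, 10)       # a 12-core job running for 10 h
print(balance, queue_for(balance))      # in the red: low-priority queue

balance = charge(balance, 12, 10)       # a job started from the low queue
print(balance, queue_for(balance))      # still charged, deeper in the red

balance += 1000                         # 1st of the month: next quota added
print(balance, queue_for(balance))      # in the black again: normal queue
```

The model leaves out the start condition for low-priority jobs, which depends on free resources not used by normal-priority jobs.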

13 How is the quota computed?
- The main goal is to motivate users to use the resources continuously, while still allowing some peaks
- Three-month sliding window
  - up to 300% of the monthly quota is available in a month
  - unused quota from the previous month is transferred to the current month, but not further
- The quotas for the previous, the current, and the next month are added up
- The consumed core-h for the previous and the current month are added up
- The difference between the two values is the amount of core-h available in the current month
- Huh? See https://doc.itc.rwth-aachen.de/display/cc/resource+contingents
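The sliding-window rule can be written down as a short calculation. This is a minimal sketch of the arithmetic as described on the slide. The function and parameter names are made up for illustration, and the numbers in the example are invented.

```python
def available_core_hours(quota_prev, quota_cur, quota_next,
                         used_prev, used_cur):
    """Core-h available in the current month (three-month sliding window)."""
    granted = quota_prev + quota_cur + quota_next   # three monthly quotas
    consumed = used_prev + used_cur                 # two months of usage
    return granted - consumed                       # negative: over quota


# Monthly quota of 10,000 core-h; 4,000 used last month, 2,500 so far this month.
print(available_core_hours(10_000, 10_000, 10_000, 4_000, 2_500))

# Nothing used at all: three monthly quotas, i.e. 300% of one month.
print(available_core_hours(10_000, 10_000, 10_000, 0, 0))
```

Because the window moves on by one month at a time, quota left unused in the previous month still counts this month but drops out of the sum the month after.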

19 Interactive usage
- Go to: https://doc.itc.rwth-aachen.de/display/cc/interactive+usage
- Interactive front ends are frequented by hundreds of users! Any issue directly interrupts the work of these users!
- Purposes: data transfer, job submission, application porting, testing, tuning, debugging
- NOT FOR PRODUCTIVE RUNS: USE THE BATCH SYSTEM
- Rule of thumb: not more than 20 minutes of CPU time
  - that does not mean "I can start 80 runs of 19.5 minutes one after another"!
- Really need compute power and an interactive session? Use batch jobs with GUI: https://doc.itc.rwth-aachen.de/display/cc/submitting+a+job+with+gui
- To allow for advanced testing, we set flexible quotas using the cgroup system
  - CPU: all processes of one user together get the same amount of CPU cycles as all processes of any other user
  - Memory: real memory is limited to a part of the available RAM
    - this prevents the situation where one user consumes all RAM and crashes the whole node
    - use the memquota command to find out the current situation

21 Interactive usage
- Go to: https://doc.itc.rwth-aachen.de/display/cc/interactive+usage
- Interactive back ends
  - not available for login
  - hardware subject to change (currently: 8x 12-core Westmere with 96 GB RAM)
  - used to off-load MPI processes started on the front ends (reduces load!)
  - off-loading is managed by the interactive MPIEXEC wrapper; example:
      $MPIEXEC -np 2 hostname
  - processes are started on the less-loaded nodes, but massive overloading is allowed; furthermore, you are not alone on these systems
  - a load of 100+ is not unusual
  - any productive runs and time measurements are absurd to the highest degree
- The only sensible use of (overloaded) test runs: checking "will my binary start with XYZ ranks?"
  - if yes, press Ctrl-C and proceed to the batch system
  - if not, you get the answer immediately (instead of waiting a day for the batch job)
- NOT FOR PRODUCTIVE RUNS: USE THE BATCH SYSTEM
- https://doc.itc.rwth-aachen.de/display/cc/testing+of+mpi+jobs

22 Interactive usage: changed Terms of Use
- Passing your HPC account on to third parties is explicitly forbidden
  - secondary logins will be gradually deactivated during the next months
  - secondary accounts (after 05/2014) are already configured without login permissions
  - the only use of secondary accounts now: data sharing
- Jobs and processes (in batch, on interactive front ends and back ends) that disturb other jobs or processes may be killed without further notice.
  - if your job has been killed, you have probably done something bad: read the documentation!
- https://doc.itc.rwth-aachen.de/display/cc/2014/12/02/ %3A+Changed+Terms+of+Use
