You have MPI jobs that you would like to try running on Azure Nodes. You periodically have increases in the number of MPI jobs that you have to run, or you have long wait times in the job queue. You want to test the possibility of adding Azure nodes to the cluster to handle the extra workload.

Goal

As an example of how to run an MPI job on Azure Nodes, this guide walks through the steps to run Linpack on a set of Azure Nodes that are deployed with the Worker Node template.

Upload Linpack files to the Azure Nodes.

Run the Linpack program as an HPC job.

Retrieve the output file from the Azure Node.

Note

This guide is meant to introduce some of the steps and tools for deploying and running MPI applications on Azure Nodes, so all of the steps for uploading the application and configuring the firewall and netmask are performed manually in the procedures below. After you have verified application deployment steps manually, many of these steps can be defined as part of the Azure Node provisioning process, so that the application can be automatically available on new Azure Node instances. For more information, see
Appendix 2: Configure a Startup Script for Windows Azure Nodes.

Local storage on Azure Nodes is not persistent. When node instances are stopped and then restarted on a different hardware node, the data stored in local storage does not follow the node instance.

Applications deployed to Windows Azure are subject to the licensing terms associated with the application.

MPI jobs that are not particularly latency- or bandwidth-sensitive are more likely to scale well in the Azure environment. Latency- and bandwidth-sensitive MPI jobs can still perform well as small jobs, where a single task runs on no more than a few nodes. For example, in the case of an engineering simulation, you can run many small jobs to explore and define the parametric space before increasing the model size.

You must register each MPI application with the firewall on the Azure Nodes. This allows MPI communications to take place on a port that is assigned dynamically by the firewall.

When you run MPI jobs on Azure Nodes, you must ensure that the IP addresses of the Azure Nodes are within the range of accepted IP addresses that is specified for the MPI network mask. The cluster-wide range is defined through the CCP_MPI_NETMASK cluster environment variable, and the value that is specified in this cluster variable is automatically set as a system environment variable on all cluster nodes. Depending on your requirements, there are several ways that you can configure the network mask. You can disable the netmask on the cluster, broaden the range to include your Azure Nodes, or override the value at the node level or at the job level. For example, you can reset the value of CCP_MPI_NETMASK on only your Azure Nodes, or you can set it at the job level by including -env MPICH_NETMASK <range> in the mpiexec command arguments.
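The effect of the netmask can be illustrated with a few lines of code. The following sketch (not part of HPC Pack; the function name is hypothetical) checks whether a node's IP address falls within an address/mask range such as the one stored in CCP_MPI_NETMASK:

```python
import ipaddress

def in_netmask(ip: str, netmask: str) -> bool:
    """Return True if ip falls in an 'address/mask' range, e.g. '10.0.0.0/255.0.0.0'."""
    base, mask = netmask.split("/")
    ip_i = int(ipaddress.IPv4Address(ip))
    base_i = int(ipaddress.IPv4Address(base))
    mask_i = int(ipaddress.IPv4Address(mask))
    # MPI communication is accepted only when the masked addresses match
    return (ip_i & mask_i) == (base_i & mask_i)

# An Azure Node whose address is outside the accepted range would be rejected:
print(in_netmask("10.1.2.3", "10.0.0.0/255.0.0.0"))    # True
print(in_netmask("172.16.0.5", "10.0.0.0/255.0.0.0"))  # False
```

If the Azure Node addresses fall outside the cluster-wide range, the options described above (disabling, broadening, or overriding the netmask) make the masked comparison succeed.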

MPI jobs cannot run across Azure Nodes that are deployed using different node templates. For example, if you have one set of Azure Nodes that are deployed with a worker role template, and one set of Azure Nodes that are deployed with a virtual machine role template, the MPI job must run on one set or the other.

When you add Azure Nodes to your cluster and bring them Online, the HPC Job Scheduler Service will immediately try to start jobs on the nodes. If only a portion of your workload can run on Azure, ensure that you update or create job templates to define what job types can run on Azure. For example, to ensure that jobs submitted with a template only run on on-premises compute nodes, you can add the Node Groups property to the job template and select Compute Nodes as the required value.

Steps

In the following procedure, we upload the Lizard files and an input file to a set of Azure Nodes that are already deployed. The Lizard utility helps optimize Linpack input on the HPC cluster, and includes the Linpack binaries that we will use for this example. We will run the Linpack command directly, rather than using the Lizard, so that we can run it as a job on the Azure Nodes.

Run the lizard_x86.msi installer on the head node, and change the default installation path to C:\Lizard\.

Prepare the Linpack input file.

Note

This example uses a small problem size (Ns = 15000) to speed up the run time. This will not provide a representative measure of cluster performance. If you are using fewer than 32 cores, change the number of Qs to match the number of cores. For more information about these parameters, see
Lizard Help.

Package the Lizard files for upload to Azure storage using
hpcpack and save the package to the C:\AzurePkgs folder. At an elevated command prompt window type the following command:

hpcpack create c:\AzurePkgs\Lizard.zip c:\Lizard

Note

You do not need an elevated command prompt (run as Administrator) to run hpcpack create, but the hpcpack upload and clusrun commands in upcoming steps do require elevation.

Upload the package to Azure storage by using the following command, where myHeadNode is the name of your head node, and myAzureTemplate is the name of the template that you used to deploy the Azure Nodes:

When the job is complete, collect the output file. In this case, only the Rank 0 process writes to the standard out file, so you only need to collect one file. To identify which node has the output file, run the following command:

clusrun /nodegroup:azurenodes dir %CCP_PACKAGE_ROOT%\lizard\out.txt

Use
hpcfile to download the file from the Azure Node to the head node (the computer from which you are running the command). Run the following command, where AzureCN-000x is the node with the out.txt file, and c:\users\myName\ is the destination folder on the head node:

The output file will include text like the following, which lists the number of floating point operations per second that Linpack measured during the test. Because the problem size was intentionally set to a small number for demonstration purposes, this is not a representative measurement of cluster performance. In the example below, the measured performance was 3.498e+001 Gflops.

You have Microsoft Excel workbooks that can run in parallel on a Windows HPC cluster by using the HPC Macro framework and the
Windows HPC Server 2008 R2: HPC Services for Excel. You periodically have sharp increases in the amount of workbook offloading jobs that you need to run, and you want to test the possibility of adding Azure Nodes to the cluster to handle the extra workload.

Goal

Prepare a VHD and node template for Azure Nodes that have Microsoft Excel installed.

Upload an Excel workbook to the Azure Nodes.

Run the workbook.

Requirements

A cluster with the Enterprise Edition of HPC Pack 2008 R2 SP2 installed (this includes the HPC Services for Excel).

A WCF Broker Node configured.

A Windows Azure subscription.

Participation in the Azure Beta program (Azure Virtual Machine roles are a pre-release feature of Windows Azure).

A cluster-enabled Excel workbook that uses the HPC Macro framework to integrate with HPC Services for Excel.

Administrator permissions on the cluster.

Important considerations

When you run a workbook offloading job on Azure Nodes, the workbook and any dependencies must be copied to each Azure node.

Workbook offloading jobs can run on Azure VM roles that are added to the cluster, not on Azure worker roles. UDF offloading jobs can run on both Azure VM nodes and Azure worker nodes, and do not require that you install Excel on the nodes. For more information, see
Upload an XLL file to a Windows Azure storage account.

Azure nodes cannot access on-premises nodes or shares directly. Workbooks that use external data or shares might require additional development and planning before they can run on Azure Nodes.

The Microsoft.Hpc.Scheduler and Microsoft.Hpc.Scheduler.Properties COM APIs are not registered on Azure VMs, so workbooks that reference these APIs will fail to resolve the reference.

The ExcelClient COM API cannot be used directly on Azure VMs because access back to the on-premises scheduler is limited. The COM API will resolve correctly, so workbooks that reference it can run, but it should not be called from within HPC_Execute.

Applications deployed to Windows Azure are subject to the licensing terms associated with the application.

Steps

The following steps outline how to prepare a VHD and a node template for Azure virtual machine nodes that already have Microsoft Excel installed. For detailed requirements and steps for adding Azure VM nodes to your cluster, see
Steps for Deploying Windows Azure VM Nodes.

Follow the instructions in
Step 4: Create a VHD for VM Nodes to prepare a VHD operating system image, and for the “Install and configure your applications” step, install Microsoft Excel 2010 and any updates to Excel that you require. Do not copy workbooks, or small dependencies that are likely to change frequently, directly to the VHD image.

Note

The cluster-side features of HPC Services for Excel are automatically included when you install the Windows HPC Server components on the VHD.

Click New, and in the Create Node Template Wizard, select the following options:

In Choose Node Template Type, select Windows Azure node template.

In Specify Template Name, type AzureVMExcel.

Provide your subscription and service information.

In Specify Node Role, select VM role.

In the VHD image dropdown list, select the VHD image that you just created.

Configure the remote desktop credentials. This enables you to use the Remote Desktop action in HPC Cluster Manager to connect to the Azure Nodes.

Select the option to configure Azure availability policy manually.

The following steps describe how to stage an Excel workbook to Azure Storage. Packages that are staged to Azure storage by using hpcpack are automatically copied to any new Azure Node instances that you provision (or that are automatically reprovisioned by the Windows Azure system).

Package the workbook for upload to Azure storage by using
hpcpack and save the package to the C:\AzurePkgs folder. At an elevated command prompt window type the following command, where “c:\Excel\myWorkbook.xlsb” points to your workbook:

hpcpack create C:\AzurePkgs\myWorkbook.zip c:\Excel\myWorkbook.xlsb

Important

The package name must be the same as the workbook name. If your workbook has dependencies such as other workbooks or DLLs, create a folder that includes the workbook and supporting files, and then package the entire folder. For example: hpcpack create C:\AzurePkgs\myWorkbook.zip c:\Excel\myWorkbookFiles. The workbook must be at the top level of the folder that you are packaging and cannot be contained in a sub-folder. In the example, the workbook would be in “c:\Excel\myWorkbookFiles\myWorkbook.xlsb”.
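The naming rule above can be expressed as a small check. This is an illustrative sketch only (the function name is hypothetical, and PureWindowsPath is used so the comparison works with Windows-style paths on any platform); hpcpack itself does not expose such a helper:

```python
from pathlib import PureWindowsPath

def package_matches_workbook(package_zip: str, workbook: str) -> bool:
    """The hpcpack package name must match the workbook name (ignoring extension and case)."""
    return PureWindowsPath(package_zip).stem.lower() == PureWindowsPath(workbook).stem.lower()

print(package_matches_workbook(r"C:\AzurePkgs\myWorkbook.zip",
                               r"c:\Excel\myWorkbookFiles\myWorkbook.xlsb"))  # True
```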

Note

You do not need an elevated command prompt (run as Administrator) to run hpcpack create, but the hpcpack upload command in an upcoming step does require elevation.

If you upload packages to storage after the nodes are started, you can use
hpcsync to manually copy the files to the nodes. For example: clusrun /nodegroup:AzureNodes hpcsync

The following procedure describes how to start new Azure Node instances. These instances will include Microsoft Excel, the HPC Services for Excel cluster-side features, and any workbooks that you have staged to Azure Storage.

In HPC Cluster Manager, in Node Management, click Add Node, and then use the wizard to specify the VM node template and the number and size of the nodes to add. The nodes will appear in the node list in the Not-Deployed state.

Select the Azure nodes, right-click, and then click Start. This action deploys a set of VM role instances in Windows Azure, and can take some time to complete.

When the nodes move from the Provisioning state to the Offline state, right-click the nodes and then click Bring Online.

You are now ready to submit the offloading job to the cluster. There should be no difference from the point of view of the cluster user that is running the workbook.

Note

When you add Azure Nodes to your cluster and bring them Online, the HPC Job Scheduler Service will immediately try to start jobs on the nodes. If only a portion of your workload can run on Azure, ensure that you update or create job templates to define what job types can run on Azure. For example, to ensure that jobs submitted with a template only run on on-premises compute nodes, you can add the Node Groups property to the job template and select Compute Nodes as the required value.

Expected results

Workbook offloading jobs (using the HPC macro framework and HPC Service for Excel) can run on Azure Nodes with no change for the end user.

Your organization uses Smart Card authentication. You want users to be able to use their Smart Card credentials to submit jobs to the HPC cluster. Some of these jobs are long running, and you don’t want jobs to fail because of ticket time outs.

Because Smart Card users do not have passwords, you want them to be able to use their Smart Card to generate a Soft Card certificate that can be used as credentials on the cluster. If the Soft Card credentials are nearing their expiration date, you want users to generate a new Soft Card before submitting new jobs.

Goal

Configure the HPC Soft Card authentication policy on the cluster and identify a certificate template that must be used when generating an HPC Soft Card for the cluster. Set the Soft Card expiration warning period.

Generate an HPC Soft Card credential and submit a job.

Requirements

A cluster with HPC Pack 2008 R2 SP2 installed.

The head node and compute nodes must be running the Windows Server 2008 R2 operating system. The key storage provider (KSP) for HPC soft cards is not supported on Windows Server 2008.

Administrator permissions on the cluster.

The Active Directory and Active Directory domain controllers must be configured for Smart Card authentication.

The certificate template that will be used to generate HPC Soft Card credentials must allow the private key to be exported.

Note

To generate HPC soft card credentials on a client computer, the client computer must have the Windows Vista® or Windows® 7 operating system installed.

Important considerations

You can use HPC Soft Card credentials to submit jobs, run SOA sessions, and run diagnostic tests.

If you are using HPC Soft Card credentials, you cannot run jobs as a different user.

Before enabling HPC Soft Card authentication on the cluster, work with your certification authority (CA) or PKI administrator to choose or create a certificate template that should be used when generating a soft card for the cluster. The certificate template must allow the private key to be exported. Ensure that the validity period in the template is long enough to accommodate the job lifecycle. Optionally, the template can also have an associated access control list that defines who can use the certificate template.

Note

The CA role service includes several default certificate templates. The CA administrator can create an HPC soft card template by copying and then modifying the default Smart Card Logon template as follows:

In Application Policies, remove smart card.

In Request Handling, select “Allow private key to be exported”.

In Security, specify the users who can enroll (optional).

Ensure that the validity period in the template is long enough to accommodate the job lifecycle.

Install the key storage provider (KSP) on the head node, compute nodes, and workstation nodes. The installers are included in the SP2 download. Run the version that is appropriate for the operating system on each computer: hpcksp_x64.msi or hpcksp_x86.msi.

You can copy the installers to a shared folder that all on-premises nodes can access and then use the
clusrun command to install the KSP on all nodes. For example you can copy the installers to the ccpspooldir share on the head node (\\<headnode>\ccpspooldir) and then run the following command (for 64-bit computers):

clusrun msiexec /passive /I \\<headnode>\ccpspooldir\hpcksp_x64.msi

Set the HPC Soft Card authentication policy on the head node by setting the HpcSoftCard cluster property. The HpcSoftCard property is set to Disabled by default. If you want users to always use soft card authentication, set the property to Required. If you want users to choose between password and soft card logon, set the property to Allowed.

For example, run HPC PowerShell as an Administrator and type:

Set-HpcClusterProperty -HpcSoftCard:Allowed

Or at an elevated command prompt window, type:

cluscfg setparams "HpcSoftCard=Allowed"

Set the HpcSoftCardTemplate cluster property to specify the certificate template that should be used to generate a soft card credential.

For example, run HPC PowerShell as an Administrator and type:

Set-HpcClusterProperty -HpcSoftCardTemplate:<TemplateName>

Or at an elevated command prompt window, type:

cluscfg setparams "HpcSoftCardTemplate=<TemplateName>"

You can configure the warning period for soft cards that are nearing their expiration date. By default, this value is set to 5 days. If a user tries to submit a job with less than 5 days before their credentials expire, the job will be rejected. The user will see an error message about the soft card expiration, and will need to generate a new soft card certificate before resubmitting the job. You can configure this value by setting the SoftCardExpirationWarning cluster property.

For example, run HPC PowerShell as an Administrator and type:

Set-HpcClusterProperty -SoftCardExpirationWarning:3

Or at an elevated command prompt window, type:

cluscfg setparams "SoftCardExpirationWarning=3"

Note

To disable expiration warnings, you can set SoftCardExpirationWarning to 0.
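The warning-period behavior described above can be sketched as follows. This is a minimal illustration of the rule, not HPC Pack code; job_allowed is a hypothetical helper:

```python
from datetime import datetime, timedelta

def job_allowed(cert_expires: datetime, now: datetime, warning_days: int = 5) -> bool:
    """Reject submission when the soft card is within warning_days of expiring.

    A warning period of 0 disables the check (an already-expired card still fails).
    """
    if warning_days == 0:
        return cert_expires > now
    return cert_expires - now >= timedelta(days=warning_days)

now = datetime(2011, 7, 1)
print(job_allowed(datetime(2011, 7, 30), now))                 # True: about 29 days left
print(job_allowed(datetime(2011, 7, 4), now))                  # False: only 3 days left
print(job_allowed(datetime(2011, 7, 4), now, warning_days=0))  # True: warnings disabled
```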

The following procedure describes how a cluster user can generate an HPC Soft Card credential. You can use HPC PowerShell or a command prompt window. The commands are used to generate a public key pair and obtain the certificate from the CA that is configured for your Active Directory domain. The certificate is based on the template that is specified by the HpcSoftCardTemplate cluster property. The certificate is placed in your personal certificate store on your computer.

Note

The computer that you log on to must have the HPC Pack 2008 R2 SP2 client utilities installed.

Use one of the following methods to delete any previously cached credentials:

Run HPC PowerShell and type:

Remove-HpcJobCredential

Or at a command prompt window, type:

hpccred delcreds

Submit a test job, for example:

Run HPC PowerShell and type:

New-HpcJob | Add-HpcTask -Command:"echo hello" | Submit-HpcJob

Or at a command prompt window, type:

job submit echo hello

When prompted, select which credentials to use.

You can cache credentials on the cluster for jobs, diagnostics, or SOA sessions. To cache an HPC Soft Card, the certificate must be in the user’s personal store on the local computer (from which you are running the command to set credentials). The certificate along with the corresponding key pair will be encrypted and transmitted to the HPC Job Scheduler Service. If an HPC Soft Card for that user is already cached, it will be replaced.

You can use the following commands to manage your HPC Soft Card credentials, or to set SOA or diagnostic credentials:

Task | HPC PowerShell | Command prompt window
Get your HPC Soft Card credential | Get-HpcJobCredential | hpccred getcreds
Delete your cached credentials | Remove-HpcJobCredential | hpccred delcreds
Cache your HPC Soft Card on the cluster (jobs) | Set-HpcJobCredential -SoftCard | hpccred setcreds -softcard
Cache your HPC Soft Card on the cluster (SOA) | Set-HpcSoaCredential -SoftCard | not available
Cache your HPC Soft Card on the cluster (diagnostics) | Set-HpcTestCredential -SoftCard | test setcreds -softcard

Expected results

You can use the HPC Soft Card to submit a job. If Soft Cards are allowed (not required), you will be prompted to select an authentication method. If you use HPC Soft Card authentication, the soft card that you created will be used automatically. If there is more than one certificate in your certificate store, you will be prompted to choose from a list of available certificates.

If your HPC Soft Card is within SoftCardExpirationWarning days of expiring, you will be prompted to create a new HPC Soft Card before submitting the job.

Various user groups in your organization have contributed to the cluster budget, and in return they expect to have a set portion of the cluster at their disposal. If at any given time a group has a light workload and does not use its entire share of the cluster, you want those resources temporarily made available to other groups. To guarantee availability and maximize cluster utilization, you want the HPC Job Scheduler Service to allocate resources based on Resource Pools.

Goal

Create Resource Pools to define guaranteed cluster proportions. Create Job Templates to associate each user group or job type with a Resource Pool. Configure the HPC Job Scheduler Service to allocate resources based on Resource Pools.

Requirements

A cluster with HPC Pack 2008 R2 SP2 installed.

Administrator permissions on the cluster.

Important considerations

Resource pool definitions

Weight: An integer between 0 and 999,999 that represents the proportion of cluster cores that should be guaranteed to the pool.

Guaranteed cores: The number of cores that correspond to the weight defined for the pool. The number of guaranteed cores will vary according to how many nodes are Online and reachable at any given time. The number of guaranteed cores is calculated as (poolWeight/totalWeights)*NumberOfCoresOnline.

Allocated cores: The number of cores that are actually being used by jobs that are submitted to the pool. This number can be higher or lower than the number of guaranteed cores.

A pool with a weight of 0 has no guaranteed cores, but can have allocated cores if there are jobs that are submitted to the pool, and the other pools are not using all of their resources.

The Default Pool cannot be deleted. When Resource Pools are enabled in the HPC Job Scheduler Service, any jobs that do not specify a pool will use the Default Pool. Unlike custom pools, specifying the Default Pool does not provide any guarantee of resources. You can set the weight of the Default Pool to 0.

When the Job Scheduler calculates the number of cores for each Resource Pool (according to pool weight), the resulting value is rounded down to the nearest whole number. The remainder cores are added to the Default Pool.
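The rounding behavior can be illustrated with a short calculation. This sketch assumes only the formula given above; guaranteed_cores is a hypothetical helper, not part of HPC Pack:

```python
def guaranteed_cores(pool_weights: dict, cores_online: int) -> dict:
    """Guaranteed cores per pool: floor((poolWeight / totalWeights) * cores_online).

    Cores lost to rounding down are credited to the Default pool.
    """
    total = sum(pool_weights.values())
    cores = {name: (w * cores_online) // total for name, w in pool_weights.items()}
    cores["Default"] = cores.get("Default", 0) + (cores_online - sum(cores.values()))
    return cores

# Two pools weighted 60/40 on a cluster with 101 cores online:
print(guaranteed_cores({"Default": 0, "PoolA": 60, "PoolB": 40}, 101))
# {'Default': 1, 'PoolA': 60, 'PoolB': 40}
```

Note that the one core left over after rounding down goes to the Default Pool, even though its weight is 0.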

Resource Pools and node groups provide distinct ways to allocate cluster resources to a job, and they are not intended to be used together. If you add both specific node groups and Resource Pools to a job template, the Job Scheduler will restrict access to cluster resources based on both properties independently.

Steps

In this example, let’s say you have two user groups, and each group expects to be able to use the following proportions of the cluster at any given time: Group A 60%, and Group B 40%. Let’s also say that Group A has two distinct types of jobs for which they want separate job templates: one type is high priority, and the other type is low priority. To enforce the desired scheduling policies, you create three job templates: “GroupA_HighPriJobs”, “GroupA_LowPriJobs”, and “GroupB_AllJobs”.

All jobs that are assigned to a particular Resource Pool will collectively be guaranteed the proportion of cluster cores that is defined for the Resource Pool, and will be scheduled within the pool according to job priority, submit time, and scheduling mode (Queued or Balanced). For example, jobs that are submitted using the job templates “GroupA_HighPriJobs” and “GroupA_LowPriJobs” will collectively be guaranteed 60% of the Online cluster cores.

If both groups have jobs in the queue, the cluster will be shared according to the resource pool weights.

If one group has no jobs in the queue, or not enough jobs to keep their share of the cluster busy, the other group can temporarily make use of the resources.

Cluster administrators can fine tune cluster performance by controlling how many HPC tasks should run on a particular node. Over-subscription provides the ability to schedule more processes on a node than there are physical cores or sockets. A process could be an MPI rank, a single-core task, a sub-task, or an instance of a SOA service. For example, if a node has eight cores, then normally eight processes could potentially run on that node. With over-subscription, you can set the subscribedCores node property to a higher number, for example 16, and the HPC Job Scheduler Service could potentially start 16 processes on that node. Conversely, under-subscription provides the ability to schedule fewer processes on a node than there are physical cores or sockets.

For example, this can be useful in the following scenarios:

Part of the cluster workload consists of coordinator tasks that use very few compute cycles. An MPI code can have a master process that does not run very much, but distributes work to the other processes. To improve utilization, you can oversubscribe a node, and ensure that your Rank0 process starts on that node (see
MPI Rank0 placement script)

Your MPI code needs more memory bandwidth than the processor can support if all cores are running. To improve performance, you can undersubscribe the node so that only the desired number of processes can run on that node.

You only want to use a subset of cores or sockets on a particular node for running cluster jobs, so you undersubscribe the node. For example, if you enable the compute node role on a broker node or head node, you can essentially limit the number of cores that are used for the compute node role by undersubscribing the node.

Goal

Set the number of subscribed cores and sockets on a node.

Submit a job.

Requirements

A cluster with HPC Pack 2008 R2 SP2 installed.

Administrator permissions on the cluster.

Important considerations

Node property definitions:

subscribedCores

Specifies the number of cores that you want the HPC Job Scheduler Service to use when it is allocating tasks to the node. It can be larger or smaller than the number of physical cores. To clear this property, set the value to $null.

Ensure that the number of subscribed cores is divisible by the number of subscribed sockets (or by the number of physical sockets, if no value is set for subscribedSockets). That is to say, each socket must have the same number of cores (for example, 8 cores and 4 sockets is valid, but 10 cores and 4 sockets is not).

subscribedSockets

Specifies the number of sockets that the HPC Job Scheduler Service should use when it is allocating tasks to the node. It can be larger or smaller than the number of physical sockets. To clear this property, set the value to $null.

Ensure that the number of subscribed cores (or physical cores, if no value is set for subscribedCores) is divisible by the number of subscribed sockets. That is to say, each socket must have the same number of cores (for example, 8 cores and 4 sockets is valid, but 10 cores and 4 sockets is not).

affinity

Specifies how affinity is managed for tasks that run on the node. By default, the value is null, which means that affinity is managed according to the job scheduler affinity policy. For more information about the job scheduler affinity policy, see
Understanding Affinity. If this property is set, node affinity overrides the job scheduler affinity settings. If it is set to false, affinity on the node is not managed by the HPC services, and the operating system or the application manages placement of tasks on physical cores. If it is set to true, the HPC Node Manager Service sets affinity for tasks (assigns tasks to specific cores).

Note

These properties can only be set on nodes that are in the Offline node state.
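The divisibility rule stated for subscribedCores and subscribedSockets can be sketched as a one-line check. This is an illustrative helper only (the function name is hypothetical); the HPC services perform the equivalent validation themselves:

```python
def valid_subscription(subscribed_cores: int, subscribed_sockets: int) -> bool:
    """Each socket must account for the same whole number of cores."""
    return subscribed_sockets > 0 and subscribed_cores % subscribed_sockets == 0

print(valid_subscription(8, 4))   # True: 2 cores per socket
print(valid_subscription(10, 4))  # False: 2.5 cores per socket is not allowed
```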

Steps

As an example, let’s say we have a node named CN001 that has 4 cores and 1 socket. We want to set the subscribed cores and sockets to 8 and 2.

Node properties changes are applied during scheduling passes. If your changes are not reflected when you run get-hpcnode, wait a few seconds and then try again.

Bring the node Online:

Set-HpcNodeState -Name:CN001 -State:Online

Now you can submit a job to verify that the HPC Job Scheduler will now start up to 8 tasks (or sub-tasks) on CN001. For example, in HPC PowerShell, to submit a parametric sweep job that requires 8 cores and requests CN001:

You want your cluster users to be able to submit and monitor jobs from a web portal.

Goal

Launch the HPC Web Portal.

Create an Application Profile.

Create a Job Submission page.

Submit a job through the portal.

Requirements

A cluster with HPC Pack 2008 R2 SP2 installed.

Administrator permissions on the cluster.

The installation file HpcWebComponents.msi. HpcWebComponents.msi is included in the HPC2008R2SP2-Update-x64.zip file available at the
Microsoft Download Center, or you can locate the file on the full installation media for HPC Pack 2008 R2 with SP2 or later.

Important considerations

When the portal is set up, cluster users will be able to submit and monitor jobs from the https://<headnode>/hpcportal site. Users will need to be logged on to their computer or launch Internet Explorer with their domain credentials.

You can use a job template as a basis for more than one submission page.

Job submission pages allow the cluster administrator to provide a simplified submission experience, but do not provide actual constraints on job or task properties. The submission page can specify constraints on values that are allowed when creating the job, but value limitations that are defined in the web portal are not enforced by the HPC Job Scheduler Service. After the job is submitted, the job owner will be able to see the values for all properties, and modify values within the restrictions defined by the underlying job template, as permitted by the job state. For information about what properties can be modified in different job states, see
Modify a Job.

Restrictions that are specified by the job template cannot be overridden by the submission page. But the default values that are specified in the submission page do override the default values that are specified in the job template.

Select a certificate option from the displayed list. For testing purposes you can type 0 to generate and configure a self-signed certificate.

When configuration and installation are complete, open Internet Explorer and add the web portal to the list of trusted sites as follows:

On the Tools menu, click Internet Options.

In the Security tab, select the Trusted sites zone, and then click Sites.

Add https://localhost/hpcportal to the list of sites.

Note

The default security level for trusted sites (Medium) allows AJAX, which is required to view the web portal.

Open Internet Explorer and go to the following address:

https://localhost/hpcportal

If you see a certificate error warning, click “Continue to this website”.

If you are prompted, type your domain credentials.

To demonstrate how the application profile works, the following procedure describes how to create a profile for the ping command with the parameter -n. For example, a user could run the following command: ping localhost -n 2

In the job property visibility and defaults page, configure the properties as follows:

Job Name: “myPingJob”.

Project Name: Clear the Show check box for this property.

Define the parametric sweep values: start=1, end=50, increment=1.

Clear the Show check boxes for the email, working directory, and standard in, out and error properties.
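The parametric sweep values defined above (start = 1, end = 50, increment = 1) expand to one task per index. The expansion can be sketched as follows, assuming the end value is inclusive (sweep_indices is an illustrative helper, not part of HPC Pack):

```python
def sweep_indices(start: int, end: int, increment: int) -> list:
    """Expand a parametric sweep definition into its task indices (end inclusive)."""
    return list(range(start, end + 1, increment))

indices = sweep_indices(1, 50, 1)
print(len(indices), indices[0], indices[-1])  # 50 1 50
```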

On the next page, define Node Preparation and Release tasks as follows:

Do not show the Node Preparation task options, and specify a default value of “echo hello”.

Note

If a property is not shown, the default value is applied, and the job owner cannot change the value from the job submission page. After the job is submitted, the job owner will be able to see the values for all properties, and modify values within the restrictions defined by the underlying job template, as permitted by the job state. For information about what properties can be modified in different job states, see
Modify a Job.

Select the check box to show the Node Release task options, and specify a default value of “echo goodbye”.

On the Specify application profile page, select the option to use an existing profile, click Select, select “PingCommand” in the dialog box, and then click OK.

On the next page, accept the default visibility settings for the PingCommand Application Profile.

You support a varied workload on your cluster, and you want to use custom filters to provide additional checks and controls on jobs that are submitted to your cluster. However, some of the properties and conditions that you want to check for only apply to certain job types, so rather than specifying a single filter for all jobs that are submitted to the cluster, you want to specify one or more filters that should run only on jobs that are submitted with a particular job template. For example, you can ensure that an activation filter that checks for license availability only runs on jobs that require a license.

Goal

To demonstrate how custom filters that are defined at the job template level work, this guide describes how to compile and install a sample submission filter from the SDK code samples. The submission filter checks job property values and, if the conditions are met, reassigns the job to a different job template. If a job owner already selected a custom job template, we do not want to reassign the job, so we will run this filter only on jobs that are submitted with the Default job template.

Filters that are specified at the job template level must be defined as a DLL (which runs in the same process as the HPC Job Scheduler Service), rather than as an executable like the cluster-wide filters (which run in a separate process).

Job template filters can modify jobs and influence how the HPC Job Scheduler Service processes jobs in the same way as cluster-wide filters. For more information about how the job scheduler interprets filter exit codes, see
Understanding Activation and Submission Filters.

When a job is submitted or ready for activation, the job-template filters will run in the order listed in the template, and will run before the cluster-wide filter.

Windows HPC Server 2008 R2 has the .NET Framework 3.5 installed by default. If you are compiling in Visual Studio 2010, you must target .NET 3.5 when compiling your filter DLL for the job scheduler (the default framework in Visual Studio 2010 is .NET 4.0). Even if you install .NET 4.0 on the cluster, the job scheduler is based on .NET 3.5, so any DLL that is loaded into the scheduler process must also target .NET 3.5.
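For example, one way to ensure the sample project targets .NET 3.5 is to set the target framework when building from a Visual Studio command prompt (the project file name and configuration here are assumptions):

```shell
REM Build the filter DLL against .NET 3.5 by overriding the target framework.
msbuild SubmissionJobSize.csproj /p:TargetFrameworkVersion=v3.5 /p:Configuration=Release
```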

Steps

To experiment with job template filters, you can build and try the sample submission filter that is included in the SP2 code samples. The SP2 code samples include a C# project file that you can compile in Visual Studio to produce a DLL file. You can then deploy this DLL file to the cluster and associate it with a job template.

On the head node, open the %CCP_DATA%Filters folder (typically, this is C:\Program Files\Microsoft HPC Pack 2008 R2\Data\Filters).

Create a new sub-folder named “SubmissionJobSize”.

Copy the SubmissionJobSize DLL and PDB files (from the SubmissionJobSize\bin\Debug folder) and place them in the new folder you just created.

Note

If your filters have more than one file, or if they create output files, it is good practice to create a sub-folder for each filter in the %CCP_DATA%Filters folder.

The PDB file is not required, but you can include it to help when debugging the filter.
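The folder-creation and copy steps above can be sketched as the following commands, run in a command prompt on the head node (the source path to the compiled sample is an assumption):

```shell
REM Create the filter sub-folder and copy the compiled filter files into it.
cd /d "%CCP_DATA%Filters"
mkdir SubmissionJobSize
copy "C:\Samples\SubmissionJobSize\bin\Debug\SubmissionJobSize.dll" SubmissionJobSize\
copy "C:\Samples\SubmissionJobSize\bin\Debug\SubmissionJobSize.pdb" SubmissionJobSize\
```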

The sample submission filter checks the job XML file to see whether the maximum number of requested resources for the job is greater than 1 (and not set to autocalculate). If so, the filter changes the value of the job template property to assign a job template named “LargeJobTemplate” (to test this filter, you must first create a job template named “LargeJobTemplate”). We will add the filter to the Default job template. That way, any job that is submitted to the Default template is checked for its maximum requested resource setting, and if the value is greater than 1, the job is assigned to the new template.

In the wizard, type “LargeJobTemplate” for the template name, select the Finish tab, and then click Finish.

Right-click the Default job template and then click Edit.

Click Add, and then select SubmissionFilters.

In Valid Values, specify the location of the filter relative to the %CCP_DATA%Filters folder. In this case, type SubmissionJobSize\SubmissionJobSize.dll.

Note

You can specify more than one filter in the value field. List each filter on its own line (filters are delimited by a carriage return). Filters run in the listed order, and will run before the cluster-wide filter, if specified.
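For example, to run two filters in order, the contents of the value field could look like the following (the second filter name is hypothetical):

```
SubmissionJobSize\SubmissionJobSize.dll
SecondFilter\SecondFilter.dll
```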