Amanda E. Peters, Rochester US

Amanda E. Peters, Rochester, MN US

Patent application number

Description

Published

20080209179

Low-Impact Performance Sampling Within a Massively Parallel Computer - An apparatus, program product and method sample at different times nodes that are performing similar work. Performance data associated with first and second node subsets performing the similar work are sampled at different times, e.g., in a round-robin fashion, and in accordance with a given sampling rate. The performance data is analyzed. Nodes whose performance suffers as a result of a sampling operation may be identified and removed from a subsequent operation.

08-28-2008

20090043728

Query Optimization in a Parallel Computer System to Reduce Network Traffic - An apparatus and method for a database query optimizer to optimize a query that uses multiple networks. The query optimizer optimizes a query to reduce network traffic on a network or node that is overloaded or above an established parameter in a node/network attribute table. The query optimization to reduce network traffic may result in a sub-optimal query in other respects such as execution time. The result is a query optimizer that rewrites or optimizes a query to execute on multiple nodes or networks to reduce traffic on a network or node according to the loading characteristics and assigned attributes of a node or network.

02-12-2009

20090043745

Query Execution and Optimization with Autonomic Error Recovery from Network Failures in a Parallel Computer System with Multiple Networks - An apparatus and method for a database query execution monitor determines if an network error or low performance condition exists and then where possible modifies the query. The query execution monitor then determines an alternate query execution plan to continue execution of the query. The query optimizer can re-optimize the query to use a different network or node. Thus, the query execution monitor allows autonomic error recovery for network failures using an alternate query execution. The alternate query execution could also be determined at the initial optimization time and then this alternate plan used to execute a query in the case of a particular network failure.

02-12-2009

20090043750

Query Optimization in a Parallel Computer System with Multiple Networks - An apparatus and method for a database query optimizer to optimize a query that uses multiple networks. The database query optimizer optimizes a query that uses multiple networks to satisfy the query by splitting the query execution to use multiple networks. Thus, the query optimizer rewrites or optimizes a query to execute on multiple nodes or networks to more efficiently execute the query and reduce network traffic on a network. The query optimizer uses plan cache statistics to determine whether to use multiple networks to optimize the query.

02-12-2009

20090043873

Methods and Apparatus for Restoring a Node State - In one aspect of the invention, a method is provided. The method may include: (1) storing a snapshot of a system state of a node; (2) executing a job on the node; and (3) restoring the node to the system state using the stored snapshot of the system state.

02-12-2009

20090043910

Query Execution and Optimization Utilizing a Combining Network in a Parallel Computer System - An apparatus and method for a database query optimizer utilizes a combining network to optimize a portion of a query in a parallel computer system with multiple nodes. The efficiency of the parallel computer system is increased by offloading collective operations on node data to the global combining network. The global combining network performs collective operations such as minimum, maximum, sum, and logical functions such as OR and XOR.

02-12-2009

20090271784

Executing A Distributed Java Application On A Plurality Of Compute Nodes - Methods, systems, and products are disclosed for executing a distributed Java application on a plurality of compute nodes. The Java application includes a plurality of jobs distributed among the plurality of compute nodes. The plurality of compute nodes are connected together for data communications through a data communication network. Each of the plurality of compute nodes has installed upon it a Java Virtual Machine (‘JVM’) capable of supporting at least one job of the Java application. Executing a distributed Java application on a plurality of compute nodes includes: tracking, by an application manager, JVM environment variables for the JVMs installed on the plurality of compute nodes; and configuring, by the application manager, the plurality of jobs for execution on the plurality of compute nodes in dependence upon the JVM environment variables for the JVMs installed on the plurality of compute nodes.

10-29-2009

20090271799

Executing A Distributed Java Application On A Plurality Of Compute Nodes - Methods, systems, and products are disclosed for executing a distributed Java application on a plurality of compute nodes. The Java application includes a plurality of jobs distributed among the plurality of compute nodes. The plurality of compute nodes are connected together for data communications through a data communication network. Each of the plurality of compute nodes has installed upon it a Java Virtual Machine (‘JVM’) capable of supporting at least one job of the Java application. Executing a distributed Java application on a plurality of compute nodes includes: tracking, by an application manager, a just-in-time (‘JIT’) compilation history for the JVMs installed on the plurality of compute nodes; and configuring, by the application manager, the plurality of jobs for execution on the plurality of compute nodes in dependence upon the JIT compilation history for the JVMs installed on the plurality of compute nodes.

10-29-2009

20090300384

Reducing Power Consumption While Performing Collective Operations On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for reducing power consumption while performing collective operations on a plurality of compute nodes that include: receiving, by each compute node, instructions to perform a type of collective operation; selecting, by each compute node from a plurality of collective operations for the collective operation type, a particular collective operation in dependence upon power consumption characteristics for each of the plurality of collective operations; and executing, by each compute node, the selected collective operation.

12-03-2009

20090300385

Reducing Power Consumption While Synchronizing A Plurality Of Compute Nodes During Execution Of A Parallel Application - Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.

12-03-2009

20090300386

Reducing power consumption during execution of an application on a plurality of compute nodes - Methods, apparatus, and products are disclosed for reducing power consumption during execution of an application on a plurality of compute nodes that include: powering up, during compute node initialization, only a portion of computer memory of the compute node, including configuring an operating system for the compute node in the powered up portion of computer memory; receiving, by the operating system, an instruction to load an application for execution; allocating, by the operating system, additional portions of computer memory to the application for use during execution; powering up the additional portions of computer memory allocated for use by the application during execution; and loading, by the operating system, the application into the powered up additional portions of computer memory.

12-03-2009

20090300394

Reducing Power Consumption During Execution Of An Application On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for reducing power consumption during execution of an application on a plurality of compute nodes that include: executing, by each compute node, an application, the application including power consumption directives corresponding to one or more portions of the application; identifying, by each compute node, the power consumption directives included within the application during execution of the portions of the application corresponding to those identified power consumption directives; and reducing power, by each compute node, to one or more components of that compute node according to the identified power consumption directives during execution of the portions of the application corresponding to those identified power consumption directives.

12-03-2009

20090300399

Profiling power consumption of a plurality of compute nodes while processing an application - Methods, apparatus, and products are disclosed for profiling power consumption of a plurality of compute nodes while processing an application that include: executing the application on the plurality of compute nodes; monitoring performance characteristics for components of the plurality of compute nodes during execution of the application; and recording, in a power profile for the application, power consumption during execution of the application in dependence upon the performance characteristics for components of the plurality of compute nodes.

12-03-2009

20090307036

Budget-Based Power Consumption For Application Execution On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for budget-based power consumption for application execution on a plurality of compute nodes that include: assigning an execution priority to each of one or more applications; executing, on the plurality of compute nodes, the applications according to the execution priorities assigned to the applications at an initial power level provided to the compute nodes until a predetermined power consumption threshold is reached; and applying, upon reaching the predetermined power consumption threshold, one or more power conservation actions to reduce power consumption of the plurality of compute nodes during execution of the applications.

12-10-2009

20090307703

Scheduling Applications For Execution On A Plurality Of Compute Nodes Of A Parallel Computer To Manage temperature of the nodes during execution - Methods, apparatus, and products are disclosed for scheduling applications for execution on a plurality of compute nodes of a parallel computer to manage temperature of the plurality of compute nodes during execution that include: identifying one or more applications for execution on the plurality of compute nodes; creating a plurality of physically discontiguous node partitions in dependence upon temperature characteristics for the compute nodes and a physical topology for the compute nodes, each discontiguous node partition specifying a collection of physically adjacent compute nodes; and assigning, for each application, that application to one or more of the discontiguous node partitions for execution on the compute nodes specified by the assigned discontiguous node partitions.

12-10-2009

20090307708

Thread Selection During Context Switching On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for thread selection during context switching on a plurality of compute nodes that includes: executing, by a compute node, an application using a plurality of threads of execution, including executing one or more of the threads of execution; selecting, by the compute node from a plurality of available threads of execution for the application, a next thread of execution in dependence upon power characteristics for each of the available threads; determining, by the compute node, whether criteria for a thread context switch are satisfied; and performing, by the compute node, the thread context switch if the criteria for a thread context switch are satisfied, including executing the next thread of execution.

12-10-2009

20090313636

Executing An Application On A Parallel Computer - Methods, apparatus, and products are disclosed for executing an application on a parallel computer that include: executing, by a current compute node, a current task of the application, including producing results; determining, by the current compute node in dependence upon current network characteristics and application characteristics, whether to transfer the results to a next compute node for further processing by a next task on the next compute node or to execute the next task for further processing of the results on the current compute node; transferring, by the current compute node, the results to the next compute node for further processing by the next task on the next compute node if the determination specifies transferring the results to the next node; and executing, by the current compute node, the next task for further processing of the results if the determination specifies executing the next task on the current compute node.

12-17-2009

20090319662

Process Migration Based on Exception Handling in a Multi-Node Environment - A process on a highly distributed parallel computing system is disclosed. When a first compute node in a first pool is ready to hand-off a task to second pool for further processing, the first compute node may first determine whether a node is available in the second pool. If no node is available from the second pool, then the first compute node may begin performing a primary task assigned to the second pool of nodes, up to the point where a service available exclusively to the nodes of the second pool is required. In the interim, however, one of the nodes of the second pool may become available. Alternatively, an application program running on a compute node may be configured with an exception handling routine that catches exceptions and migrates the application to a compute node where a necessary service is available, as such exceptions occur.

12-24-2009

20090320023

Process Migration Based on Service Availability in a Multi-Node Environment - A process on a highly distributed parallel computing system is disclosed. When a first compute node in a first pool is ready to hand-off a task to second pool for further processing, the first compute node may first determine whether a node is available in the second pool. If no node is available from the second pool, then the first compute node may begin performing a primary task assigned to the second pool of nodes, up to the point where a service available exclusively to the nodes of the second pool is required. In the interim, however, one of the nodes of the second pool may become available. Alternatively, an application program running on a compute node may be configured with an exception handling routine that catches exceptions and migrates the application to a compute node where a necessary service is available, as such exceptions occur.

12-24-2009

20100005326

Profiling An Application For Power Consumption During Execution On A Compute Node - Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application.

01-07-2010

20100014523

Providing Point To Point Communications Among Compute Nodes In A Global Combining Network Of A Parallel Computer - Methods, apparatus, and products are disclosed for providing point to point data communications among compute nodes in a global combining network of a parallel computer that include: determining a class route identifier available for all of the nodes along a communications path from an origin node to a target node; configuring network hardware of each node along the communications path with routing instructions in dependence upon the available class route identifier and the network's topology; transmitting, by the origin node along the communications path, a network packet to the target node, including encoding the available class route identifier in the network packet; and routing, by the network hardware of each node along the communications path, the network packet to the target node in dependence upon the routing instructions for each node and the available class route identifier.

01-21-2010

20100017420

Performing An All-To-All Data Exchange On A Plurality Of Data Buffers By Performing Swap Operations - Methods, apparatus, and products are disclosed for performing an all-to-all exchange on n number of data buffers using XOR swap operations. Each data buffer has n number of data elements. Performing an all-to-all exchange on n number of data buffers using XOR swap operations includes for each rank value of i and j where i is greater than j and where i is less than or equal to n: selecting data element i in data buffer j; selecting data element j in data buffer i; and exchanging contents of data element i in data buffer j with contents of data element j in data buffer i using an XOR swap operation.

01-21-2010

20100095303

Balancing A Data Processing Load Among A Plurality Of Compute Nodes In A Parallel Computer - Methods, apparatus, and products are disclosed for balancing a data processing load among a plurality of compute nodes in a parallel computer that include: partitioning application data for processing on the plurality of compute nodes into data chunks; receiving, by each compute node, at least one of the data chunks for processing; estimating, by each compute node, processing time involved in processing the data chunks received by that compute node for processing; and redistributing, by at least one of the compute nodes to at least one of the other compute nodes, a portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node.

04-15-2010

20120246509

GLOBAL DETECTION OF RESOURCE LEAKS IN A MULTI-NODE COMPUTER SYSTEM - A process is disclosed for identifying and recovering from resource leaks on compute nodes of a parallel computing system. A resource monitor stores information about system resources available on a compute node in a clean state. After the compute node runs a job, the resource monitor compares the current resource availability to the clean state. If a resource leak is found, the resource monitor contacts a global resource manger to remove the resource leak.

09-27-2012

20130185278

QUERY OPTIMIZATION IN A PARALLEL COMPUTER SYSTEM TO REDUCE NETWORK TRAFFIC - A database query optimizer optimizes a query that uses multiple networks. The query optimizer optimizes a query to reduce network traffic on a network or node that is overloaded or above an established parameter in a node/network attribute table. The query optimization to reduce network traffic may result in a sub-optimal query in other respects such as execution time. The result is a query optimizer that rewrites or optimizes a query to execute on multiple nodes or networks to reduce traffic on a network or node according to the loading characteristics and assigned attributes of a node or network.

07-18-2013

20130185588

QUERY EXECUTION AND OPTIMIZATION WITH AUTONOMIC ERROR RECOVERY FROM NETWORK FAILURES IN A PARALLEL COMPUTER SYSTEM WITH MULTIPLE NETWORKS - A database query execution monitor determines if a network error or low performance condition exists and then where possible modifies the query. The query execution monitor then determines an alternate query execution plan to continue execution of the query. The query optimizer can re-optimize the query to use a different network or node. Thus, the query execution monitor allows autonomic error recovery for network failures using an alternate query execution. The alternate query execution could also be determined at the initial optimization time and then this alternate plan used to execute a query in the case of a particular network failure.