Monday, March 25, 2013

A non-exhaustive list of Hadoop ecosystem projects is given below. The list is taken from the excellent book Hadoop: The Definitive Guide by Tom White.

1. Avro
A serialization system for efficient, cross-language RPC and persistent data storage.

2. Hadoop Distributed File System (HDFS)
A distributed filesystem that runs on large clusters of commodity machines.

3. Hive
A distributed data warehouse. Hive manages data stored in HDFS and provides a query language based on SQL (and which is translated by the runtime engine to MapReduce jobs) for querying the data.

4. HBase
A distributed, column-oriented database. HBase uses HDFS for its underlying storage, and supports both batch-style computations using MapReduce and point queries (random reads).

5. MapReduce
A distributed data processing model and execution environment that runs on large clusters of commodity machines.

6. Oozie
A service for managing workflows of Hadoop jobs.

7. Pig
A data flow language and execution environment for exploring very large datasets. Pig runs on HDFS and MapReduce clusters.

8. Sqoop
A tool for efficient bulk transfer of data between structured data stores (such as relational databases) and HDFS.

9. ZooKeeper
A distributed, highly available coordination service. ZooKeeper provides primitives such as distributed locks that can be used for building distributed applications.
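The MapReduce model at the heart of this ecosystem can be sketched in plain Python: a map function emits key-value pairs, a shuffle groups them by key, and a reduce function aggregates each group. This is a toy, single-machine illustration of the programming model, not Hadoop's actual API.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values for each key (here, sum the counts).
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase("the quick brown fox jumps over the lazy dog")))
print(counts["the"])  # 2
```

In real Hadoop the map and reduce tasks run on different machines, with the shuffle moving data across the network; the division of labor, however, is exactly this one.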

Thursday, March 14, 2013

The National Renewable Energy Laboratory (NREL) hosts the first petascale HPC system to use warm-water liquid cooling, earning it the world's number-one rating in power-usage effectiveness: PUE = 1.06.
......
The direct-component liquid cooling system supplies servers with warm water (75 degrees Fahrenheit) that is piped over processors to remove excess heat, returning water heated to approximately 100 degrees F. Because only a 25-degree temperature rise must be removed by the water-cooling system, the energy-efficient setup eliminates the power-hungry compressors needed for traditional air-cooling systems.

Excess heat generated by the new HPC will function as the primary room-heating technology for its Golden, Colo., data center, and will heat walkways outside buildings to melt snow and ice. This holistic approach, defined by NREL's Energy Systems Integration Facility (ESIF), will save as much as $1 million per year compared with the power needed to run a conventional air-cooled data center.
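PUE is simply the ratio of total facility power to the power delivered to the IT equipment, so a PUE of 1.06 means only 6% overhead for cooling and power distribution. A quick sanity check (the kilowatt figures below are purely illustrative, not NREL's actual load):

```python
def pue(total_facility_kw, it_equipment_kw):
    # Power Usage Effectiveness: total facility power / IT equipment power.
    return total_facility_kw / it_equipment_kw

# Illustrative numbers: 1060 kW total facility draw on 1000 kW of IT load
# gives the reported PUE of 1.06, i.e. only 60 kW spent on cooling and
# power distribution. A typical air-cooled center runs closer to PUE 1.8.
print(pue(1060.0, 1000.0))  # 1.06
```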

Tuesday, March 12, 2013

Intel’s Xeon Phi coprocessor outperforms Nvidia’s Tesla graphics-processing unit (GPU) on the operations used by “solver” applications in science and engineering, according to independent tests at Ohio State University.

When comparing Intel’s Xeon Phi to Nvidia’s Tesla, most reviewers dwell on how much easier it is to rewrite parallel programs for the Intel coprocessor, since it runs the same x86 instruction set as a 64-bit Pentium.

Nvidia’s “CUDA” cores on its Tesla coprocessor, on the other hand, do not even try to emulate the x86 instruction set, opting instead for more economical instructions that allow Nvidia to cram many more cores onto a chip.

As a result, Nvidia’s Tesla has about 40 times more cores (2,496) than Intel’s Xeon Phi (60). The question then becomes: is it worth rewriting x86 parallel software for Nvidia’s CUDA in order to gain access to the thousands of additional cores available on Tesla?
......
......

Friday, March 8, 2013

HPL is a software package that solves a (random) dense linear system
in double precision (64 bits) arithmetic on distributed-memory
computers. It can thus be regarded as a portable as well as freely
available implementation of the High Performance Computing Linpack
Benchmark.
The algorithm used by HPL can be summarized by the following keywords:
- Two-dimensional block-cyclic data distribution
- Right-looking variant of the LU factorization with row partial pivoting, featuring multiple look-ahead depths
- Recursive panel factorization with pivot search and column broadcast combined
- Various virtual panel broadcast topologies
- Bandwidth-reducing swap-broadcast algorithm
- Backward substitution with look-ahead of depth 1
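The mathematical core of the list above, LU factorization with row partial pivoting followed by triangular substitution, can be sketched in a few lines of pure Python. This is the unblocked textbook form, with none of HPL's block-cyclic distribution, look-ahead, or broadcast machinery.

```python
def lu_solve(A, b):
    # Solve A x = b by in-place LU factorization with row partial pivoting,
    # then backward substitution (the unblocked textbook version).
    n = len(A)
    A = [row[:] for row in A]  # work on copies
    b = b[:]
    for k in range(n):
        # Partial pivoting: swap in the row with the largest |pivot|.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]  # apply the same update to the right-hand side
    # Backward substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

# 2x + y = 3 and x + 3y = 5 has the solution x = 0.8, y = 1.4.
print(lu_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```

HPL's sophistication lies entirely in doing this at scale: the matrix is scattered block-cyclically over a process grid, and the pivot search, panel factorization, and swaps each become distributed communication steps.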
1. Requirements:

Saturday, March 2, 2013

If you need to execute jobs in sequence, so that the second job can launch only after the first job has completed, you can use the -W option.
For example, if you have a running job with job ID 12345, you can submit the next job so that it runs only after job 12345 has finished.
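On a PBS/Torque-style scheduler (an assumption here, since the post only names the -W option), the dependency is expressed at submission time. Job 12345 is the already-running job from the example above; job2.sh is a hypothetical second job script:

```shell
# Submit job2.sh so that it starts only after job 12345 completes successfully.
# 'afterok' requires exit status 0; 'afterany' runs regardless of exit status.
qsub -W depend=afterok:12345 job2.sh
```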