The IBM Advance Toolchain for Linux on Power is a set of open source development tools (compiler, debugger and profiling tools) and runtime libraries that allow users to take leading edge advantage of IBM's latest POWER hardware features on Linux.
For more information about it, visit http://ibm.co/AdvanceToolchain.

Big news from SuperComputing 2016 this morning! I am excited to share that IBM and NVIDIA have teamed up to announce the release of PowerAI, the world's fastest deep learning enterprise solution. PowerAI is an optimized, integrated offering that includes everything needed to get started with cognitive applications using deep learning: optimized CPU and GPU high-performance libraries and prebuilt, optimized versions of the most popular deep learning frameworks for GPU-accelerated Power servers. The PowerAI release also includes a binary version of IBM Caffe, the popular Caffe deep learning framework enhanced with new algorithms designed by IBM's Research labs to deliver the most scalable deep learning in the industry. In addition, the offering includes support for scripting languages such as LuaJIT and for Python notebooks, for ease of use and data scientist productivity.

Caffe, a dedicated artificial neural network (ANN) training environment developed by the Berkeley Vision and Learning Center (BVLC) at the University of California, Berkeley, is now available in three versions: the leading-edge Caffe development version from UCB's BVLC with performance enhancements developed at IBM's research labs, an unmodified BVLC Caffe, and Nvidia's NVCaffe.

Torch, a framework consisting of several ANN modules built on an extensible mathematics library

Theano, another framework consisting of several ANN modules built on an extensible mathematics library

TensorFlow will be released in the near future, reflecting its growing popularity in the deep learning community.

In addition to these binary releases, we are enabling machine learning and deep learning in the cloud. We are partnering with NIMBIX to make PowerAI available as a service in the cloud. The Power MLDL solution on NIMBIX lets users get started with PowerAI immediately, using preconfigured, ready-to-use deep learning engines, even while servers are being installed in their own data centers. We are also expanding the SuperVessel container cloud for academic users.

We continue to expand the open source ecosystem for Linux on Power for Machine Learning and Deep Learning. We have published code updates reflecting the Power enablement to community repositories and made additional enhancements available on github. We have also published build instructions for the tuned open source Deep Learning frameworks included in PowerAI for those looking to build and further enhance these frameworks on Power.

In addition to the prebuilt optimized releases included in the PowerAI toolkit, we have ported and published NVIDIA docker containers for Power to simplify deploying GPU-accelerated application stacks on Power, additional Deep Learning frameworks (such as Chainer), and tools such as GUIs and scripting languages for Power which can be easily installed from source.

Get started with PowerAI to develop cognitive applications on Power today and share how you are unleashing the power of deep learning to transform the future of computing in the comments section.

The Machine Learning and Deep Learning project in IBM Systems is a broad effort to build a co-optimized stack of hardware and software to make IBM Power Systems the best platform to develop and deploy cognitive applications.

This blog provides instructions on building the latest version of Nvidia's DIGITS deep learning graphical user interface (DIGITS 5) on (little-endian) OpenPOWER Linux, such as Red Hat Enterprise Linux 7.1, SUSE Linux Enterprise Server 12, Ubuntu 14.04, and subsequent releases. DIGITS may be built with support for CUDA 7.5 or CUDA 8 to exploit Nvidia numerical accelerators, such as the Pascal P100 accelerators in the S822LC for HPC systems with four P100 accelerators, or PCIe-attached accelerators in conjunction with POWER8™ systems. These instructions are based on the instructions found in the DIGITS distribution at https://github.com/NVIDIA/DIGITS/blob/master/docs/BuildDigits.md.

Prerequisites for building DIGITS 5

To build DIGITS 5, you will need at a minimum the following Linux packages:

git

graphviz

libfreetype6-dev

libpng12-dev

python-dev

python-flask

python-flaskext.wtf

python-gevent

python-h5py

python-numpy

python-pil

python-pip

python-pkgconfig

python-protobuf

python-scipy

On Ubuntu Linux, these packages may be installed with the following command:
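
A sketch of that command, using the package names listed above (these are the Ubuntu 14.04/16.04 names; they may differ on other releases):

```shell
# Install the build prerequisites for DIGITS 5 on Ubuntu.
sudo apt-get update
sudo apt-get install -y git graphviz libfreetype6-dev libpng12-dev \
    python-dev python-flask python-flaskext.wtf python-gevent \
    python-h5py python-numpy python-pil python-pip python-pkgconfig \
    python-protobuf python-scipy
```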

Since DIGITS is only a graphical user interface, you also need to install one or more deep learning frameworks. DIGITS 5 requires Caffe and can also work with Torch. You can install Caffe and Torch either from a binary distribution of deep learning frameworks on Power, or from source as I described in previous blogs.

To install Caffe and Torch from the binary distribution, follow the instructions supplied with the binary distribution.

Building and installing DIGITS

Several Python packages need to be installed. Fortunately, the DIGITS distribution contains a file "requirements.txt" that specifies all required Python packages:

$ sudo pip install -r $DIGITS_ROOT/requirements.txt

To enable loading data and visualization plug-ins, DIGITS may optionally be installed as follows:

$ sudo pip install -e $DIGITS_ROOT

Starting and Using the DIGITS server

To start a server, run the following commands:

$ cd $DIGITS_ROOT
$ ./digits-devserver

This will start a DIGITS GUI web service on your Linux server host. Now use a web browser to start using the DIGITS GUI at http://yourhost:5000/ (substitute the host name of your DIGITS server for "yourhost") and check out the Getting Started Guide for DIGITS.

Dr. Michael Gschwind is Chief Engineer for Machine Learning and Deep Learning for IBM Systems where he leads the development of hardware/software integrated products for cognitive computing. During his career, Dr. Gschwind has been a technical leader for IBM’s key transformational initiatives, leading the development of the OpenPOWER Hardware Architecture as well as the software interfaces of the OpenPOWER Software Ecosystem. In previous assignments, he was a chief architect for Blue Gene, POWER8, POWER7, and Cell BE. Dr. Gschwind is a Fellow of the IEEE, an IBM Master Inventor and a Member of the IBM Academy of Technology.

The new IBM SDK for Linux on Power 1.10 provides integration with the IBM POWER Functional Simulator, a POWER8 and POWER9 simulator that can be installed on any x86_64/amd64 system. The simulator instantiates a Power virtual machine to which the x86_64/amd64 version of the SDK can connect. Once connected, you can compile and run ppc64le programs, all on the x86_64/amd64 machine.

In this version of the SDK, you can either run the simulator from within the SDK or run it standalone.

Running the POWER Functional Simulator From the Command Line

Go to the sdk-systemsim directory and execute setupsimulator -i. This will download and install all necessary packages.

Once the previous step completes, go to ~/systemsim_execution and execute startsimulator -p8 (or -p9). You will be prompted for your user password so that the network rules necessary to allow communication with the simulator can be set. In addition, an xterm will be started in which you can follow the boot process. The default user for the simulator is root and its password is mambo. The default IP address to access the simulator via SSH is 172.19.0.109.
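
The command-line flow above can be sketched as follows (directory and script names are taken from the text; exact paths may vary with your SDK installation):

```shell
# 1. Install the simulator packages (downloads everything needed).
cd sdk-systemsim
./setupsimulator -i

# 2. Start a POWER8 (-p8) or POWER9 (-p9) simulator instance;
#    you will be prompted for your password so the network
#    rules can be configured.
cd ~/systemsim_execution
./startsimulator -p8

# 3. Once the guest has booted, log in over SSH
#    (default credentials: root / mambo).
ssh root@172.19.0.109
```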

Running the POWER Functional Simulator From within the SDK

Install the SDK on your x86_64 machine. Ensure that you are on one of the supported Linux distributions: Ubuntu 14.04 or 16.04, RHEL 7.2 and later, SLES 12 SP1, CentOS 7, or Fedora 22. The instructions to install the SDK are available here.

Once the SDK is installed, open it and find the Power Simulator option in the main toolbar. Then select Start.

If the necessary packages aren't installed, the SDK will download the installation scripts and ask you to run setupsimulator -i, as in step 2 of the section above. You can use the console that opens in the SDK to complete this step.

Assuming you have everything installed, you can now run the simulator. In the SDK's UI, select Power Simulator > Start. If this is the first time you are performing this action, you will be prompted to enter your system user password. This is necessary to set up the network configuration that makes communication with the simulator work. The password is securely stored.

Once the steps above are successfully executed, the simulator will start booting. You can follow the progress in the SDK console. When the boot is complete, a connection to the simulator will be automatically created and opened. You can now use your simulator.

The IBM SDK for Linux on Power (SDK) 1.10 provides a set of tools to manage Docker containers. Docker wraps a piece of software in a complete filesystem that contains everything needed to run: code, runtime, system tools, and system libraries. In the SDK, you can manage images and containers running on a remote server.

About this task

Configure and use the Docker Tooling plug-ins from the IBM SDK for Linux on Power running on your x86_64 machine to manage Docker images and containers on the server side (ppc64le).

Installing and Configuring Docker

Before starting with the SDK plug-in, it is necessary to install the Docker packages. On your laptop, this requires the docker and docker-machine command-line tools.
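
As a sketch, you can verify the client-side tools and register the remote ppc64le host using docker-machine's generic SSH driver; the IP address and user name below are hypothetical placeholders:

```shell
# Verify the client-side tools are installed on the laptop.
docker --version
docker-machine version

# Register the remote ppc64le server over SSH using the
# generic driver (address and user are placeholders).
docker-machine create --driver generic \
    --generic-ip-address=192.168.1.50 \
    --generic-ssh-user=ubuntu \
    power-server
```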

You can use the "Test Connection" button to check if the docker connection to the remote machine is working correctly.

Press "Finish" button.

NOTE: it is possible to expand the connection you just created and get a list of all the images and containers available on the remote machine.

Figure 1: Docker connection using TCP

3. Manage Images

You can manage the Docker images from the "Docker Images" view. At the right corner of the view there are buttons that allow you to pull, push, build, and delete images.

Figure 2: Docker images view

3.1 Pull an Image

Pulling a Docker image consists of requesting a repo:tag or repository specification from a registry. The Pull wizard is used to specify the repository or repo:tag specification and the registry account to use.

To pull an image, click on the Pull icon in "Docker Images" view.

Click the "Search" button and type the image name in the search field.

Select an image from the list and click "Finish".

The pull job will start. You can see the progress at the bottom right corner; depending on the image size and connection speed, it can take a while.

Figure 3: Pulling a docker image

3.2 Push an Image

Pushing a Docker image consists of specifying an existing repo:tag to push. By default, images will be pushed to the default Docker registry, but if a tag contains a registry specifier in addition to repo:tag, the image will be pushed to the specified registry.

To push an image, click on the Push icon in "Docker Images" view.

Select the proper "Registry Account" and click "Finish".

The push job will start. You can see the progress at the bottom right corner; depending on the image size and connection speed, it can take a while.

Figure 4: Pushing a docker image

3.3 Build an Image

Building an image takes an existing image and modifies it to create a new image. The specification of the new Docker image is done via a special file, which is always named Dockerfile.

To build a new image, click on the Build Image icon in "Docker Images" view.

Add the new image name. This name must follow correct repo:tag format.

Add the directory that contains or will contain the Dockerfile. Once a valid existing directory is specified, the "Edit" button will be enabled, allowing creation and modification of the Dockerfile using a basic editor dialog.

When the Dockerfile is considered complete, hitting the "Finish" button will start the image build action. When the build is complete, the "Docker Images" view will be refreshed automatically.

Figure 5: Building an image from a Dockerfile

4. Manage Containers

You can manage the Docker containers from the "Docker Containers" view. At the right corner of this view there are buttons that allow you to start, stop, pause, resume, kill, and delete containers.

Figure 6: Docker containers view

4.1 Create a Container

This allows you to create a container based on an existing image.

To create a new container, move to "Docker Images" view.

Select an image from the list, right-click on it and select "run".

A wizard is shown; its first page allows a number of common settings:

Name - Must be filled in, as this is the name of the new container;

Entry point - Allows configuring the Container to run as an executable;

Command - The command to run in the Container when it starts.

You can click "Next" if you want to add extra configuration or just "Finish" to run the container.

Figure 7: Creating a container

4.2 Commit a Container

This allows you to create a new image based on an existing container.

To commit a container, right-click on a container in the "Docker Containers" view and select the "Commit" option.

In the commit dialog enter the fields:

Name - the name of the new image being created; must be a valid repo:tag name.

Author - name to add as the author of the image (optional)

Comment - comment to add for the image (optional)

Press "Finish" and a new docker image will be created based on the selected container.

The Machine Learning and Deep Learning project in IBM Systems is a broad effort to build a co-optimized stack of hardware and software to make IBM Power Systems the best platform to develop and deploy cognitive applications. As part of this project, IBM has developed new processors, systems, and a co-optimized software stack uniquely optimized for AI applications.

In addition to creating the binary distribution of DL frameworks, we have also been working with the Open Source community to enable the open source frameworks and libraries to be built directly from the repositories to enable Deep Learning users to harness the power of the OpenPOWER ecosystem. With the introduction of little-endian OpenPOWER Linux, installation of open source applications on Power has never been easier.

If you need to build optimized libraries from source, this blog provides instructions on building and installing optimized libraries for deep learning on (little-endian) OpenPOWER Linux, such as Red Hat Enterprise Linux 7.1, SUSE Linux Enterprise Server 12, Ubuntu 14.04, and subsequent releases. These instructions focus on providing improved numerical libraries of importance for deep learning frameworks: libraries implementing the BLAS basic linear algebra library interfaces, in particular ATLAS and OpenBLAS, and an accelerated Power math library providing optimized scalar and vectorized implementations of common mathematics functions.

While mathematics libraries (such as the system library libm) and BLAS libraries, e.g., based on ATLAS or OpenBLAS, are available with many Linux operating system distributions, these distributions often lack many of the newest and best code improvements, which are particularly important for high-performance computing applications such as deep learning. Thus, you can significantly improve deep learning performance by installing the advanced and highly tuned libraries as described here.

Installing the Mathematical Acceleration Subsystem (MASS) for Linux

To accelerate base mathematics functions by exploiting the advanced capabilities of the Power vector-scalar instruction set, IBM has made the MASS vector library freely available. MASS implements the common libmvec interfaces used by the GNU Compiler Collection and can be accessed from GCC compilers using the -mveclibabi=mass option. You can find out more about MASS at the MASS for Linux Home Page.

For example, on Ubuntu 16.04 (also known under the distribution name “xenial”) use the following commands to configure the MASS repository and install MASS 8.1.4:
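
A sketch of those commands, assuming the layout of IBM's public XL compiler evaluation repository (the repository URL and package name are assumptions; verify both on the MASS for Linux page before use):

```shell
# Add the IBM repository's signing key and the repository itself
# (URL is the assumed public evaluation repo for ppc64le Ubuntu).
wget -q https://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/ubuntu/public.gpg -O - | sudo apt-key add -
echo "deb https://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/ubuntu/ xenial main" | \
    sudo tee /etc/apt/sources.list.d/ibm-xl.list

# Refresh the package index and install MASS 8.1.4
# (package name is an assumption; check the repository listing).
sudo apt-get update
sudo apt-get install -y libxlmass-devel.8.1.4
```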

To compile applications to use the MASS libraries, invoke the GNU C compiler with the -mveclibabi=mass option. To link, specify the link-time options -L/opt/ibm/xlmass/8.1.4/lib -lmass -lmassvp8 -lmass_simdp8. Thus, a program may be compiled and linked to use MASS as follows:
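
For example (prog.c is a placeholder source file; note that in GCC, -mveclibabi= only takes effect when vectorization and unsafe-math optimizations are also enabled):

```shell
# Compile and link prog.c against the MASS libraries.
gcc -O3 -ftree-vectorize -funsafe-math-optimizations \
    -mveclibabi=mass prog.c \
    -L/opt/ibm/xlmass/8.1.4/lib -lmass -lmassvp8 -lmass_simdp8 -lm \
    -o prog
```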

Many deep learning packages already include code to use the MASS libraries to improve performance on Power. For example, when building Caffe after installing MASS on your system, you can enable MASS by setting the USE_MASS flag in the build configuration file Makefile.config, around line 12:
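
As a sketch, the edit can also be scripted from the shell, assuming the flag appears commented out in the stock Makefile.config:

```shell
# Uncomment/set the USE_MASS flag in Caffe's Makefile.config.
sed -i 's/^# *USE_MASS.*/USE_MASS := 1/' Makefile.config
```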

Also, when building Caffe, use the Makefile.config configuration file to specify the location of the MASS libraries on your system, around line 72:

# MASS lib directories

# MASS_LIB := /opt/ibm/xlmass/8.1.4/lib

Installing OpenBLAS

OpenBLAS is a high-performance open source implementation of the BLAS basic linear algebra library interfaces. You can find more information about OpenBLAS at the OpenBLAS Project Homepage. Recent versions of OpenBLAS contain significant enhancements, including support for the POWER vector-scalar instruction set, which was designed to accelerate numerically intensive algorithms such as the BLAS interfaces. To get the best performance for these libraries, download and build the latest release of OpenBLAS. Starting with release 0.2.19, the OpenBLAS master repository includes these enhancements.

To install the latest version of OpenBLAS, download the OpenBLAS source code as follows:

$ git clone git://github.com/xianyi/OpenBLAS.git

$ cd OpenBLAS

You can then build OpenBLAS for POWER8 with the command:

$ make TARGET=POWER8
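
After the build completes, you can install the library; the prefix below is an illustrative choice:

```shell
# Install OpenBLAS under /opt/openblas (example prefix).
sudo make TARGET=POWER8 PREFIX=/opt/openblas install
```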

Building OpenBLAS with MASS Support

The IBM MASS library consists of a set of mathematical functions for C, C++, and Fortran-language applications that are tuned for optimum performance on POWER architectures. Start by installing MASS on your system as described in the section on installing and using MASS.

Depending on the version of MASS installed on your system, you may have to update the MASSPATH variable used by the build process, around line 46 of Makefile.power:

MASSPATH = /opt/ibm/xlmass/8.1.3/lib

Then, permanently enable MASS by setting the variable USE_MASS to 1, either by editing Makefile.power, or by specifying its value on the command line:

$ make USE_MASS=1 TARGET=POWER8

Building ATLAS on OpenPOWER

The ATLAS (Automatically Tuned Linear Algebra Software) project is an ongoing research effort focusing on applying empirical techniques in order to provide portable performance. To achieve this, ATLAS includes a self-tuning framework to optimize its high-performance open source implementation of the BLAS basic linear algebra library interfaces for the system on which it is being installed. You can find more information about ATLAS at the ATLAS Project Homepage.

Recent versions of ATLAS contain significant enhancements, including support for the POWER vector-scalar instruction set, which was designed to accelerate numerically intensive algorithms such as the BLAS interfaces. The recently released ATLAS 3.10.3 is the most recent stable distribution and adds many improvements for little-endian OpenPOWER Linux systems. The ATLAS developer branch includes the most recent enhancements; use release 3.11.16 or later to get optimized support for the POWER vector-scalar instruction set.

To get the best performance for the ATLAS libraries, download and build the latest release of ATLAS on the OpenPOWER Linux system on which you will be using it; the installation framework will optimize the ATLAS library for that particular configuration, including CPU generation, and cache and memory sizes and latencies. In particular, the ATLAS project lead reports significant speedups starting with ATLAS 3.11.36, e.g., for general matrix-matrix multiply, which is critically important for deep learning performance:

I have just released 3.11.36. The only performance improvement over
3.11.35 is for power, where my single precision performance went from
around 66% to 86%. More specifically. 3.11.36 for serial gemm of
N=6000, my power8 gets (% of peak):
dgemm : 87%
zgemm : 88%
sgemm : 86%
cgemm : 88%

The software link off of this page allows for downloading the tarfile. The explicit download link is https://sourceforge.net/project/showfiles.php?group_id=23725. Once you have obtained the tarfile, untar it in the directory where you want to keep the ATLAS source directory. The tarfile will create a subdirectory called ATLAS, which you may want to rename to make it less generic. For instance, assuming you have saved the tarfile to ~/dload and want to put the source in ~/numerics, you could create ATLAS's source directory (SRCdir) with the following commands:
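
The steps just described can be sketched as follows (the tarfile name is illustrative; use the name of the file you actually downloaded):

```shell
# Unpack the ATLAS tarfile saved in ~/dload into ~/numerics,
# then rename the generic ATLAS directory.
mkdir -p ~/numerics
cd ~/numerics
tar xjf ~/dload/atlas3.10.3.tar.bz2
mv ATLAS ATLAS3.10.3
```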

Similarly, a POWER7 system I have access to has 8 physical cores but offers 64 SMT units. If you install with the default flags, your parallel speedup for moderately sized DGEMMs is around 4.75. On the other hand, if you add:

--force-tids="8 0 8 16 24 32 40 48 56"

Then the parallel DGEMM speedup for moderately sized problems is more like 6.5.

If you build on a POWER8 machine with four physical cores that are again shared 8-way, you need to add to configure:

--force-tids="4 0 8 16 24"

When using the --force-tids option, the first number specifies the number of physical cores (in the examples there are 8 physical POWER7 cores and 4 physical POWER8 cores), and the following numbers are the thread IDs to use.
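
Putting the POWER8 example together, a configure invocation might look like this (ATLAS is configured from a separate build directory, per its convention; other configure flags are omitted for brevity):

```shell
# Configure ATLAS from a dedicated build directory, pinning
# threads to the 4 physical cores of the SMT8 POWER8 example.
mkdir -p ~/numerics/ATLAS3.10.3/build
cd ~/numerics/ATLAS3.10.3/build
../configure --force-tids="4 0 8 16 24"
```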

Building netlib-java for Spark on OpenPOWER

In addition to the native dynamic libraries described above, we have also ported netlib-java for Spark to OpenPOWER Linux systems. We have submitted those modifications to the maintainer of fommil, the project of which netlib-java is a part. In the meantime, we have also created a PowerPC-enabled fork of netlib-java at https://github.com/ibmsoe/netlib-java.

Build instructions for the unmodified netlib-java project may be found at the following blog:

See what you can do with Deep Learning on OpenPOWER

I am inviting you to explore Deep Learning on OpenPOWER systems, and exploit the full potential of an open ecosystem built on collaborative innovation, as we continue to co-optimize and expand the hardware and software stack for Deep Learning on Power.

I look forward to hearing about the performance you get from Deep Learning on OpenPOWER. Share how you want to use Deep Learning on OpenPOWER and how Deep Learning on OpenPOWER will enable you to build the next generation of cognitive applications by posting in the comments section below.

Dr. Michael Gschwind is Chief Engineer for Machine Learning and Deep Learning for IBM Systems where he leads the development of hardware/software integrated products for cognitive computing. During his career, Dr. Gschwind has been a technical leader for IBM’s key transformational initiatives, leading the development of the OpenPOWER Hardware Architecture as well as the software interfaces of the OpenPOWER Software Ecosystem. In previous assignments, he was a chief architect for Blue Gene, POWER8, POWER7, and Cell BE. Dr. Gschwind is a Fellow of the IEEE, an IBM Master Inventor and a Member of the IBM Academy of Technology.

I am pleased to announce a major update to the deep learning frameworks available for OpenPOWER as software “distros” (distributions) that are as easily installable as ever using the Ubuntu system installer.

Significant updates to Key Deep Learning Frameworks on OpenPOWER

Building on the great response to our first release of the Deep Learning Frameworks, we have made significant updates by refreshing all the available frameworks now available on OpenPOWER as pre-built binaries optimized for GPU acceleration:

Caffe, a dedicated artificial neural network (ANN) training environment developed by the Berkeley Vision and Learning Center at the University of California, Berkeley, is now available in two versions: the leading-edge Caffe development version from UCB's BVLC, and a Caffe version tuned by Nvidia to offer even more scalability using GPUs.

Torch, a framework consisting of several ANN modules built on an extensible mathematics library

Theano, another framework consisting of several ANN modules built on an extensible mathematics library

The updated Deep Learning software distribution also includes DIGITS, a graphical user interface to make users immediately productive at using the Caffe and Torch deep learning frameworks.

As always, we’ve ensured that these environments may be built from the source repository for those who prefer to compile their own binaries.

New Distribution, New Levels of Performance

The new distribution includes major performance enhancements in all key areas:

The Mathematical Acceleration Subsystem (MASS) for Linux high-performance mathematical libraries are made available in freely distributable form and free of charge to accelerate cognitive and other Linux applications by exploiting the latest advances in mathematical algorithm optimization and advanced POWER processor features, in particular the POWER vector-scalar instruction set

cuDNN v5.1 enables Linux on Power cognitive applications to take full advantage of the latest GPU processing features and the newest GPU accelerators

To get started on an evaluation of the latest IBM Power Systems S822LC for High Performance Computing server, please contact me at mkg@us.ibm.com. You can learn more about and order these systems by contacting your IBM Business Partner.

IBM invites GPU software developers to join the IBM-NVIDIA Acceleration Lab to be among the first to try these systems and see the benefits of the Tesla P100 GPU accelerator and the high-speed NVLink connection to the IBM POWER8 CPU.

I look forward to hearing about the performance you get from these systems. Share how you want to use accelerated Deep Learning on OpenPOWER and how Deep Learning on OpenPOWER will enable you to build the next generation of cognitive applications by posting in the comments section below.

Dr. Michael Gschwind is Chief Engineer for Machine Learning and Deep Learning for IBM Systems where he leads the development of hardware/software integrated products for cognitive computing. During his career, Dr. Gschwind has been a technical leader for IBM’s key transformational initiatives, leading the development of the OpenPOWER Hardware Architecture as well as the software interfaces of the OpenPOWER Software Ecosystem. In previous assignments, he was a chief architect for Blue Gene, POWER8, POWER7, and Cell BE. Dr. Gschwind is a Fellow of the IEEE, an IBM Master Inventor and a Member of the IBM Academy of Technology.

Two years ago this month, I published my first blog about the Linux on Power strategic shift from big endian to little endian, titled Just the FAQs about Little Endian, and I still get questions about it today. So, I figured it was time to make updates.

The questions listed below are either updated or new. If you do not see your particular question, please read the original document before reaching out.

Beginning with the 14.04 distribution, Canonical’s Ubuntu Server supports Power in little endian mode only and future release plans show this support continuing. No plans exist to provide an equivalent big endian version optimized for IBM Power Systems.

SUSE's Linux Enterprise Server (SLES) offers SLES 12 on Power in little endian mode only. As such, customers will have to migrate from big to little endian as they upgrade from SLES 11 to SLES 12.

Since the release of Red Hat Enterprise Linux (RHEL) 7.1, Red Hat provides both little endian and big endian versions of Linux on IBM Power Systems. At this time in 2016, these products are separately licensed and non-transferable; customers should pay close attention when ordering RHEL for Power to request the desired endianness. While Red Hat's plans around releasing RHEL as a little endian only distribution remain undisclosed, customers should view RHEL 7 as the opportunity to migrate from big to little endian versions just in case the next major release ships only as a little endian product.

At this time, all community distributions of Debian, Fedora, and openSUSE offer big and little endian versions.

How long will Linux distributions continue to support big endian on Power?

It remains IBM's understanding that Red Hat and SUSE will continue to support their existing big endian releases on Power for their full product life cycles. However, customers running big endian distributions would be wise to begin planning their transitions to little endian distributions as their applications become available and time permits.

How do POWER systems support running mixed environments of big and little endian operating systems?

The POWER8 processor supports mixing big and little endian memory accesses at the core level through special purpose register (SPR) settings. While this could technically support running both big and little endian software threads, the complexity of implementing such a design point would be high. Therefore, IBM has elected to enable operating system versions as either completely big endian or completely little endian by design.

The virtualization capabilities of the POWER platform allow for mixed environments of operating system levels and types. This same isolation mechanism applies to big and little endian Linux operating systems as well as other Power operating systems such as AIX and IBM i.

Do POWER systems support running mixed environments of big and little endian operating systems in both PowerVM and PowerKVM?

As of the PowerKVM release 2.1.1 (shipped in October 2014), KVM has supported a mix of big and little endian guests running simultaneously. Further, little endian support was added to PowerVM in the spring of 2015, allowing the system to run in mixed modes. So yes, all current PowerKVM and PowerVM releases support mixing of big endian and little endian operating systems.

Can I run big endian applications on a little endian operating system or vice versa?

No, the operating system enablement only supports applications of the same type. As such, a little endian operating system (ppc64le or ppc64el) can only run little endian applications built for this software platform. Likewise, big endian operating systems (ppc64) only support software built for big endian.
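The incompatibility comes down to how the two modes lay out the same multi-byte value in memory. The following sketch uses Python's struct module to show the two byte orders; it is not Power-specific, just an illustration of the byte-order semantics:

```python
import struct

value = 0x0A0B0C0D

# Big endian: most significant byte first, as on ppc64.
be = struct.pack(">I", value)

# Little endian: least significant byte first, as on ppc64le/ppc64el.
le = struct.pack("<I", value)

print(be.hex())  # 0a0b0c0d
print(le.hex())  # 0d0c0b0a
```

A binary built for one layout would misread in-memory data structured for the other, which is why applications must match the endianness of the operating system they run on.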

What if I want to run a mix of big endian and little endian applications on the same Power System?

Virtualization enables mixing of big and little endian application environments on the same server. Applications of a particular operating system and endian mode must be run in a separate virtual machine (VM) or logical partition (LPAR).

See the above question about mixed big and little endian operating system environments for more explanation.

Where can little endian distributions run on Power?

Little endian distributions can run virtualized in a VM (PowerKVM or any KVM on Power distribution from the vendors) on a POWER8 S8xxL or S8xxLC model system; in an LPAR (PowerVM) on a POWER8 S8xxL, S8xx, or E8xx model system; or on bare metal (directly on the “BIOS-like” firmware that enables KVM) on S8xxL or S8xxLC models.

What about Linux applications that have already been optimized for big endian on Power?

IBM remains committed to transitioning the Linux on Power application ecosystem from big endian to little endian in an expeditious manner. Most IBM products have completed the transition and new products have started as little endian only.

Additionally, IBM continues to work both with open source communities and third party software providers to grow the Linux on Power ecosystem. While the operating system support decision lies with the application vendor, IBM strongly encourages new providers to start as little endian to eliminate any transition planning and to simplify the application development process.

I am pleased to announce that several major deep learning frameworks are now available on the Power platform, as "distros" (distributions) that are easily installable using the Ubuntu system installer.

Deep learning, or the use of multi-layer neural networks, has revolutionized speech recognition, natural language processing, and computer vision, and continues to revolutionize IT due to the availability of rich data sets, new methods for accelerating neural network training, and extremely fast hardware with GPU accelerators.

Deep learning is used in everything from safety systems to personal assistants to enterprise systems. Increasingly, driver assist technologies rely on machine and deep learning to recognize objects in a rapidly changing environment, and personal digital assistant technology is learning to categorize and group e-mail, text messages, and other content based on their context. In the enterprise, machine and deep learning applications can identify high value sales opportunities, enable smart call center automation, detect and react to intrusion or fraud, and suggest solutions to technical or business problems.

The frameworks that are available on POWER as pre-built binaries optimized for GPU acceleration include:

Caffe, a dedicated ANN (artificial neural network) training environment developed by the Berkeley Vision and Learning Center at the University of California at Berkeley

Torch, a framework consisting of several ANN modules built on an extensible mathematics library

Theano, another framework consisting of several ANN modules built on an extensible mathematics library

In addition to prebuilt and optimized binaries for Power with GPU acceleration, we have worked to ensure that these environments can be built from the source repository by those preferring to compile their own binaries. Finally, we have enabled the DL4J (Deep Learning 4 Java), TensorFlow, and CNTK frameworks and are working with the developers to ensure Power support for these environments "out of the box".

The POWER platform is ideal for deep learning, big data, and machine learning due to its high performance, large caches, 2x-3x higher memory bandwidth, very high I/O bandwidth, and of course, tight integration with GPU accelerators. The parallel multi-threaded Power architecture with high memory and I/O bandwidth is particularly well adapted to ensure that GPUs are used to their fullest potential.

Today, these software packages are available on our Power Linux 822LC server, which features two POWER8 CPUs along with two NVIDIA Tesla K80s. We are currently working on optimizing the deep learning software to take advantage of the upcoming POWER8 servers connected with the high-speed NVLink interface to NVIDIA Tesla P100 (Pascal) GPU accelerators. This brings a huge advantage to cognitive computing applications like deep learning by giving applications running on the GPU fast access to large system memory via the NVLink interface to the CPU. Coupled with the higher performance POWER8 CPUs, the overall workflow for applications like voice recognition, natural language processing, and computer vision that employ deep learning benefits from a massive performance leap thanks to data-centric system design and optimization.

Radar charts offer a good way to compare elements of a dimension as a function of several metrics. This makes them useful for seeing which variables have similar values or whether there are any outliers among the variables. Radar charts are also useful for seeing which variables score high or low within a dataset, making them ideal for displaying performance.

A radar chart is also called a spider, star, or web chart. It is a two-dimensional chart type used to show, in a graphical way, a series of values over many variables. A radar chart uses one axis per variable, arranged radially around a central point with equal distances between the axes. Lines that connect adjacent variables from axis to axis create a polygonal shape that is filled with a color.
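As a minimal sketch of the geometry just described, the Python snippet below places one axis per variable at equal angular spacing and computes the polygon vertices for a set of values. This is only an illustration of the layout, not the plug-in's actual drawing code:

```python
import math

def radar_vertices(values):
    """Return (x, y) polygon vertices for values plotted on
    equally spaced radial axes around the origin."""
    n = len(values)
    vertices = []
    for i, v in enumerate(values):
        angle = 2 * math.pi * i / n  # axes are equally spaced
        vertices.append((v * math.cos(angle), v * math.sin(angle)))
    return vertices

# Four variables -> axes at 0, 90, 180, and 270 degrees.
verts = radar_vertices([1.0, 2.0, 1.5, 0.5])
```

Connecting the returned vertices in order (and closing the loop back to the first one) produces the filled polygon that gives the chart its web-like appearance.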

In this post we show how the radar chart works in the Cycles per Instruction Breakdown (CPI breakdown) plug-in, which is part of the IBM Software Development Kit for Linux on Power (SDK). First we introduce the CPI Breakdown plug-in, and then we explain the CPI radar chart.

2. CPI Breakdown Plug-in

CPI (cycles per instruction) analysis was designed to help improve application performance. CPI refers to how many processor cycles are needed to complete an instruction. An instruction can be a memory read/write operation, an arithmetic calculation, or a bit-wise operation. The more cycles the processor takes to complete an instruction, the poorer the performance of the application on the processor.
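The metric itself is just a ratio of two hardware counters, cycles and completed instructions; on Linux these can be collected with, for example, `perf stat -e cycles,instructions ./app`. A minimal sketch of the calculation follows, with made-up counts for illustration:

```python
def cpi(cycles, instructions):
    """Cycles per instruction: lower is better."""
    return cycles / instructions

# Hypothetical counts from a profiling run.
print(cpi(12_000_000, 8_000_000))  # 1.5 cycles per instruction
```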

In the CPI breakdown model, a set of processor events is broken down into components, and metrics for these components are calculated from processor performance counters. This approach provides a complete view of how the application behaves with respect to processor performance.

The CPI breakdown plug-in shows five tabs: CPI Breakdown Model, CPI Radar Chart, Metrics View, Events View, and Drilldown View. It collects the required events and then calculates the metrics for the CPI breakdown model. For more information about events and metrics, please see the user guide.

3. CPI Radar Chart

The CPI Breakdown plug-in also creates a cpi file for each execution (profiling), which can be used to recover information about events and metrics at any time. The cpi file is basically an XML file that contains all the data about events and metrics from the CPI breakdown model.
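Because the file is plain XML, its contents can also be inspected outside the SDK. The sketch below parses a hypothetical cpi layout with Python's ElementTree; the element and attribute names here are assumptions chosen for illustration, not the plug-in's actual schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical cpi file content; the real schema may differ.
sample = """
<cpi>
  <event name="PM_CYC" value="12000000"/>
  <event name="PM_INST_CMPL" value="8000000"/>
</cpi>
"""

root = ET.fromstring(sample)
# Collect each event's count, keyed by event name.
events = {e.get("name"): int(e.get("value")) for e in root.iter("event")}
print(events["PM_CYC"])
```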

To compare the content of cpi files, we introduce the CPI Radar Chart, which initially uses only events for the comparison. It shows events from Breakdown 2 when a POWER8 machine is used, and events from Breakdown 3 in the POWER7 case. The radar chart highlights hot-spot events by placing them near the outermost circle, as shown in Figure 1.

Figure 1: CPI Radar Chart with a single data series.

When the CPI Radar Chart is created, the events are first sorted and then their names are verified so that a comparison can be performed, as shown in Figure 2.

Figure 2: CPI Radar Charts comparison

4. Conclusion

Finally, because of the simplicity of the shapes used by a radar chart, it is very helpful for evaluating and visualizing data from the CPI breakdown model. The CPI Radar Chart allows us to compare CPI profilings in a simple and friendly way.

The IBM Software Development Kit for Linux on Power (SDK) is a free, Eclipse-based Integrated Development Environment (IDE). It integrates C/C++ source development with the IBM Advance Toolchain, Migration Advisor, CPI Breakdown, Build Advisor, Post-Link Optimization and classic Linux performance analysis tools, such as Oprofile, Perf and Valgrind.

IBM SDK supports three different architectures: ppc64, ppc64le and x86_64. You can develop your C/C++ application using one of the following scenarios:

Locally, using a Power System and connecting via VNC, for example.
Remotely, using an x86_64 machine as a client to connect to a Power System.
Cross-compiled, using an x86_64 machine.

The IBM SDK for Linux on Power 1.9 supports the following distributions for both Power and x86_64:

2 - Remote development using an x86_64 machine as a client and a Power System as a server

To develop remotely on a Power System, you need to install the IBM SDK on your x86_64 machine (step 1) and also install the IBM SDK Remote Dependencies on the Power System (step 2). The IBM SDK Remote Dependencies package ensures that all packages required for remote development are installed on the Power System.

Note: In order to use the Advance Toolchain cross packages (either ppc64 or ppc64le), you need to install the Advance Toolchain cross common package, which provides common components for both versions.

The IBM Software Development Kit for Linux on Power (SDK) provides two related tools, Source Code Advisor (SCA) and the Feedback Directed Program Restructuring tool (FDPR), that implement feedback-directed, post-link program analysis and optimization technology. SCA finds and visualizes performance problems in the application source code, using information produced by using FDPR.

1. Feedback Directed Program Restructuring

FDPR can analyze, report on and optimize the executable image of a program or shared library based on a typical execution profile. It works similarly to a compiler: it reads a linked executable program and produces an optimized version of it.

FDPR optimization is performed in three distinct steps:

Instrumentation: FDPR analyzes the input program or shared library and creates an instrumented version.

Profiling: The instrumented code is run with some representative input. During this run, profiling data is collected, including various counts, such as how many times each branch was executed.

Optimization: FDPR processes the input program together with the profile. It performs various optimizations based on this profile, such as code restructuring, making the program run more efficiently.

The SDK provides a plug-in (also called FDPR) that allows you to run FDPR through the SDK user interface.

2. Source Code Advisor

During the code optimization process, FDPR can produce a journal of the optimization performed. The Source Code Advisor plug-in uses this journal, produced as an XML file, to highlight potential problems in the source code and to offer suggested solutions. The journal explains each optimization, including the source location, execution count, the performance problem found, and the user action required to resolve the problem.

In the SCA launcher configuration, one can specify the workload needed to collect the profile of the program. When running this configuration, the program is built, if necessary, using the standard project build process. Once the executable is available, FDPR creates an instrumented version and runs it using the specified workload. FDPR then performs a pseudo optimization step, producing a journal of the performance problems found. The result is an XML-formatted file that lists the specific problems found, their exact location in the source, and so on. With the XML journal available, the Source Code Advisor view is displayed to visualize the set of problems, allowing you to navigate through the problems and the corresponding places in the source where they were found. The view provides a recommended course of action for each problem at various abstraction levels (source change, compiler switches, and so on).

This section shows the result of using both the FDPR and SCA plug-ins on the PHP 7.0 project. The PHP source code can be downloaded from the php.net download site (http://php.net/downloads.php) for current stable releases or from GitHub for the current development releases (https://github.com/php/php-src).

3.1 FDPR Results

After running the FDPR plug-in on the PHP binary, a new binary called "php.fdpr" is generated, and we can explore whether the new binary is better optimized than the original. To do so, we can first compare the sizes of the two binaries. Figure 1 shows that the original PHP binary is 66 MB while the new one is 37 MB, so FDPR greatly reduced the size of the final binary.
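In rough terms, going from 66 MB to 37 MB is a reduction of about 44 percent; the quick arithmetic:

```python
original_mb = 66
optimized_mb = 37

# Percentage reduction relative to the original binary size.
reduction_pct = (original_mb - optimized_mb) / original_mb * 100
print(f"{reduction_pct:.1f}% smaller")  # 43.9% smaller
```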

Figure 2 shows the original PHP binary execution output, while Figure 3 shows the output from the PHP binary generated by FDPR. Comparing both outputs, we can see that the binary generated by FDPR had performance gains in all operations, particularly in "test_math" and "test_stringmanupulation".

Figure 2: php binary output

Figure 3: php.fdpr binary output

3.2 SCA Results

After running the SCA plug-in on the PHP binary, a view called "Source Code Advisor" opens and displays the optimization problems found in the PHP project (Figure 4), along with the possible optimizations that can be made to the PHP code.

We can expand each problem type in the view and click one of the entries to open its location in the source code. The view also shows a description of the problem and a possible solution to resolve it and increase program performance. For the "Unroll Loop" problem type, SCA provides a quick fix that can automatically change the code. To use this feature, place the cursor on the yellow warning in the source code, press "Ctrl + 1", and then press the "Enter" key. Figure 5 shows the code after applying the quick fix.
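To make the "Unroll Loop" transformation concrete, here is a language-neutral sketch in Python of what unrolling a summation loop by a factor of four looks like. The PHP sources are in C, so this is only to illustrate the idea, not the quick fix's actual output:

```python
def sum_unrolled(data):
    """Sum elements with the loop body unrolled four times,
    reducing loop-control overhead per element processed."""
    total = 0
    i = 0
    n = len(data)
    # Main unrolled loop: four elements per iteration.
    while i + 4 <= n:
        total += data[i] + data[i + 1] + data[i + 2] + data[i + 3]
        i += 4
    # Remainder loop for the leftover elements.
    while i < n:
        total += data[i]
        i += 1
    return total

print(sum_unrolled(list(range(10))))  # 45, same as sum(range(10))
```

The result is unchanged; the point of the transformation is fewer loop-condition checks and increment operations per element, which is also what gives the compiled C version its speedup.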

For download links, more information and documentation, please refer to our official documentation page.
Please let us know if you have any questions about this release.

About the IBM Advance Toolchain for PowerLinux

The IBM Advance Toolchain for PowerLinux is a set of open source development tools (compiler, debugger and profiling tools) and runtime libraries that allow users to take leading edge advantage of IBM's latest POWER hardware features on Linux.
For more information about it, visit http://ibm.co/AdvanceToolchain.
