Recent

Our recent publication “Algorithmic Canonical Stratifications of Simplicial Complexes” proposes a new algorithm for data analysis that offers a topology-aware path towards explainable artificial intelligence. Despite (or, perhaps, due to) being mathematically rigorous, the text of the original work is virtually impenetrable for readers not familiar with the concepts, tools, and notation of topology. In order to convey our ideas to a wider audience, we present this supplemental introduction. Here, we summarize and explain in plain English the motivation, reasoning, and methods of our new topological data analysis algorithm that we term “canonical stratification”. Canonical-Stratification-for-Non-Mathematicians.pdf (38 KB) Table of Contents 1. Motivation 2. More on Canonical Stratification 3. Conclusion 1. Motivation Machine learning has advanced significantly in recent years and has proven itself to be a powerful and versatile tool in a variety of data-driven disciplines. Machine learning algorithms are now being used to make decisions in numerous areas [...]

We record performance measurements on Micron 5210 SSD related to Machine Learning workflow. Even though Machine Learning is highly CPU intensive, fast storage can lower training time through faster file pre-processing and serialization, particularly when the size of a data set exceeds the amount of installed memory. A popular format for datasets is TFRecord, and in our performance measurements, we will be comparing the throughput speed and completion time of a TFRecord on a 7.68TB Micron 5210 ION SSD versus that of an 8TB Seagate 7200RPM HDD. Colfax-Machine-Learning-and-QLC-SSDs.pdf (151 KB) Table of Contents 1. QLC SSDs 2. Micron QLC SSD 3. Test System Configuration 4. Test Workload: TFRecord 5. Test Results 6. Summary 1. QLC SSDs For years, 7200 RPM hard disk drives (HDDs) have been the standard media on which Machine Learning (ML) training data sets have been stored. These traditional HDDs have been preferred due to their low cost and easy to adopt SATA interfaces. However, HDD’s suffer from relatively slow throughput . Solid State Drives (SSDs) have been too [...]

You have set up a deep learning model that you are planning to train on an Intel architecture processor. In order to be productive, you have to minimize the training time. You run the application and see that it takes N seconds for a single training epoch. How do you know if it is good? If improvement is possible, what can you do to improve the training time? Are there tools to identify a tuning strategy? Intel software development tools can answer these questions to maximize your productivity in deep learning on Intel architecture. At the Intel AI DevCon 2018 in San Francisco, Alaa Eltablawy (Colfax) presented a workshop that demonstrates how this works. For the workshop, attendees received access to the Intel® AI DevCloud, where they could experiment with the optimization of a TensorFlow-based application for image segmentation. The instructor demonstrated the performance analysis results obtained with Intel® VTune Amplifier and Application Performance Snapshot and explained how this analysis consistently guides you to the use of known “performance tuning knobs” in [...]

Pablo Gonzalez-de-Aledoa, Andrey Vladimirovd, Marco Mancab, Jerry Baughc, Ryo Asaid, Marcus Kaisere,f, Roman Bauerf,e a Software Performance Optimization Group, Imperial College London, London, United Kingdom b CERN Openlab, IT Department, CERN, Switzerland c Intel Corporation, USA d Colfax International, USA e Interdisciplinary Computing and Complex BioSystems Research Group, School of Computing, Newcastle University, Newcastle upon Tyne, United Kingdom f Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, United Kingdom A paper led by Pablo Gonzales-de-Aledo (Imperial College London) with contributions from his colleagues from CERN, Intel, Colfax and Newcastle University was published in the journal Advances in Engineering Software. This is a case study on performance optimization in a biological simulation code. The code presents a highly parallel implementation of a computer simulation that involves millions of agents interacting in a 3D environment. The paper explains the general approach to transforming a sequential code to run on modern, highly [...]

This publication demonstrates the process of optimizing an object detection inference workload on an Intel® Xeon® Scalable processor using TensorFlow. This project pursues two objectives: Achieve object detection with real-time throughput (frame rate) and low latency Minimize the required computational resources In this case study, a model described in the “You Only Look Once” (YOLO) project is used for object detection. The model consists of two components: a convolutional neural network and a post-processing pipeline. In this work, the original Darknet model is converted to a TensorFlow model. First, the convolutional neural network is optimized for inference. Then array programming with NumPy and TensorFlow is implemented for the post-processing pipeline. Finally, environment variables and other configuration parameters are tuned to maximize the computational performance. With these optimizations, real-time object detection is achieved while using a fraction of the available processing power of an Intel Xeon Scalable processor-based system. [...]