About crc

Posts by Chris Coughlin:

With the completion of the NASA Phase I SBIR I thought it would be useful to tag the versions of Myriad, Myriad Desktop, and Myriad Trainer that were delivered at the end of the project. They’ve all been tagged as version “1.0-SNAPSHOT” and here are the links:

Since we’re likely to break backwards compatibility in the next version, for your convenience here’s the trained model bundle we put together if you want to take a test drive. What’s in the bundle? Quoting from the original docs:

Inside is a model that has been trained to find indications of structural damage in sensor data and some sample input files to get you started. Download the bundle and extract the model and the sample files in a convenient location, then use them as you proceed through the documentation.

Basically once you’ve built the 1.0-SNAPSHOT-tagged source code, this bundle gives you a starting point for training a machine learning model to detect damage in ultrasonic data and using it in a distributed damage detection system.

So what’s coming in the next version of Myriad? First up is a major redo of serialization. The initial approach was functional but not easy to use, either as a developer or as an end user. That was fine for Phase I, where you’re under time pressure to show feasibility early on, but there’s definitely room for improvement. Among other things, I’m hoping to implement a “black box” of sorts that bundles the machine learning model and its pre-processing functions in a single package, so an end user doesn’t need to keep track of how to massage raw data for the model: just feed data in and get results back. This should make it easier to share and distribute machine learning models.
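To make the “black box” idea concrete, here’s a minimal sketch of what such a bundle might look like. The names (`ModelBundle`, the `Function`-based pre-processing chain) are illustrative stand-ins, not Myriad’s actual API, and a real bundle would also need each piece to be serializable so the whole thing could be shipped as one file:

```java
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch: bundle a trained model with its pre-processing
 * chain so callers feed in raw data and get a classification back.
 * In a real implementation each step would also be serializable so the
 * bundle can be saved and shared as a single package.
 */
public class ModelBundle {
    private final List<Function<double[][], double[][]>> preprocessors;
    private final Function<double[][], Boolean> model; // true = indication found

    public ModelBundle(List<Function<double[][], double[][]>> preprocessors,
                       Function<double[][], Boolean> model) {
        this.preprocessors = preprocessors;
        this.model = model;
    }

    /** Run raw data through every pre-processing step, then classify. */
    public boolean classify(double[][] rawData) {
        double[][] data = rawData;
        for (Function<double[][], double[][]> step : preprocessors) {
            data = step.apply(data);
        }
        return model.apply(data);
    }
}
```

The point is that the caller never sees the pre-processing: the same `classify` call works no matter how the model expects its input massaged.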

I could launch into painful detail about the algorithm behind Myriad Desktop and how it looks for Regions of Interest (ROIs), but let’s start with a video.

I trained a Myriad model to recognize indications in ultrasonic data. As the sliding window moves across the input data, we see what the damage-detection model sees when it looks for damage; in this case, a Sobel edge detector was applied. When the model classifies the current window as containing an indication, the window’s border flashes. When the window finishes scanning the current input, the pyramid halves the input data’s size and repeats the process until the sliding window is larger than the current scaled-down version of the input. By considering the input data at several different scales, we remove the need for the damage detection algorithm to handle scaling in its own code. I’ve posted sample code to build a similar ROI detection pipeline.
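A single-threaded sketch of that pyramid-plus-sliding-window search might look like the following. The `classifier` predicate and the naive `halve` subsampling are illustrative stand-ins for the trained model and the actual image-pyramid downscaling, not Myriad’s real code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Minimal sketch of the pyramid + sliding-window search described above.
 * The classifier stands in for the trained model; halve() stands in for
 * the real pyramid downscaling step.
 */
public class PyramidScan {

    /** Slide a winSize x winSize window over the data at every pyramid level. */
    public static List<int[]> findRegions(double[][] data, int winSize,
                                          Predicate<double[][]> classifier) {
        List<int[]> hits = new ArrayList<>(); // each hit: {row, col, scale}
        int scale = 1;
        // Repeat until the window no longer fits in the scaled-down data.
        while (data.length >= winSize && data[0].length >= winSize) {
            for (int r = 0; r + winSize <= data.length; r++) {
                for (int c = 0; c + winSize <= data[0].length; c++) {
                    if (classifier.test(window(data, r, c, winSize))) {
                        hits.add(new int[]{r, c, scale});
                    }
                }
            }
            data = halve(data);   // next pyramid level: half the size
            scale *= 2;
        }
        return hits;
    }

    static double[][] window(double[][] d, int r0, int c0, int n) {
        double[][] w = new double[n][n];
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++)
                w[r][c] = d[r0 + r][c0 + c];
        return w;
    }

    static double[][] halve(double[][] d) {
        double[][] h = new double[d.length / 2][d[0].length / 2];
        for (int r = 0; r < h.length; r++)
            for (int c = 0; c < h[0].length; c++)
                h[r][c] = d[2 * r][2 * c]; // naive subsampling stand-in
        return h;
    }
}
```

Because every level reuses the same fixed-size window, the model itself never has to reason about scale.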

It’s more or less the same operation in Desktop, except it’s done concurrently. Each of the steps – ingestion, scaling, sliding, etc. – is set up as a central router with a configurable number of workers. When the router receives a new task, the task is put on a work queue for the next available worker. When a worker completes a task it sends its results back to its router, which then sends them on to the next stage in the pipeline. If I had to draw a picture of the algorithm in Desktop I’d use something like this, with each stage in the algorithm represented by a different color:

Each arrow in this picture represents Akka messages between Actors. The smaller bi-directional arrows show messaging between stage workers and the central router, and the larger black arrows show messages passed between stages. This visualization shows the benefits of using Akka as the foundation for our concurrency in Myriad. Each stage is independent and doesn’t know / doesn’t care about the others. We can insert, delete, and reorder stages in a computation at run-time e.g. with LinkedWorkerPools. We can create multi-core pipelines, switch to a multi-system pipeline, or mix and match as we go.
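To show the router/worker shape in miniature, here’s a plain-Java sketch using an `ExecutorService` in place of an Akka router. This is not the actual Akka implementation (Myriad’s stages are Actors exchanging messages), but it captures the pattern: each stage owns a pool of workers, and completed results are handed to the next stage without either stage knowing anything about the other’s internals:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Plain-Java sketch of the stage/router/worker idea (NOT Myriad's
 * actual Akka code): each stage has a configurable pool of workers,
 * and results flow to whatever "next" stage was wired in.
 */
public class StagePipeline<I, O> {
    private final ExecutorService workers; // the stage's worker pool
    private final Function<I, O> task;     // the work this stage performs
    private final Consumer<O> next;        // the next stage in the pipeline

    public StagePipeline(int workerCount, Function<I, O> task, Consumer<O> next) {
        this.workers = Executors.newFixedThreadPool(workerCount);
        this.task = task;
        this.next = next;
    }

    /** "Router": queue the message for the next available worker. */
    public void tell(I message) {
        workers.submit(() -> next.accept(task.apply(message)));
    }

    public void shutdown() throws InterruptedException {
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Because each stage only knows its own `next` consumer, stages can be rewired, inserted, or removed without touching the others, which is the property the Akka version exploits at run time.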

Working with some Log ASCII Standard (LAS) data the other day made me realize that I owe a belated thank-you to Canadian Nuclear Laboratories (CNL) for allowing me to write the WIMS-AECL (Winfrith Improved Multigroup Scheme, Atomic Energy of Canada Limited *phew*) post processor “WIMSpp”. It might not ever find widespread use outside CNL (or inside CNL for that matter), but I found it to be worthwhile for a couple of reasons.

A couple of months of head-down, problem-solving work in Python is always appreciated.

After the NDIToolbox project had wrapped up, I’d had a couple of ideas on how to handle plugins in a self-contained (no Python install required) Python app for Windows users, and WIMSpp gave me the opportunity to test them out.

NASA’s Phase I SBIR schedule can be challenging to meet, particularly since it usually requires a first demonstration of feasibility no later than the third month of effort. Things went well on that front: by the end of the second month we had enough of the basic framework in place to demonstrate how a Myriad-based machine learning model could be taught to recognize indications of cracks, corrosion, and other types of damage in sensor data, and then to build a concurrent processing pipeline in Myriad that runs multiple instances of the model at once to rapidly scan large amounts of data for structural damage. We were even able to add experimental OpenCL support, which will eventually let you use your GPU, Xeon Phi, etc. to accelerate calculations. If Myriad finds a suitable hardware accelerator it picks the “best” device (based on the number of CUDA cores / stream processors); if nothing’s found it falls back to conventional CPU calculations, all without changing a single line of code or configuration.
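The selection logic reads roughly like the following sketch. `Device` and its `computeUnits` field are hypothetical stand-ins for what an OpenCL device enumeration (e.g. through a binding such as JOCL) would report; this is not Myriad’s actual code:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/**
 * Sketch of the accelerator-selection behavior described above: pick the
 * device with the most CUDA cores / stream processors, otherwise fall
 * back to conventional CPU calculations. Device is a hypothetical
 * stand-in for what an OpenCL enumeration would return.
 */
public class DeviceChooser {
    public static class Device {
        public final String name;
        public final int computeUnits; // CUDA cores / stream processors

        public Device(String name, int computeUnits) {
            this.name = name;
            this.computeUnits = computeUnits;
        }
    }

    /** The "best" device is the one with the most compute units, if any exist. */
    public static Optional<Device> best(List<Device> found) {
        return found.stream().max(Comparator.comparingInt((Device d) -> d.computeUnits));
    }

    /** Choose a backend: an accelerator if one was found, else the CPU fallback. */
    public static String backend(List<Device> found) {
        return best(found).map(d -> "OpenCL: " + d.name).orElse("CPU fallback");
    }
}
```

The caller never branches on hardware itself, which is what lets the fallback happen without any code or configuration changes.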

Very heavy emphasis on “experimental support” though – ask me about the denial of service attack I launched on my GPU sometime. Not recommended for production use! 🙂

All in all I think the project went quite well and I hope to be able to continue work on cleaning up the code to make it easier to use and add a few more features and functions.

You’ll need Java 8 (Oracle or OpenJDK), Apache Maven, and (optionally) OpenCL SDKs if you’d like to use the experimental support for GPUs. Documentation is also available, as well as code samples and videos.

As previously promised I’ve updated the Myriad samples with a concurrent demo that more or less follows the same workflow as the 60 day technical review video. Included in the project for your convenience is a pre-trained machine learning model that’s learned to recognize indications of structural damage in C-scan maps. Although I originally trained it on ultrasonic sensor data, it did show some promise when I tried it out with microwave and X-ray data.

You’d most likely want to train it a bit more on data representative of your inspection to get the best results.

It’s a little amazing to me that ~250 lines of Java gets you a concurrent, fault-tolerant damage detection app, but that’s very much courtesy of the excellent Akka framework. Well worth a look for your next project!

Development of the distributed fault-tolerant data reduction framework Myriad continues apace. While we’re working towards the release, I’ve gone ahead and posted sample code so potential users can get a feel for what’s there and how to use it. The most interesting demo is probably the complete ROI detection pipeline, which shows how to do everything from reading the input to reporting the ROIs in 125 lines of Java. This is a close approximation of the basic flow of the pipeline I demoed in the two month technical review video, the primary difference being that this is a single-threaded implementation. I’ll update soon with a concurrent version for comparison.

[Update Tue Oct 25 15:38:38 CDT 2016] : the concurrent version of the pipeline is now live.