
I could launch straight into the painful details of the algorithm behind Myriad Desktop and how it looks for Regions Of Interest (ROIs), but let’s start with a video.

I trained a Myriad model to recognize indications in ultrasonic data. As the sliding window moves across the input data, we see what the damage-detection model sees as it looks for damage; in this case, a Sobel edge detector was applied. Whenever the model classifies the current window as containing an indication, the window’s border flashes. When the window finishes sweeping the current input, the pyramid halves the input’s size and repeats the process until the sliding window is larger than the current scaled-down version of the input. By considering the input data at several different scales, we remove the need for the damage-detection algorithm to handle scale in its own code. I’ve posted sample code to build a similar ROI detection pipeline.
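The pyramid-plus-sliding-window loop above can be sketched in a few lines of Java. This is an illustration only, not Myriad’s actual API: the `Model` interface and `{x, y, level}` hit format are stand-ins I’ve made up for the example, and the downscaling is a naive 2×2 average.

```java
import java.util.ArrayList;
import java.util.List;

public class PyramidScan {
    /** Stand-in for a trained model: any yes/no test over a window. */
    interface Model { boolean containsIndication(double[][] window); }

    /** Scan the data at every pyramid level; returns {x, y, level} triples. */
    static List<int[]> scan(double[][] data, int win, Model model) {
        List<int[]> hits = new ArrayList<>();
        int level = 0;
        // Keep halving the input until it's smaller than the sliding window.
        while (data.length >= win && data[0].length >= win) {
            for (int y = 0; y + win <= data.length; y++) {
                for (int x = 0; x + win <= data[0].length; x++) {
                    if (model.containsIndication(crop(data, x, y, win))) {
                        hits.add(new int[]{x, y, level});
                    }
                }
            }
            data = halve(data);
            level++;
        }
        return hits;
    }

    /** Copy the w-by-w window whose top-left corner is at (x, y). */
    static double[][] crop(double[][] d, int x, int y, int w) {
        double[][] out = new double[w][w];
        for (int i = 0; i < w; i++)
            for (int j = 0; j < w; j++)
                out[i][j] = d[y + i][x + j];
        return out;
    }

    /** Naive 2x downscale: average each 2x2 block. */
    static double[][] halve(double[][] d) {
        int h = d.length / 2, w = d[0].length / 2;
        double[][] out = new double[h][w];
        for (int i = 0; i < h; i++)
            for (int j = 0; j < w; j++)
                out[i][j] = (d[2*i][2*j] + d[2*i+1][2*j]
                           + d[2*i][2*j+1] + d[2*i+1][2*j+1]) / 4.0;
        return out;
    }
}
```

Note how the model itself never hears about scale: it always sees a fixed-size window, and the pyramid loop handles the rest.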

It’s more or less the same operation in Desktop, except it’s done concurrently. Each of the steps – ingestion, scaling, sliding, etc. – is set up as a central router with a configurable number of workers. When the router receives a new task, the task is put on a work queue for the next available worker. When a worker completes a task it sends its results back to its router, which then sends them on to the next stage in the pipeline. If I had to draw a picture of the algorithm in Desktop I’d use something like this, with each stage in the algorithm represented by a different color:
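Desktop builds these stages on Akka routers, but the router-with-workers pattern itself can be sketched with nothing but the JDK. The `Stage` class and its methods below are hypothetical names for illustration, using a plain `ExecutorService` in place of an Akka router:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/**
 * One pipeline stage: a "router" that queues incoming tasks for a pool
 * of workers and forwards each result to the next stage. A stdlib sketch
 * of the pattern only -- Myriad Desktop uses Akka routers for this.
 */
public class Stage<I, O> {
    private final ExecutorService workers;
    private final Function<I, O> task;
    private final Stage<O, ?> next;   // null for the final stage

    public Stage(int nWorkers, Function<I, O> task, Stage<O, ?> next) {
        this.workers = Executors.newFixedThreadPool(nWorkers);
        this.task = task;
        this.next = next;
    }

    /** Queue the input for the next free worker; pass the result downstream. */
    public void submit(I input) {
        workers.submit(() -> {
            O result = task.apply(input);
            if (next != null) next.submit(result);
        });
    }

    public void shutdown() { workers.shutdown(); }
}
```

Because each stage only knows its own task and a reference to the next stage, you can rewire the pipeline just by changing how the stages are chained together; Akka adds supervision, location transparency, and run-time reconfiguration on top of the same basic shape.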

Each arrow in this picture represents Akka messages between Actors. The smaller bi-directional arrows show messaging between stage workers and the central router, and the larger black arrows show messages passed between stages. This visualization shows the benefits of using Akka as the foundation for our concurrency in Myriad. Each stage is independent and doesn’t know or care about the others. We can insert, delete, and reorder stages in a computation at run time, e.g. with LinkedWorkerPools. We can create multi-core pipelines, switch to a multi-system pipeline, or mix and match as we go.

There haven’t been too many updates as of late because I’ve been working on a paper and presentation…and I’m happy to announce that both were accepted for SPIE’s Smart Structures and NDE for Industry 4.0! So if you’ve got time to kill in March, you should drop by and listen to me blather for 20 minutes about a Big Data problem and how an Actor-based architecture makes it easier to handle. Hope to see you there!

Speaking of Actor-based systems, the latest changes to Myriad include a bugfix to the Canny edge detector and a new auto-thresholding option for the same, so please update as soon as possible.

I’ve also added initial support for oversampling to correct class imbalances, based on the SMOTE (Synthetic Minority Over-sampling TEchnique) algorithm, which may be of interest. Myriad was written to help with Region Of Interest (ROI) detection applications, which often have many more negative (i.e. not ROI) samples than positive (ROI) ones. The Myriad toolset currently uses RUS (Random UnderSampling) to correct this imbalance, essentially discarding randomly-chosen members of the majority class until the imbalance shrinks. The SMOTE algorithm instead corrects the imbalance by oversampling: generating synthetic members of the minority class.
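The core of SMOTE is simple to sketch: for each minority sample, pick one of its k nearest minority neighbors and synthesize a new point somewhere along the line segment between them. The version below is a minimal illustration of that idea, not the code shipped with Myriad:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

public class Smote {
    /**
     * Generate perSample synthetic points for each minority sample by
     * interpolating toward one of its k nearest minority neighbors.
     */
    public static List<double[]> oversample(List<double[]> minority, int k,
                                            int perSample, long seed) {
        Random rng = new Random(seed);
        List<double[]> synthetic = new ArrayList<>();
        for (double[] s : minority) {
            List<double[]> neighbors = nearest(minority, s, k);
            for (int n = 0; n < perSample; n++) {
                double[] nb = neighbors.get(rng.nextInt(neighbors.size()));
                double gap = rng.nextDouble();   // position along the segment
                double[] synth = new double[s.length];
                for (int d = 0; d < s.length; d++)
                    synth[d] = s[d] + gap * (nb[d] - s[d]);
                synthetic.add(synth);
            }
        }
        return synthetic;
    }

    /** k nearest neighbors of 'point' (excluding itself), Euclidean distance. */
    static List<double[]> nearest(List<double[]> all, double[] point, int k) {
        return all.stream()
                .filter(p -> p != point)
                .sorted(Comparator.comparingDouble((double[] p) -> dist(p, point)))
                .limit(k)
                .collect(Collectors.toList());
    }

    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }
}
```

Because the synthetic points lie between existing minority samples rather than on top of them, the classifier sees a broader minority region instead of exact duplicates.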

I haven’t fully integrated SMOTE into the toolset yet, but it’s available for the DIY crowd. My game plan is to make it an option for cross-validation in Trainer. Intuitively, it makes sense to me to apply SMOTE after we’ve split the data into training and testing subsets, i.e. we apply SMOTE to the training set rather than the original dataset. Otherwise our synthetic samples would make it into the testing set, and we’d be testing models on their ability to classify both real and “fake” data when what we really want is to test their ability to classify real data alone.
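That ordering argument is easy to make concrete. In the sketch below (hypothetical helper names; simple random duplication stands in for SMOTE), the split happens first and only the training subset is oversampled, so the test set never contains a synthetic sample:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SplitThenOversample {
    /** Shuffle and split into [train, test] -- this happens FIRST. */
    public static List<List<double[]>> trainTestSplit(List<double[]> data,
                                                      double trainFrac, long seed) {
        List<double[]> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) (shuffled.size() * trainFrac);
        return Arrays.asList(
                new ArrayList<>(shuffled.subList(0, cut)),                 // train
                new ArrayList<>(shuffled.subList(cut, shuffled.size())));  // test
    }

    /**
     * Oversample ONLY the training minority up to 'target' samples.
     * Random duplication here is a stand-in for SMOTE; the point is
     * that the test split above is never touched.
     */
    public static List<double[]> oversample(List<double[]> minority,
                                            int target, long seed) {
        Random rng = new Random(seed);
        List<double[]> out = new ArrayList<>(minority);
        while (out.size() < target)
            out.add(minority.get(rng.nextInt(minority.size())));
        return out;
    }
}
```

Running oversampling before the split would let duplicated (or synthetic) samples land on both sides of the split, inflating test scores.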

I’m very pleased to announce that Myriad was selected as a finalist in the Spring 2017 round of AIGrant! Whether we win or not is secondary; I’m grateful for the opportunity and the chance to increase project visibility just by being listed. I’m hopeful that with increased visibility we’ll be able to get more relevant data from calibration standards, actual inspection data, etc. and make more useful models for detecting damage. Each finalist was asked to put together a two minute video demonstration for their entry, and you’ll find Myriad’s embedded below.

Many thanks to Nat Friedman, the judges, and the sponsors for the opportunity!

Working with some Log ASCII Standard (LAS) data the other day made me realize that I owe a belated thank-you to Canadian Nuclear Laboratories (CNL) for allowing me to write the WIMS-AECL (Winfrith Improved Multigroup Scheme Atomic Energy of Canada Limited *phew*) post processor “WIMSpp”. It might not ever find widespread use outside CNL (or inside CNL for that matter), but I found it to be worthwhile for a couple of reasons.

A couple of months of head-down, solve-a-problem-with-Python work is always appreciated.

After the NDIToolbox project had wrapped up, I’d had a couple of ideas on how to handle plugins in a self-contained (no Python install) Python app for Windows users, and WIMSpp gave me the opportunity to test them out.

NASA’s Phase I SBIR schedule can be challenging to meet, particularly since it usually requires the first demonstration of feasibility no later than the third month of effort. Things went well on that front – by the end of the second month we had enough of the basic framework in place to demonstrate how a Myriad-based machine learning model could be taught to recognize indications of cracks, corrosion, and other types of damage in sensor data, and then to build a concurrent processing pipeline in Myriad that runs multiple instances of the model in parallel to rapidly scan large amounts of data for structural damage. We were even able to add experimental OpenCL support, which will eventually let you use your GPU, Xeon Phi, etc. to accelerate calculations. If Myriad finds a suitable hardware accelerator it picks the “best” device (based on the number of CUDA cores / stream processors); if nothing’s found it falls back to conventional CPU calculations, all without changing a single line of code or configuration.
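The pick-the-best-or-fall-back logic is the easy part to illustrate. The sketch below uses a made-up `Device` type rather than any real OpenCL binding’s API, just to show the selection rule:

```java
import java.util.Comparator;
import java.util.List;

public class DevicePicker {
    /** Hypothetical stand-in for a discovered OpenCL device. */
    static class Device {
        final String name;
        final int computeUnits;   // CUDA cores / stream processors
        Device(String name, int computeUnits) {
            this.name = name;
            this.computeUnits = computeUnits;
        }
    }

    /** Fallback when discovery finds no usable accelerator. */
    static final Device CPU_FALLBACK = new Device("CPU", 0);

    /**
     * Pick the "best" accelerator by raw compute-unit count;
     * fall back to the CPU if the discovery list is empty.
     * Callers never change code or configuration either way.
     */
    public static Device pick(List<Device> found) {
        return found.stream()
                .max(Comparator.comparingInt((Device d) -> d.computeUnits))
                .orElse(CPU_FALLBACK);
    }
}
```

The key property is that the same calling code runs in both cases; only the device behind it changes.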

Very heavy emphasis on “experimental support” though – ask me about the denial of service attack I launched on my GPU sometime. Not recommended for production use! 🙂

All in all I think the project went quite well and I hope to be able to continue work on cleaning up the code to make it easier to use and add a few more features and functions.

You’ll need Java 8 (Oracle or OpenJDK), Apache Maven, and (optionally) OpenCL SDKs if you’d like to use the experimental support for GPUs. Documentation is also available, as well as code samples and videos.

As previously promised, I’ve updated the Myriad samples with a concurrent demo that more or less follows the same workflow as the 60-day technical review video. Included in the project for your convenience is a pre-trained machine learning model that’s learned to recognize indications of structural damage in C-scan maps. Although I originally trained it on ultrasonic sensor data, it showed some promise when I tried it out with microwave and X-ray data.

You’d most likely want to train it a bit more on data representative of your inspection to get the best results.

It’s a little amazing to me that ~250 lines of Java gets you a concurrent, fault-tolerant damage detection app, but that’s very much courtesy of the excellent Akka framework. Well worth a look for your next project!

So I had this idea about how you could automatically analyze tons of NDE data…

Experience test-driving NDIToolbox in the field (or the depot / hangar, to be more accurate) showed me that there is a ton of NDE sensor data out there and that it can take forever and a fortune to analyze manually. I’d experimented with algorithms to automatically flag possible indications of damage in the data when I was working on NDIToolbox and a project for the Air Force, but I’d never really gone beyond the proof-of-concept stage. Until recently I didn’t have a good handle on how to make it multi-processor and/or distributed, either – sitting in a depot for an hour waiting for a file to load taught me that single-threaded analysis isn’t feasible.

Two months or so into the project, there’s a rather long demonstration of “Myriad” in action available, in which we train a machine learning model to automatically detect indications of damage in ultrasonic sensor data. I hope to have a few more demos of calling external apps or building a Myriad P2P cluster soon. Stay tuned!