There haven’t been too many updates as of late because I’ve been working on a paper and presentation…and I’m happy to announce that both were accepted for SPIE’s Smart Structures and NDE for Industry 4.0! So if you’ve got time to kill in March, you should drop by and listen to me blather for 20 minutes about a Big Data problem and how an Actor-based architecture makes it easier to handle. Hope to see you there!

Speaking of Actor-based systems, the latest changes to Myriad include a bugfix to the Canny edge detector and a new auto-thresholding option for the same, so please update as soon as possible.
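If you’re curious what the auto-thresholding option buys you, a common heuristic is to derive the two Canny hysteresis thresholds from the image’s median intensity instead of asking the user to pick them. Here’s a minimal sketch in Python with OpenCV (illustrative only; Myriad’s actual implementation may use a different heuristic):

```python
import cv2
import numpy as np

def auto_canny(image, sigma=0.33):
    """Canny edge detection with thresholds derived from the image itself.

    Centers the two hysteresis thresholds around the median pixel
    intensity rather than requiring the user to supply them.
    """
    median = float(np.median(image))
    lower = int(max(0, (1.0 - sigma) * median))
    upper = int(min(255, (1.0 + sigma) * median))
    return cv2.Canny(image, lower, upper)

# Usage: edges = auto_canny(cv2.imread("cscan.png", cv2.IMREAD_GRAYSCALE))
```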

I’ve also added initial support for oversampling to correct class imbalances, based on the SMOTE (Synthetic Minority Over-sampling TEchnique) algorithm, which may be of interest. Myriad was written to help with Region Of Interest (ROI) detection applications, which often have many more negative (i.e. not ROI) samples than positive (i.e. ROI). The Myriad toolset currently uses RUS (Random UnderSampling) to correct this imbalance, basically by randomly discarding members of the majority class. The SMOTE algorithm instead corrects the imbalance by oversampling: generating synthetic members of the minority class.

I haven’t fully integrated SMOTE into the toolset yet, but it’s available for the DIY-er crowd. My game plan is to make it an option for cross-validation in Trainer. Intuitively, it makes sense to me to apply SMOTE after we’ve split the data into training and testing subsets, i.e. to apply SMOTE to the training set rather than the original dataset. Otherwise our synthetic samples would make it into the testing set, and we’d be testing models for their ability to classify real and “fake” data when what we really want is to test their ability to classify real data alone.
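Until that lands in Trainer, here’s what the split-first workflow looks like in Python with scikit-learn and the imbalanced-learn package (a sketch of the concept, not Myriad’s implementation):

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Stand-in for real ROI data: roughly 90% negative, 10% positive samples
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# Split first so synthetic samples can never leak into the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Oversample the minority (ROI) class in the training data only
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

print("training set before:", Counter(y_train))
print("training set after: ", Counter(y_train_res))
# Train on (X_train_res, y_train_res); evaluate on the untouched X_test/y_test.
```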

I’m very pleased to announce that Myriad was selected as a finalist in the Spring 2017 round of AIGrant! Whether we win or not is secondary; I’m grateful for the opportunity and the chance to increase project visibility just by being listed. I’m hopeful that with increased visibility we’ll be able to get more relevant data from calibration standards, actual inspection data, etc. and make more useful models for detecting damage. Each finalist was asked to put together a two-minute video demonstration for their entry, and you’ll find Myriad’s embedded below.

Many thanks to Nat Friedman, the judges, and the sponsors for the opportunity!

With the completion of the NASA Phase I SBIR, I thought it would be useful to tag the versions of Myriad, Myriad Desktop, and Myriad Trainer that were delivered at the end of the project. They’ve all been tagged as version “1.0-SNAPSHOT”, and here are the links:

Since we’re likely to break backwards compatibility in the next version, for your convenience here’s the trained model bundle we put together if you want to take a test drive. What’s in the bundle? Quoting from the original docs:

Inside is a model that has been trained to find indications of structural damage in sensor data and some sample input files to get you started. Download the bundle and extract the model and the sample files in a convenient location, then use them as you proceed through the documentation.

Basically once you’ve built the 1.0-SNAPSHOT-tagged source code, this bundle gives you a starting point for training a machine learning model to detect damage in ultrasonic data and using it in a distributed damage detection system.

So what’s coming in the next version of Myriad? First up is a major redo of the serialization. The initial approach to serialization was functional but not easy to use, either as a developer or as an end user. That was fine in Phase I, where you’re under time constraints to show feasibility early on, but there’s definitely room for improvement. Among other things, I’m hoping to implement a “black box” of sorts that bundles the machine learning model and its pre-processing functions in a single package, so that an end user doesn’t need to keep track of how to massage raw data for the model: just feed it in and get the results. This should make it easier to share and distribute machine learning models.
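As a rough sketch of the idea (Python for brevity; all the names here are hypothetical, and the real implementation will live in Myriad itself):

```python
import pickle

class ModelBundle:
    """Bundle a trained model with its pre-processing chain so the end user
    never has to remember how to massage raw data for the model."""

    def __init__(self, preprocessors, model):
        self.preprocessors = preprocessors  # ordered list of callables
        self.model = model                  # anything with a predict() method

    def predict(self, raw_data):
        # Apply the same pre-processing used during training, then classify
        for step in self.preprocessors:
            raw_data = step(raw_data)
        return self.model.predict(raw_data)

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump(self, f)

    @staticmethod
    def load(path):
        with open(path, "rb") as f:
            return pickle.load(f)
```

Sharing a model would then amount to shipping a single file: the recipient loads the bundle and feeds in raw data, never needing to know what pre-processing happens along the way.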

As previously promised, I’ve updated the Myriad samples with a concurrent demo that more or less follows the same workflow as the 60-day technical review video. Included in the project for your convenience is a pre-trained machine learning model that’s learned to recognize indications of structural damage in C-scan maps. Although I originally trained it on ultrasonic sensor data, it did show some promise when I tried it out with microwave and X-ray data.

You’d most likely want to train it a bit more on data representative of your inspection to get the best results.

It’s a little amazing to me that ~250 lines of Java gets you a concurrent, fault-tolerant damage detection app, but that’s very much courtesy of the excellent Akka framework. Well worth a look for your next project!

As promised, I’ve put together a short video on how to build your own cluster for sensor data analysis with Myriad. Myriad uses Akka’s Remoting feature to (hopefully) make it relatively straightforward to link up several computers in a DIY processing pipeline. If you’re using the GUI tools I wrote for NASA, you just start the GUI on each machine, then point the next machine in the processing pipeline to the remote machine and you’re good to go.

The main use cases I see for this feature are for handling big datasets and for resource-intensive ROI code. If you’ve got enough data to analyze that just reading it might take up all your available RAM, or if your ROI code needs all the CPU/GPU/RAM it can get, you can split the processing up among multiple systems. Have one machine responsible for reading the data and sending subsets to an analysis machine, which is then free to use all its resources for your ROI code. The analysis machine does its work and sends the results to another machine for reporting and compiling the results, and so on.
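Myriad does this with Akka on the JVM, but the shape of the pipeline translates to any actor library. Here’s a single-process analogue in Python using the Pykka actor library (the library choice and the toy “analysis” are mine, purely for illustration):

```python
import pykka

class Reporter(pykka.ThreadingActor):
    def on_receive(self, message):
        # Final stage: compile and report results
        print("results:", message["results"])

class Analyzer(pykka.ThreadingActor):
    def __init__(self, reporter):
        super().__init__()
        self.reporter = reporter

    def on_receive(self, message):
        # Resource-intensive ROI code would run here; doubling is a stand-in
        results = [x * 2 for x in message["data"]]
        self.reporter.tell({"results": results})

class Reader(pykka.ThreadingActor):
    def __init__(self, analyzer):
        super().__init__()
        self.analyzer = analyzer

    def on_receive(self, message):
        # Ship subsets downstream instead of holding the whole file in RAM
        for chunk in ([1, 2, 3], [4, 5, 6]):
            self.analyzer.tell({"data": chunk})

reporter = Reporter.start()
analyzer = Analyzer.start(reporter)
reader = Reader.start(analyzer)
reader.tell({"file": "big_scan.dat"})

# Stop in pipeline order so each actor drains its mailbox before exiting
reader.stop()
analyzer.stop()
reporter.stop()
```

With Akka’s Remoting, the same tell() calls cross machine boundaries, which is what lets each stage claim a whole machine’s resources for itself.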

So I had this idea about how you could automatically analyze tons of NDE data…

Experience test-driving NDIToolbox in the field (or the depot / hangar to be more accurate) showed me that there is a ton of NDE sensor data out there and that it can take forever and a fortune to analyze manually. I’d experimented with algorithms to automatically flag possible indications of damage in the data when I was working on NDIToolbox and a project for the Air Force, but I’d never really gone beyond the proof of concept stage. Until recently I didn’t have a good handle on how to make it multi-processor and/or distributed, either – sitting in a depot for an hour waiting for a file to load has taught me single-threaded analysis isn’t feasible.

Two months or so into the project and there’s a rather long demonstration of “Myriad” in action available, in which we train a machine learning model to automatically detect indications of damage in ultrasonic sensor data. I hope to have a few more demos of calling external apps and building a Myriad P2P cluster soon; stay tuned!

If you haven’t updated NDIToolbox since last time, it’s worth doing now. Here’s where we are today:

- Better support for UTWin data files, including preliminary support for compressed waveforms. That last one’s still highly experimental, but let me know if it works for you; I don’t have access to a lot of sample data files for testing.

- Squashed bugs, including better handling of memory errors when running a plugin.

- (Developers) A new report module which provides a quick-and-easy way of generating simple PDF reports.

Source code has already been updated; binaries will follow shortly. I’ll have more to say on the report module in a later post.
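In the meantime, if you just want the flavor of quick PDF generation in Python, here’s a minimal sketch using ReportLab (an assumption on my part; it isn’t necessarily what the report module itself uses):

```python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

def simple_report(path, title, lines):
    """Write a bare-bones one-page PDF report."""
    c = canvas.Canvas(path, pagesize=letter)
    c.setFont("Helvetica-Bold", 14)
    c.drawString(72, 720, title)
    c.setFont("Helvetica", 10)
    y = 690
    for line in lines:
        c.drawString(72, y, line)
        y -= 14
    c.save()

simple_report("inspection.pdf", "Inspection Summary",
              ["Scan: sample.sdt", "Indications found: 3"])
```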

Development on TRI’s nondestructive evaluation data analysis software NDIToolbox has slowed of late as we’ve gotten closer to our goal for functionality and as we get ready to do an honest-to-goodness field test later this year on a QA line. Nevertheless I’m still plugging away at it whenever I get the chance, and today I’ve got the latest and greatest available with two new features: support for multiple datasets in Winspect data files and a new “batch mode.”

The batch mode feature lets you run an NDIToolbox plugin on a set of input files, optionally spawning multiple processes to speed things up. If you have a ton of data files and you’re doing the same number crunching over and over, just point NDIToolbox at the files and the plugin and let it do the work for you. You don’t have to convert your data files to HDF5 before using batch mode; as long as the file format(s) are supported by NDIToolbox, it’ll fetch the data and run the plugin automatically. More info on batch mode is available here in my mirror of the NDIToolbox docs. If you’re going to use batch mode’s multiprocessing, be sure to read up on the requirements (basically, avoid really huge data files).
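Under the hood, the idea looks roughly like this. A minimal sketch with Python’s multiprocessing module, where load_datasets() and run_plugin() are stand-ins for NDIToolbox’s real file readers and plugin API rather than its actual code:

```python
import multiprocessing

import numpy as np

def load_datasets(filename):
    # Stand-in for NDIToolbox's format readers (SDT, CSC, HDF5, ...)
    return {"dataset0": np.random.rand(64, 64)}

def run_plugin(dataset):
    # Stand-in for a plugin's number crunching
    return float(dataset.mean())

def process_file(filename):
    """Load one data file and run the plugin on every dataset it contains."""
    datasets = load_datasets(filename)
    return filename, {name: run_plugin(data) for name, data in datasets.items()}

if __name__ == "__main__":
    files = ["scan1.sdt", "scan2.sdt", "scan3.csc"]
    # One file per worker; each worker holds its file in its own memory,
    # so several workers need several files' worth of RAM at once. Hence
    # the advice to avoid really huge data files when multiprocessing.
    with multiprocessing.Pool(processes=2) as pool:
        for filename, results in pool.map(process_file, files):
            print(filename, results)
```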

As usual, I’d recommend using the conventional Python version of NDIToolbox if you can. If you’re on Windows and don’t want to install Python (or you want to run from a thumb drive), the Downloads section of NDIToolbox’s Bitbucket page has a Windows installer and a compiled version available, no Python required.

If you’re writing a plugin, there’s one additional step required to support the new batch mode. Since more than a few nondestructive testing system file formats like UTWin’s CSC or Winspect’s SDT can have multiple datasets in a single file, batch mode will send your plugin a dict of all the datasets it finds in a given input file. So you’ll need a bit of code to see if you’ve been passed a single dataset (conventional user interface) or a container full of datasets (batch mode). There are a few ways to do this, but one of the most straightforward is to look for a “keys” attribute, like so.
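Something along these lines, where crunch() is a hypothetical stand-in for your plugin’s actual analysis code:

```python
import numpy as np

def crunch(dataset):
    # Hypothetical stand-in for the plugin's real number crunching
    return float(np.median(dataset))

def run_plugin(data):
    """Handle either a single dataset (conventional UI) or a container of
    datasets (batch mode) by duck-typing on a "keys" attribute."""
    if hasattr(data, "keys"):
        # Batch mode: an associative container of datasets keyed by name
        return {name: crunch(data[name]) for name in data.keys()}
    # Conventional mode: a single dataset
    return crunch(data)
```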

You could also just check whether you were passed an actual dict, courtesy of isinstance(). I’d recommend against doing that for now, though: better to assume it’s an associative container of some sort rather than hard-wiring an expectation of an actual dict.