There haven’t been too many updates lately because I’ve been working on a paper and presentation…and I’m happy to announce that both were accepted for SPIE’s Smart Structures and NDE for Industry 4.0! So if you’ve got time to kill in March, you should drop by and listen to me blather for 20 minutes about a Big Data problem and how an Actor-based architecture makes it easier to handle. Hope to see you there!

Speaking of Actor-based systems, the latest changes to Myriad include a bugfix to the Canny edge detector and a new auto-thresholding option for the same, so please update as soon as possible.

I’ve also added initial support for oversampling to correct class imbalances, based on the SMOTE (Synthetic Minority Over-sampling TEchnique) algorithm, which may be of interest. Myriad was written to help with Region Of Interest (ROI) detection applications, which often have many more negative (i.e. not ROI) samples than positive (ROI) samples. The Myriad toolset currently uses RUS (Random UnderSampling) to correct this imbalance by randomly discarding members of the majority class. The SMOTE algorithm instead corrects the imbalance by oversampling: it generates synthetic members of the minority class.
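To make the difference between the two approaches concrete, here is a rough sketch in plain Python. It is not Myriad's code; the function names `random_undersample` and `smote` are made up for illustration, and samples are assumed to be tuples of floats. The SMOTE part follows the usual description of the algorithm: pick a minority sample, pick one of its k nearest minority neighbours, and create a new point somewhere on the line between them.

```python
import math
import random


def random_undersample(majority, minority, rng):
    """RUS: randomly keep only as many majority samples as there are minority samples."""
    kept = rng.sample(majority, len(minority))
    return kept, list(minority)


def smote(minority, n_synthetic, k, rng):
    """SMOTE-style oversampling: build synthetic minority samples by interpolation."""
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic
```

RUS shrinks the majority class down to the size of the minority class, while the SMOTE sketch grows the minority class with as many synthetic points as you ask for.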

I haven’t fully integrated SMOTE into the toolset yet but it’s available for the DIY-er crowd. My game plan is to make it an option available for cross-validation in Trainer. Intuitively, it makes sense to me to apply SMOTE after we’ve split the data into testing and training subsets, i.e. we apply SMOTE to the training set rather than the original dataset. Otherwise our synthetic samples would make it into the testing set and we’d be testing models for their ability to classify real and “fake” data, when what we really want is to test their ability to classify real data alone.
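The split-first ordering can be sketched in a few lines of Python. This is only an illustration of the idea, not how Trainer does (or will do) it; `oversample` and `split_then_oversample` are hypothetical names, labels are assumed to be 1 for ROI and 0 for not ROI, and the oversampler is a bare-bones SMOTE-style interpolation.

```python
import math
import random


def oversample(minority, n_new, k, rng):
    """SMOTE-style: interpolate from a sample toward one of its k nearest minority neighbours."""
    out = []
    for _ in range(n_new):
        base = rng.choice(minority)
        near = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(near)
        gap = rng.random()
        out.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return out


def split_then_oversample(samples, labels, test_fraction, k, rng):
    """Hold out the test set first, then balance only the training set."""
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test = [(samples[i], labels[i]) for i in indices[:n_test]]
    train = [(samples[i], labels[i]) for i in indices[n_test:]]

    positives = [x for x, y in train if y == 1]
    negatives = [x for x, y in train if y == 0]
    n_new = max(0, len(negatives) - len(positives))
    # Interpolation needs at least two real positive samples to work with.
    if n_new and len(positives) >= 2:
        train += [(x, 1) for x in oversample(positives, n_new, k, rng)]
    return train, test
```

Because the synthetic samples are generated only after the test set has been set aside, every sample in `test` is a real one from the original data.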

I do seem to be writing quite a few of these. At least there wasn’t a big move involved with this one.

After a year as a Data Engineer with Contata Solutions I’ve moved on to Principal Engineer at Emphysic here in Minnesota. Contata has a terrific crew and management and I learned a lot from them; in particular, it was the first time I’ve developed extensively in Java rather than my trusty Python. When I started I was excited to get the chance to learn more about Big Data Engineering and Machine Learning, but Contata needed a translator/mediator to sit between the Python-speaking machine learning team and the Java-speaking offshore development team more than it needed an aspiring data scientist. I did learn a lot about analytics and building platforms in Spark and Storm, so it was time well spent. Hopefully I was able to make some small positive contribution, or at least a smaller negative one.

One thing I picked up that’s already reaping benefits: Akka. Writing apps in Akka feels a little less alien than apps in Storm, at least from the Java perspective (my experience with Scala is limited). Akka and lessons learned from NDIToolbox gave me an idea about processing massive amounts of sensor data…

The last time I posted one of these I was definitely moving up in the world, in terms of latitude at least. This time around it’s more of a sideways shift geographically speaking, with maybe a degree’s difference (if that). I’ve left my position as a Computational Physics Programmer with Canadian Nuclear Laboratories (née Atomic Energy of Canada Limited) and accepted a position as a Data Analytics Engineer with Contata Solutions in Minneapolis.

CNL has a fantastically talented workforce and I enjoyed my time in Computational Reactor Physics, but ultimately I don’t think they really needed me. My first six-month plan on starting was to plan the development of a suite of post-processing tools for reactor physics simulators, with a goal of an initial prototype of the tools six months after that. Three months into this 12-month plan I was shipping the first version to reactor analysts. That done, I had the opportunity to work on everything from RSS scrapers to exploratory data analysis to implementing more matrix solvers than I care to remember. As interesting as the work could be, I didn’t find the learning experience I’d been looking for and so it was time for me to move on.

Contata is working on some pretty interesting projects involving big data analytics and machine learning, to wit: Alertmix. Very much looking forward to learning a lot!

Like the CV says, I left my position at TRI a few months ago and started up as a Computational Physics Programmer with AECL back in the Old Country. Well, I guess the U.S. is now the Old Country, making Canada the Current Country. One of the hazards of being a dual citizen, I suppose: Current vs. Old Country is all about the timing.

I enjoyed my time with TRI but this was an opportunity to grow professionally. I did my share of programming at TRI but it wasn’t my job as such. At AECL it’s all right there in the job description, and for whatever reason I was interested in trying something completely outside my experience. That, and the chance to work with clusters running my code was too good to pass up.

Unfortunately this means that visible work on NDIToolbox will probably slow down, at least in the short term. TRI continues work on the project so I’m hopeful anything they come up with will eventually make it into the open source repositories. In the meantime if you have any questions or run into any trouble feel free to contact me.

I completed the four-course Python Programming Certificate program at O’Reilly School of Technology a few weeks ago. Now that I’ve had a while to think about it, I thought it might be useful to jot down a couple of paragraphs for anyone else considering the program, especially since I wasn’t able to find anything from people who had gone through the whole thing when I was first looking at it.

First, a word about where I’m coming from. I’d written some Python code at/for the day job for a year or two prior, and felt fairly comfortable with the language before I started the program. I wasn’t really looking for a certificate à la Java certification; I was mainly looking for something that’d give me more Python experience and make me feel comfortable in saying I “knew” Python.

All in all I quite enjoyed the program. There are plenty of details online about how OST courses work but in a nutshell it’s read a lesson and complete exercises as you go, then a quiz or two with three or four questions and a programming assignment to turn in. The assignment’s usually 100-odd lines of code (not including unit tests), with the final assignments at the end of each course a little more involved. Each course has 15 or so lessons, and OST says you should expect to spend about 40 hours total on each course. I didn’t time my progress, but I don’t think it took me that long; your mileage may vary.

Of the two I very much prefer the assignments – I found the quizzes a little too “copy-paste” from the lesson, and in a lot of cases it seemed as though you had to be fairly precise with your answers’ wording to get them right. The wording of the quiz questions was also a little fuzzy at times, so I occasionally had a tough time figuring out exactly what was being asked.

The assignments on the other hand were pretty enjoyable for the most part – you’re given a task to code and some criteria to meet, then you’re more or less left to your own devices. The instructors were pretty good about pointing out places your solution could be improved, alternate ways of doing things, and so on.

After you’re taught the basics of Python, the rest of the coursework is primarily looking at the use of specific modules in the standard library. Some modules get more of a workout than others, so for example there are several lessons on both Tkinter and MySQL.

By far my favorite aspect of the program was the enforced unit testing and test-driven development process. Almost at the beginning you’re introduced to unittest, and once the preliminary introductions are out of the way you’re expected to use TDD for all the assignments. In fact, if I remember correctly, you’re even told at one point that your assignments won’t get a passing grade without unit tests. I found this to be the most valuable part of the coursework – by the time I was done, unittest and TDD were completely second nature to me, and it’s probably the most important thing I picked up from the program.
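For anyone who hasn’t seen the style, here is a small made-up example of what test-first work with unittest looks like. It isn’t taken from the OST coursework; `word_count` is a hypothetical function invented for this sketch. In TDD you would write the test class first, watch it fail, then write just enough of `word_count` to make it pass.

```python
import unittest


def word_count(text):
    """Return a dict mapping each lowercased word in text to how often it occurs."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


class WordCountTest(unittest.TestCase):
    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), {})

    def test_counts_repeated_words(self):
        self.assertEqual(word_count("spam eggs spam"), {"spam": 2, "eggs": 1})

    def test_counting_ignores_case(self):
        self.assertEqual(word_count("Spam spam"), {"spam": 2})
```

Saved in a file, the tests can be run with `python -m unittest` followed by the file name.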

If you’re already a hard-core Pythonista I don’t think there’s much in this program you won’t already know. On the other hand, if you’re looking to get a little workout with Python or coming to it fresh I can definitely recommend it. And in case you’re wondering, if you complete all four courses you do indeed get a certificate, suitable for framing. 🙂