Tuesday, November 25, 2014

The other day I saw something retweeted by @leppie (I think) about an experimental hyper-fast vector math driven 3D engine for the dot Net Framework. This led me to investigate whether there is a default implementation of vector math in the dot Net Framework. As it turns out, there is.

This is of interest because (I think) this would make IronPython the only Python implementation that has vector math included without having to install a third party library. Java has a java.util.Vector class, but it has nothing to do with vector math (it's a specialized array). You do need to use the dot Net Framework instead of standard Python modules, but if you're running IronPython, you should have access to that anyway.

The whole, or at least a big part of the idea of running a Python implementation against the dotNet Framework is that you can leverage the power of that big library collection with a language that's fairly dense, easy, and doesn't require compilation.

This was pretty easy on Windows. The only confusing part is that there are two namespaces in dot Net called System.Windows. You want the one that references the WindowsBase dll. This is the one that has our Vector object in it.

The code (including the plotting by Gnuplot - I had to download the Windows version; I did leave out the monastery.py file with the original shape points in it; also, the writetofile.py file is almost exactly like the one from the previous post except that for a Vector object, the x and y names are capitalized):

# vecipy.py
"""
Polygon offset problem using
dot Net Framework.
"""

import clr

WINX = 'WindowsBase'

clr.AddReference(WINX)

from System.Windows import Vector

import math
import copy

import monastery as pic

OFFSET = 0.15

def scaleadd(origin, offset, vectorx):
    """
    From a Vector representing the origin,
    a scalar offset, and a Vector, returns
    a Vector object representing a point
    offset from the origin.
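A plain-Python sketch of what that docstring describes, assuming the result is simply origin + offset * vectorx. The `Vec` namedtuple and the `add`, `multiply`, and `scaleadd_sketch` helpers are hypothetical stand-ins so the idea can be tried without IronPython; they only mimic the capitalized X and Y names of the dot Net Vector and are not the code from vecipy.py.

```python
from collections import namedtuple

# Hypothetical stand-in for System.Windows.Vector (capitalized X and Y).
Vec = namedtuple('Vec', ['X', 'Y'])

def multiply(scalar, vec):
    """Scale a vector by a scalar."""
    return Vec(scalar * vec.X, scalar * vec.Y)

def add(vec1, vec2):
    """Add two vectors component by component."""
    return Vec(vec1.X + vec2.X, vec1.Y + vec2.Y)

def scaleadd_sketch(origin, offset, vectorx):
    """
    Assumed behavior: the point reached by moving
    offset units along vectorx from origin.
    """
    return add(origin, multiply(offset, vectorx))

# Move half a unit straight up from (1.0, 2.0).
newpoint = scaleadd_sketch(Vec(1.0, 2.0), 0.5, Vec(0.0, 1.0))
```

With the real Vector object the two helpers would presumably be replaced by the Vector's own Add and Multiply methods.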

I run OpenBSD on my laptop at home. So I would be using mono in my cross-platform experiment.

Microsoft just recently (Fall 2014) announced the open sourcing of the dotNet Framework and cross platform capability for it. The mono project responded very positively to this announcement. I would imagine this as being good news for IronPython too.

OpenBSD has a package for mono. From there, I just needed to download the IronPython binaries and run mono against them, or so I thought . . .

As it turns out, my script kept crashing on the overloaded Vector.Multiply method with a NotImplementedError. I tried to research things, wasn't having any luck, and brute forced the problem by wrapping the method in a C# class I called vecx:

Note (26NOV2014): I hacked this C# module up a bit too quickly and didn't have performance or elegance in mind. If you declare those Multiply methods as static you can save yourself the trouble of instantiating a new instance of the class each time you want to call them. In fact, you can do the same thing with all the Vector methods you want to use (Add, CrossProduct, etc.). I was just too hurried and too lazy. CBT

Sunday, November 16, 2014

A few years back I did two or three posts on polygon offset. It was a
learning experience that I never quite completed to my satisfaction. A
kind visitor to my last post on the subject, Mr. Ahmad Rafsanjani,
actually rewrote some of my code in a comment. I gave him a polite
weasel answer thanking him, but dropped the effort and never felt quite
right about it.

Well, as the saying goes, better late than never. He was quite correct in his assessment, but my understanding of vector math was not strong enough to prove this to myself. I was visually inspecting the results, and, given what I was dealing with at the time, they seemed OK.

Here is the picture we're trying to get (this is with Mr. Rafsanjani's code, but the difference with mine and the original code, although wrong, is not great):

In order to nail down the discrepancy in my original code, I inserted some print statements with a lot of numeric precision (28 digits to the right of the decimal) in the output:

$ more points
1.2231671842700024832595318003 1.7024195134850139687898717966
2.1231671842700023944416898303 1.7024195134850139687898717966
2.2768328157299975167404681997 2.5475804865149860312101282034
1.6635803619063778135966913396 2.5493839809701555054743948858
1.7364196380936223196300716154 3.3506160190298444057077631442
2.5205825797292722434406186949 3.3529128986463621053815131745
2.6794174202707274901058553951 4.1470871013536383387076966756
2.1360193516544989655869812850 4.1228847880778562995374159073
(etc.)

The numbers highlighted in yellow are mismatches in the Y-coordinates of points of the inset offset polygon - each pair of Y coordinates should represent lines parallel to the X axis; in other words, they should be equal. I have a bug.

Contrast that with the numbers yielded by Mr. Rafsanjani's code:

$ more points
1.2251864530113494300422871675 1.7000000000000001776356839400
2.1251864530113491191798402724 1.7000000000000001776356839400
2.2797319075568038826418160170 2.5499999999999998223643160600
1.6642549229616445671808833140 2.5499999999999998223643160600
1.7369821956889173186766583967 3.3500000000000000888178419700
2.5229705854077835169846366625 3.3500000000000000888178419700
2.6829705854077836590931838145 4.1500000000000003552713678801
2.1880983342360056376207921858 4.1500000000000003552713678801
2.6780983342360054066944030637 4.8499999999999996447286321199
3.1219016657639944156699129962 4.8499999999999996447286321199
(etc.)

Much better. Lines that are supposed to be perfectly parallel to the X axis are, at least to 28 decimal places precision and the limits of my platform and the CPython interpreter, parallel to the X axis. For what I am doing, I can more than live with that.
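The same comparison can be done in code instead of by eye. This is only a sketch: `formatpoint` and `mismatchedpairs` are hypothetical helper names, and the pairing rule (points 0 and 1, 2 and 3, and so on each form an edge parallel to the X axis) is an assumption taken from how the listings above are laid out. The sample points are the first four rows of each listing.

```python
def formatpoint(x, y, digits=28):
    """Write a point with a fixed number of digits after the decimal."""
    return '%.*f %.*f' % (digits, x, digits, y)

def mismatchedpairs(points):
    """
    Return index pairs (i, i + 1) whose Y coordinates differ.
    Assumes consecutive pairs of points should share a Y value.
    """
    bad = []
    for i in range(0, len(points) - 1, 2):
        if points[i][1] != points[i + 1][1]:
            bad.append((i, i + 1))
    return bad

# First four rows of the output from my original code.
buggy = [(1.2231671842700024832595318003, 1.7024195134850139687898717966),
         (2.1231671842700023944416898303, 1.7024195134850139687898717966),
         (2.2768328157299975167404681997, 2.5475804865149860312101282034),
         (1.6635803619063778135966913396, 2.5493839809701555054743948858)]

# First four rows of the output from Mr. Rafsanjani's code.
corrected = [(1.2251864530113494300422871675, 1.7000000000000001776356839400),
             (2.1251864530113491191798402724, 1.7000000000000001776356839400),
             (2.2797319075568038826418160170, 2.5499999999999998223643160600),
             (1.6642549229616445671808833140, 2.5499999999999998223643160600)]

for label, points in (('original', buggy), ('corrected', corrected)):
    print('%s: mismatched pairs %s' % (label, mismatchedpairs(points)))
```

An exact equality test is deliberately strict here; for other shapes a small tolerance might be the more forgiving choice.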

I've included Mr. Rafsanjani's comments in the code. My modifications to his code were mainly for the purpose of printing some things out and organizing the polygon offset part of this exercise into a module.

I've made a separate main script for gnuplot. After not looking at everything for three years I realized I had forgotten everything I ever knew about gnuplot and wanted to record it this time. The file with the 20 points for the shape (monastery.py) is available on request.

Here is the main pyeuclid/polygon offset part of the code (rafsanjanicorrection.py):

pyeuclid, to the best of my knowledge, runs only in Python 2.7 at the moment. In any case, I got an error on the Python 3.4 install with setup.py so I stuck with 2.7.

Thanks to Mr. Rafsanjani for his help with this and for the rest of you for stopping by.

Monday, November 3, 2014

I am returning from MeetBSD in San Jose, California. This isn't a Python-related post per se, but the BSD family of operating systems maintains packages and ports for Python and Python third party libraries, and use of Python on these systems is significant both in the open source development and commercial spheres.

The structure of the conference is a brief weekend unconference. Nonetheless some of the talks were more than worthy of a full fledged mega-con, and the rest were quality. It was a good deal.

We met in a rectangular conference room. All of Silicon Valley seems to me to be an endless office park with nice weather and some landscaped spots (I've included the obligatory Strelitzia/bird of paradise pic from the conference hotel entrance below). It was a fairly intimate setting. The food (a variety of sandwiches) was good. We were warned ahead of time that Wifi was limited; I brought my own Verizon jetpack unit so it wasn't an issue for me.

Talks (that I attended):

1) Rick Reed, “WhatsApp: Half a billion unsuspecting FreeBSD users” - Erlang and FreeBSD at WhatsApp used for scaling. Now 600 million users. It was a good talk, but I wasn't awake and some of it went over my head.

2) Jordan Hubbard, “FreeBSD: The Next 10 Years” Good talk; I hated it :-(

Hubbard's leaving Apple a couple years ago and signing on with iXSystems (a sponsor and essentially the organizer of this conference) made a big splash. He is an accomplished dev and a good guy by all accounts. His ideas are, on many levels, entirely valid.

I am primarily an OpenBSD user. I run FreeBSD on my RPi and on a spare laptop for easy access to Java. The two OS's have similar philosophies in some respects (correctness, BSD license, etc.). There is cross-pollination when it comes to operating system components, apps, and drivers. But where OpenBSD unapologetically maintains new releases for older hardware and uncompromisingly adheres to its leader's approach to security and development, FreeBSD in the framework of Hubbard's talk is looking more towards the future and making changes to attract younger talented core committers and target more modern (read mobile) platforms. Telemetry, scrapping development on older platforms "ruthlessly," getting younger devs involved by providing work that's interesting to them - all this stuff is important for FreeBSD going forward. At one point he even <gasp> suggested systemd as a good strategy for Linux that FreeBSD should, at least in principle if not in form, emulate.

FreeBSD is everywhere - or at least in a lot of places companies just don't make a big deal of. Inside cable (connections) was the one example. In order to accommodate mobile and embedded environments, the OS, although well suited to these platforms now, needs to change.

A lot of this in my mind goes against OpenBSD's philosophy - purity and security at all costs. My personal philosophy lies with the OpenBSD approach, but I may well be wrong. Hubbard is a guy with a lot of industry know how and experience and I am a geologist who uses OpenBSD. He is probably right, but I don't want my fun to stop, so I'm sticking with OpenBSD even if death awaits us . . .

3) David Maxwell, "The Unix command pipeline - using Unix in the renewable energy era"

I always liked Maxwell. He's a Canadian guy and a NetBSD devotee.

His talk was about a command line app he's putting together for better tracking piped commands on the UNIX command line and reproducing, referencing, and inspecting them retroactively in a way that's easier than what you have to do now. I think it's got potential and would like to see it succeed.

After the angst I felt over Hubbard's talk, this was a welcome relief. The UNIX command line is something everyone, or most everyone at the con knows and loves. Everyone uses piped commands. This is a useful approach to a common problem - that's something we can all agree on. My favorite talk of the conference (that I attended).

4) Alex Rosenberg, "Meet PlayStation 4"

By far and away the coolest talk. Rosenberg presented this well and spoke honestly and as openly as he could as a member of a big commercial project about specifics. Games require so much optimization at such a low level. Although this theme came up in a number of the talks, on the PlayStation project it's critical. Essentially, the best hardware and hardware architecture for the project is selected for a given product lifecycle (10 years? IIRC) then you hammer at it with software modifications to get every last bit of efficiency out of it.

It's not like there's a standard laptop install of FreeBSD on PlayStation 4 and you let it rip with your happy traditional UNIX OS. They're optimizing LLVM and clang (the compiler and linkers), talking directly to the metal as much as possible, and just generally nailing performance at the lowest level of the architecture (after they've gotten the low hanging fruit up top, of course).

Another theme that came up in almost all the talks, but especially in this one, was the BSD license. Granted, it was a BSD conference, so organizers and attendees have a bias. Nonetheless, it appears that licensing is really critical in the decision to adopt open source software and operating systems. Although "business friendly" nowadays often has "capitalism at its worst" overtones, it was a theme: the BSD license is the "business friendly" one whereas the GPL, particularly the GPL3, is not . . .

I'm not a gamer, but I enjoyed this. Rosenberg is really easy to talk to as well. He let me take that pic up close when we were posing for the group pic after his talk.

5) Brendan Gregg, "Performance Analysis"

Gregg works for Netflix. He's written a lot of dtrace scripts (including numerous Python ones) and has them readily available on Github.

I found myself wishing I knew more about the subject, because performance monitoring is a really cool netadmin problem when, like Netflix, you're dealing with huge bandwidth challenges (as in other talks, so much comes down to optimization).

That said, Gregg presented some graphical tools that are useful (I'll get the names wrong, so I won't try) - basically histogram-like, color coded performance charts with labels for processes. You don't have to run your own Netflix to benefit from these and he's made everything open source and available. If I were a netadmin I would jump on this. I've got to get smarter first before I can benefit from these tools.

Gregg has a soft British accent and a very amiable demeanor. He was the first talk in the morning. It was like a lullaby. This is one I need to revisit on the videos posted online because it's worth it.

6) Corey Vixie, "Web Apps on Embedded BSD..."

The iXSystems surprise talk, but a good one. The youngster Vixie briefed us a bit on what iXSystems is doing with web presentation layer (for lack of a better description) of the FreeNAS implementation.

He started off by saying static web pages are, at least for apps like FreeNAS, not the way to go anymore. Refreshing the DOM (Document Object Model) at regular intervals is not going to work well. He then introduced us to a number of mature and nascent JavaScript/web technologies, some of which no one in the room had yet heard of. Basically he had to rewrite the "old" Django/other technologies implementation to accommodate better simulation of a desktop app in the browser.

The specifics were not something I could follow well because of my ignorance. There was talk of an Open Source, BSD licensed Facebook framework whose name I can't recall, a one-way change propagation architecture for updating the dynamic web page, and, as always, optimization of the process. I asked him about Django after the talk. He said it was the best thing a couple years ago for this app, but now they needed something that could interact directly with the browser - namely JavaScript - it comes down to fine-grained control and optimization.

One humorous interlude during the Q & A was my asking him if he was indeed related to Paul Vixie, historical UNIX tools author (Vixie Cron), to which he replied, "This is the part of my talk where I say, 'I am Worf, son of Mogh.'" Anyone with a sense of humor and a knowledge of STTNG can't be all bad ;-)

A few people pics:

Dru Lavigne. Without the BSDA cert program she helped found, I would never have gotten over the hump learning UNIX. We differ on our choice of specific BSD, but I still consider her my UNIX mentor.

iXSystems old timers Denise and Matt working out conference specifics.

FreeBSD Foundation rep Anne.

Conclusion: MeetBSD is an affordable, pretty meaty con if you like UNIX, hardware, and topics about optimization and scale. It is, fortunately or unfortunately, a pretty well kept secret.

Friday, October 31, 2014

The post immediately prior to this one was an attempt to reproduce Windows.Forms Calendar controls in Gtk for cross platform (Windows/*nix) effective rendering.

This time I am attempting to get familiar with gtk-sharp/Gtk's version of a grid view - the Gtk.TreeView object. Some of the gtk-sharp documentation suggests the NodeView object would be easier to use. I had some trouble instantiating the objects associated with the NodeView and went with the TreeView instead in the hopes of getting more control.

The Windows.Forms GridView I did years ago is here. It became apparent to me shortly after embarking on this journey that I would be hard pressed to recreate all the functionality of that script in a timely manner. I settled for a tabular view of drillhole data (fabricated, mock data) with some custom formatting.

Aside: this is typically how mineral exploration drillhole data (core, reverse circulation drilling) is presented in tabular format - a series of from-to intervals with assay values. Assuming the assays are all separate elements, the reported weight percents should not sum to more than 100%, and never do unless someone fat fingers a decimal place. I've projected a couple screaming hot polymetallic drill holes that end near surface (lack of funding for drilling), but show enough promise that the new mining town of Trachteville (the drill hole name CBT-BNZA stands for CBT-Bonanza) will spring up there at any moment . . . one can dream.

The data store object for the grid view, Gtk.ListStore, would not instantiate in IronPython. I was not the only person to have experienced this problem (I cannot locate the link to the mailing list thread or forum reference, but like the big fish that got away, I swear I saw it). I didn't want to drop the effort just because of that, so I hacked and compiled some C# code:

Those are my file paths; locations depend on where you install things like mono and IronPython.

Anyway, I got my dll and I was off to the races. Getting to know the Gtk and gtk-sharp object model proved challenging for me. I'm glad I got some familiarity with it, but it would take me longer to do something in Gtk than it did with Windows.Forms. The most fun and gratifying part of the project was getting the custom formatting to work with a Gtk.TreeCellDataFunc. I used a function that yielded specific functions for each column - something that's really easy to do in Python.
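The function-that-yields-functions idea can be shown without Gtk at all. What follows is a sketch under assumptions, not the code from my script: `makefloatformatter`, `StubCell`, `StubModel`, and the two format strings are hypothetical names and values, and the stubs only imitate the cell's Text property and the model's GetValue call so the closure can be exercised in plain Python.

```python
class StubCell(object):
    """Stand-in for a text cell renderer; only Text is mimicked."""
    Text = ''

class StubModel(object):
    """Stand-in for the data store; the 'iter' is just a row index."""
    def __init__(self, rows):
        self.rows = rows

    def GetValue(self, rowiter, columnindex):
        return self.rows[rowiter][columnindex]

def makefloatformatter(fmt, columnindex):
    """
    Return a cell data function that formats the float
    in columnindex with fmt. Each returned function
    remembers its own fmt and columnindex.
    """
    def formatcell(column, cell, model, rowiter):
        cell.Text = fmt % model.GetValue(rowiter, columnindex)
    return formatcell

# One mock interval: hole name, from, to, assay.
model = StubModel([['CBT-BNZA', 0.0, 5.5, 1.23456]])

# Assumed formats: one decimal for from/to, three for assays.
fromformat = makefloatformatter('%.1f', 1)
assayformat = makefloatformatter('%.3f', 3)

cell = StubCell()
assayformat(None, cell, model, 0)
print(cell.Text)
```

The appeal is that one small factory covers every numeric column; only the format string and the column index change from call to call.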

Anyway, here are a couple screenshots and the IronPython code:

The OpenBSD one below turned out pretty good, but the Windows one had a little double line underneath the first row - it looked as though it was still trying to select that row when I told it specifically not to. I'm not a design perfectionist Steve Jobs type, but niggling nits like that drive me batty. For now, though, it's best I publish the code and move on.

#!/usr/local/bin/mono /home/carl/IronPython-2.7.4/ipy64.exe

import clr

GTKSHARP = 'gtk-sharp'
PANGO = 'pango-sharp'

# Mock store C#
STOREX = 'storex'

clr.AddReference(GTKSHARP)
clr.AddReference(PANGO)

# C# module compiled for this project.
# Problems with Gtk.ListStore in IronPython.
clr.AddReference(STOREX)

        # XXX - not very generic, but better than doing them one by one.
        # from, to columns.
        for x in xrange(1, 3):
            self.columns[FIELDS[x]].SetCellDataFunc(self.cellrenderers[FIELDS[x]],
                    genericfloatformat(FP1FMT, x))
        # assay<x> columns.
        for x in xrange(3, 7):
            self.columns[FIELDS[x]].SetCellDataFunc(self.cellrenderers[FIELDS[x]],
                    genericfloatformat(FP2FMT, x))

    def usemarkup(self):
        """
        Refreshes UseMarkup property on widgets (labels)
        so that they display properly and without
        markup text.
        """
        # Have to refresh this property each time.
        self.frame.LabelWidget.UseMarkup = True

    def prettyup(self):
        """
        Get Gtk objects looking the way we intended.
        """
        # Try to get Courier New on treeview.
        self.tree.ModifyFont(self.fdregular)
        # Get rid of line.
        self.frame.Shadow = Gtk.ShadowType.None
        self.usemarkup()

Thursday, October 30, 2014

A number of years ago I did a post on the IronPython Cookbook site about the Windows.Forms Calendar control. I could never get the thing to render nicely on *nix operating systems (BSD family). It sounds as though Windows.Forms development for mono (and in general) is kind of dead, so there is not much hope that solution/example will ever render nicely on *nix. Recently I've been playing with mono and decided to give gtk-sharp a shot with IronPython.

Quick disclaimers:

1) I suspect from the examples I've seen on the internet that PyGtk is a little easier to deal with than gtk-sharp. That's OK; I wanted to use IronPython and have the rest of the mono/dotNet framework available, so I went through the extra trouble to forego CPython and PyGtk and go with IronPython and gtk-sharp instead.

2) The desktop is not the most cutting edge or sexy platform in 2014. Nonetheless, where I work it is alive and well. When I no longer see engineers hacking solutions in Excel and VBA, I'll consider the possibility of outliving the desktop. Right now I'm not hopeful :-\

The results aren't bad, at least as far as rendering goes. I couldn't get the Courier font to take on OpenBSD, but the Gtk Calendar control looks acceptable. All in all, I was OK with the results on both Windows and OpenBSD. I've heard Gtk doesn't do quite as well on Apple products, but I don't own a Mac to test with. Here are a couple screenshots:

I run the cwm window manager on OpenBSD and have it set up to cut out borders on windows, hence the more minimalist look to the control there.

IronPython output on *nix has always come out in yellow or white - it doesn't show up on a white background, which I prefer. In order to get around this, I run an xterm with a black background:

Monday, October 20, 2014

Each month I redo 3D block model interpolations for a series of open pits at a distant mine. Those of you who follow my twitter feed often see me tweet, "The 3D geologic block model interpolation chuggeth . . ." What's going on is that I've got all the processing power maxed out dealing with millions of model blocks and thousands of data points. The machine heats up and with the fan sounds like a DC-9 warming up before flight.

All that said, running everything roughly in parallel is more efficient time-wise than running it sequentially. An hour of chugging is better than four. The way I've been doing this is using the Python (2.7) subprocess module's Popen class, running my five interpolated values in parallel. Our Python programmer Lori originally wrote this to run in sequence for a different set of problems. I bastardized it for my own.

The subprocess part of the code is relatively straightforward. Function startprocess() in my code covers that.
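For readers who have not used Popen this way, here is a minimal sketch of the pattern, not my actual startprocess(). Everything in it is an assumption made for illustration: the vendor executable is replaced by a tiny Python child that asks for a config path on stdin, and the names `CHILD`, `launch`, and `runall` and the .cfg file names are hypothetical.

```python
import subprocess
import sys
import time

# Stand-in for the interactive executable: it reads a config
# path from stdin, "works" briefly, then reports back.
CHILD = ("import sys, time; "
         "path = sys.stdin.readline().strip(); "
         "time.sleep(0.2); "
         "sys.stdout.write('done ' + path)")

def launch(configpath):
    """Start one child process and answer its prompt for a path."""
    proc = subprocess.Popen([sys.executable, '-c', CHILD],
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
    proc.stdin.write(configpath + '\n')
    proc.stdin.close()
    return proc

def runall(configpaths, pollseconds=0.1):
    """Start every process first, then wait until all have exited."""
    procs = [launch(path) for path in configpaths]
    while any(proc.poll() is None for proc in procs):
        time.sleep(pollseconds)
    results = []
    for proc in procs:
        results.append(proc.stdout.read())
        proc.stdout.close()
    return results

if __name__ == '__main__':
    print(runall(['pit1.cfg', 'pit2.cfg']))
```

Because every process is started before any waiting begins, the total time is roughly that of the slowest run rather than the sum of all of them.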

What makes this problem a little more challenging:

1) it's a vendor supplied executable we're dealing with . . . without an API or source . . . that's interactive (you can't feed it the config file path; it asks for it). This results in a number of time.sleep() and <process>.stdin.write() calls that can be brittle.

2) getting the processes started, as I just mentioned, is easy. Finding out when to stop, or kill them, requires knowledge of the app and how it generates output. I've gone for an ugly, but effective check of report file contents.

3) while waiting for the processes to finish their work, I need to know things are working and what's going on. I've accomplished this by reporting the data files' sizes in MB.

4) the executable isn't designed for a centralized code base (typically all scripts are kept in a folder for the specific project or pit), so it only allows about 100 character columns in the file paths sent to it. I've omitted this from my sanitized version of the code, but it made things even messier than they are below. Also, I don't know if all Windows programs do this, but the paths need to be inside quotes - the path kept breaking on the colon (:) when not quoted.
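The monitoring pieces in points 2 through 4 can be sketched roughly as below. The helper names `sizemb`, `isfinished`, and `quoted` are hypothetical, and so is the idea of a single marker string in the report file; what the real executable writes when it finishes is specific to that program.

```python
import os

def sizemb(path):
    """Size of a data file in MB; 0.0 if it does not exist yet."""
    if not os.path.exists(path):
        return 0.0
    return os.path.getsize(path) / (1024.0 * 1024.0)

def isfinished(reportpath, marker):
    """
    Ugly but effective: the run is considered finished once
    the report file exists and contains the marker text.
    """
    if not os.path.exists(reportpath):
        return False
    with open(reportpath) as reportfile:
        return marker in reportfile.read()

def quoted(path):
    """Wrap a path in double quotes before sending it to the program."""
    return '"%s"' % path
```

In a polling loop these would be called every few seconds: print sizemb() for each data file to show progress, and stop or kill a process once isfinished() returns True for its report.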

Basically, this is a fairly ugly problem and a script that requires babysitting while it runs. That's OK; it beats the alternative (running it sequentially while watching each run). I've tried to adhere to DRY (don't repeat yourself) as much as possible, but I suspect this could be improved upon.

The reason why I blog it is that I suspect there are other people out there who have to do the same sort of thing with their data. It doesn't have to be a mining problem. It can be anything that requires intensive computation across voluminous data with an executable not designed with a Python API.

Notes:

1) I've omitted the file multirunparameters.py that's in an import statement. It has a bunch of paths and names that are relevant to my project, but not to the reader's programming needs.

2) Python 2.7 is listed at the top of the file as "mpython." This is the Python that our mine planning vendor ships that ties into their quite capable Python API. The executable I call with subprocess.Popen() is a Windows executable provided by a consultant independent of the mine planning vendor. It just makes sense to package this interpolation inside the mine planning vendor's multirun (~ batch file) framework as part of an overall working of the 3D geologic block model. The script exits as soon as this part of the batch is complete. I've inserted a 10 second pause at the end just to allow a quick look before it disappears.