When using libraries, I often find myself torn between the standard library and importing a firehose like numpy. The former gives me the base upon which I build functions and get the task at hand done, which is often tedious but offers a sense of accomplishment in the end. With the latter, all the heavy lifting is done for you by a powerful library. (There are, of course, occasions where I do not hesitate to use numpy, like generating plots, where I am less interested in understanding graphics than in getting results for further use.) This, I suppose, rarely crosses the mind of a full-time programmer or a data scientist, who may be doing this all day with the key objective of writing error-free code quickly and painlessly.

For someone who programs occasionally (it is not my primary work routine), I find developing muscle memory valuable, as it helps me reapply techniques across a variety of tasks. Failing often forces me to pay attention to details and to seek clarity on control flow. I realized this recently when writing a block of code to process vessel motions data from assets transported by sea. The list (or array) looked like this (saved in a file named motions_data.py):
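The original listing is anonymized and not reproduced here; below is a minimal sketch of one plausible shape for such a structure (the keys and values are hypothetical, not the author's data):

```python
# Hypothetical reconstruction: each record labels one set of motion
# readings for an asset; values are illustrative only.
motions_data = [
    {'asset': 'topside', 'rangle': 10.5, 'pangle': 2.1},
    {'asset': 'topside', 'rangle': 13.7, 'pangle': 1.9},
    {'asset': 'hull',    'rangle': 12.8, 'pangle': 2.4},
    {'asset': 'topside', 'rangle': 15.3, 'pangle': 2.2},
    {'asset': 'topside', 'rangle': 15.3, 'pangle': 2.0},
]
```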

I’ve taken a simple CSV file and added structure to it above, with anonymized data, as you can tell. Of course, most people won’t bother to do this by hand; they will instead read a file and get it as a list with the same virtual structure in a few lines of code. But I like seeing the data structure beforehand, as a personal preference and where practically possible, to avoid overloading my mind with too many abstractions. In that regard, I find comfort in Rob Pike’s Rule 5 of programming:

Data dominates. If you’ve chosen the right data structures and organized things well, [then] the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

If desired, a dictionary-like structure can easily be added to raw CSV data using the code below, whose output would look much like the list shown above.
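A minimal sketch of such a conversion, using the standard library's csv module; the column names (asset, rangle) are assumptions, not the original file's headers:

```python
import csv
import io

# Turn raw CSV rows into a list of dicts resembling the structure
# shown earlier. Column names here are hypothetical.
def to_records(csv_file):
    return [{'asset': row['asset'], 'rangle': float(row['rangle'])}
            for row in csv.DictReader(csv_file)]

# Demonstrate with an in-memory file instead of reading from disk.
raw = io.StringIO("asset,rangle\ntopside,10.5\ntopside,13.7\n")
records = to_records(raw)
print(records)
# [{'asset': 'topside', 'rangle': 10.5}, {'asset': 'topside', 'rangle': 13.7}]
```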

On line 10, map() takes an anonymous inline function (via lambda) to pick values corresponding to rangle (roll angle) and form a list subset from the larger motions_data list. The result of line 10 looks like this:

[10.5, 13.7, ..., 15.3, 15.3]

Notice from the above that only values corresponding to roll angles (those with rangle labels) appear in this list. This is done via x['rangle'], whose raw output would still carry entries like 'rangle': 10.5; filter() is then used to remove the remaining labels from the list, namely all instances of rangle. The code on line 10 is therefore essential in both the standard-library method and the numpy method that we’ll see later.
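One plausible reading of this extraction step, assuming some records carry no roll-angle entry (the record layout and values are hypothetical, not the author's script):

```python
# Hypothetical records; one row has no roll-angle reading.
topside_assets = [
    {'rangle': 10.5, 'pangle': 2.1},
    {'pangle': 1.9},
    {'rangle': 15.3, 'pangle': 2.2},
    {'rangle': 15.3, 'pangle': 2.0},
]

# map() picks the value labelled 'rangle' from each record (None when
# absent); filter() then drops those placeholders, leaving bare values.
roll_angles_topside = list(
    filter(lambda v: v is not None,
           map(lambda x: x.get('rangle'), topside_assets))
)
print(roll_angles_topside)  # [10.5, 15.3, 15.3]
```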

With if len(roll_angles_topside) > 0:, a division-by-zero error is avoided, and we can then progress to calculating the average (or mean) of these roll angles. This is done with reduce(), which takes a function (add, an operator in this case) to sum all values in the list roll_angles_topside. The sum is then divided by the number of values in the list to get the mean roll angle.
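A sketch of that step, using illustrative values in place of the original list:

```python
from functools import reduce
from operator import add

# Hypothetical extracted roll angles (illustrative values).
roll_angles_topside = [10.5, 13.7, 15.3, 15.3]

# The length guard avoids dividing by zero on an empty list; reduce()
# folds the add operator over the list to produce the sum.
if len(roll_angles_topside) > 0:
    mean_rangle = reduce(add, roll_angles_topside) / len(roll_angles_topside)
```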

Determining variance is similar to finding the mean, just a little more complex: it requires taking each value in the list, subtracting the mean from it, squaring the result, and collecting these into a list of squared deviations, whose average is the variance. This is done by the function, vfn:
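A minimal sketch of how such a vfn might look; the exact body of the author's function is not reproduced here, so this is one plausible form:

```python
from functools import reduce
from operator import add

roll_angles_topside = [10.5, 13.7, 15.3, 15.3]  # illustrative values
n = len(roll_angles_topside)
mean_rangle = reduce(add, roll_angles_topside) / n

# vfn maps one value to its squared deviation from the mean; applying
# it across the list and averaging gives the (population) variance.
vfn = lambda x: (x - mean_rangle) ** 2
squared_devs = list(map(vfn, roll_angles_topside))
variance = reduce(add, squared_devs) / n
std_dev = variance ** 0.5
```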

All the functional acrobatics are performed invisibly by this magical library: just by invoking numpy.mean() and numpy.std() and feeding them an array, numpy.array(roll_angles_topside), one can determine the mean and standard deviation in one simple call each. So, as one can see, it’s easy to get high on numpy. (Python 3.x also has a statistics module that caters specifically to these functions.)
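The numpy route, with the same illustrative values as above:

```python
import numpy as np

roll_angles_topside = [10.5, 13.7, 15.3, 15.3]  # illustrative values
arr = np.array(roll_angles_topside)

# One call each; np.std() defaults to the population standard
# deviation (ddof=0), matching the manual variance route above.
mean_rangle = np.mean(arr)
std_rangle = np.std(arr)
```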

Making the script generic

The above examples take a hardcoded motion parameter (rangle) and dataset (topside_assets), so to extend them to let the user choose both a dataset and a desired motion parameter (without having to edit the script), I’ve modified the code slightly below, again for both methods:
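A sketch of the generic idea for the standard-library method: the dataset and the motion parameter become function arguments rather than hardcoded names. The function and record names here are hypothetical, not the author's script:

```python
from functools import reduce
from operator import add

# Hypothetical generic helper: 'records' is any dataset, 'param' any
# motion-parameter label (e.g. 'rangle', 'pangle').
def mean_motion(records, param):
    values = list(filter(lambda v: v is not None,
                         map(lambda x: x.get(param), records)))
    if len(values) > 0:
        return reduce(add, values) / len(values)
    return None  # empty dataset or unknown parameter

# Usage with illustrative data:
topside_assets = [{'rangle': 10.5, 'pangle': 2.1},
                  {'rangle': 13.7, 'pangle': 1.9}]
```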

Zip (Jul 3, 15)

A couple of days ago, I was stuck on a problem calculating wave lengths: I had three lists (of equal length) generated from a tuple, consisting of significant wave heights (Hs) and peak wave periods (Tp), both corresponding to a 1-month return period, and water depths (d).

Due to the need to perform a depth-criteria check (shallow-water non-linearity kicks in for intermediate and shallow depths), I could not use the map() function alone: an anonymous lambda function is too limited for multiple conditional checks, since its body must be a single expression.

After scouting around built-in functions, I found zip(), and it did not disappoint. Before I proceed to show how I used it, here’s the depth criteria check for wave length calculations (Table 3.3, Deepwater criterion and wave length, Subrata K. Chakrabarti, Handbook of Offshore Engineering, Volume I, 2005):

To get this in code, I calculated wave lengths for all three types first, and then mapped them into a list each, Ld_all, Li_all, Ls_all, as below:
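A hedged sketch of that approach: the wavelength formulas (deepwater Ld, shallow Ls, a one-pass intermediate Li) and the d/Ld thresholds below are my illustrative assumptions, not a quotation of the book's Table 3.3, and the input values are made up. The point is the zip() step, which walks the lists in lockstep so each depth is checked against its own deepwater wave length:

```python
import math

g = 9.81  # gravitational acceleration, m/s^2

# Illustrative inputs (the real Hs/Tp/d values are anonymized).
# Hs travels with each case but is not needed for the wavelength pick.
Hs_all = [2.0, 1.5, 2.5]       # significant wave heights, m
Tp_all = [8.0, 6.0, 9.0]       # peak wave periods, s
d_all = [1000.0, 15.0, 60.0]   # water depths, m

# Wave lengths for all three regimes first, one list each.
Ld_all = [g * Tp**2 / (2 * math.pi) for Tp in Tp_all]
Ls_all = [Tp * math.sqrt(g * d) for Tp, d in zip(Tp_all, d_all)]
Li_all = [D * math.tanh(2 * math.pi * d / D)
          for D, d in zip(Ld_all, d_all)]

# zip() pairs each depth with its candidate wave lengths, so the
# applicable one can be picked per case (thresholds are assumptions).
L_all = []
for d, Ld, Li, Ls in zip(d_all, Ld_all, Li_all, Ls_all):
    ratio = d / Ld
    if ratio >= 0.5:        # deep water
        L_all.append(Ld)
    elif ratio <= 0.05:     # shallow water
        L_all.append(Ls)
    else:                   # intermediate depth
        L_all.append(Li)
```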