In keeping with the prior Global Warming Grab Bag, this is also a list of some accumulated links without too much evaluation / commentary.

They are things of interest, but where I’ve not gotten around to doing a full posting on them, so might as well just put up the links and let folks read them as desired without my nattering in the way.

Parallel Programming ‘Start Here!’

This site has all sorts of interesting stuff across many pages; I’ll just post a link to one of them. This page is a basic introduction to doing parallel computer programming in C.

Simply put, because it may speed up your code. Unlike 10 years ago, today your computer (and probably even your smartphone) has one or more CPUs with multiple processing cores (multi-core processors). This helps with desktop computing tasks like multitasking (running multiple programs, plus the operating system, simultaneously). For scientific computing, this means you have the ability, in principle, to split your computations into groups and run each group on its own processor.

Most operating systems have a utility that allows you to visualize processor usage in real-time. Mac OSX has “Activity Monitor”, Gnome/GNU Linux has “gnome-system-monitor” and Windows has … well actually I have no idea, you’re on your own with that one. Fire it up, and run a computationally intensive program you have written, and what you will probably see is that you have a lot of computational power that is sleeping. Parallel programming allows you in principle to take advantage of all that dormant power.
Kinds of Parallel Programming

There are many flavours of parallel programming, some that are general and can be run on any hardware, and others that are specific to particular hardware architectures.

Two main paradigms we can talk about here are shared memory versus distributed memory models. In shared memory models, multiple processing units all have access to the same, shared memory space. This is the case on your desktop or laptop with multiple CPU cores. In a distributed memory model, multiple processing units each have their own memory store, and information is passed between them. This is the model that a networked cluster of computers operates with. A computer cluster is a collection of standalone computers that are connected to each other over a network, and are used together as a single system. We won’t be talking about clusters here, but some of the tools we’ll talk about (e.g. MPI) are easily used with clusters.
Types of parallel tasks

Broadly speaking we can separate a computation into two camps depending on how it can be parallelized. A so-called embarrassingly parallel problem is one for which it is dead easy to separate it into some number of independent tasks that then may be run in parallel.
Embarrassingly parallel problems

Embarrassingly parallel computational problems are the easiest to parallelize and you can achieve impressive speedups if you have a computer with many cores. Even if you have just two cores, you can get close to a two-times speedup. An example of an embarrassingly parallel problem is when you need to run a preprocessing pipeline on datasets collected for 15 subjects. Each subject’s data can be processed independently of the others. In other words, the computations involved in processing one subject’s data do not in any way depend on the results of the computations for processing some other subject’s data.

As an example, a grad student in my lab (Heather) figured out how to distribute her FSL preprocessing pipeline for 24 fMRI subjects across multiple cores on her Mac Pro desktop (it has 8) and as a result what used to take about 48 hours to run, now takes “just” over 6 hours.
[…]
Tools for Parallel Programming

The threads model of parallel programming is one in which a single process (a single program) can spawn multiple, concurrent “threads” (sub-programs). Each thread runs independently of the others, although they can all access the same shared memory space (and hence they can communicate with each other if necessary). Threads can be spawned and killed as required, by the main program.

A challenge of using threads is the issue of collisions and race conditions, which can be addressed using synchronization. If multiple threads write to (and depend upon) a shared memory variable, then care must be taken to make sure that multiple threads don’t try to write to the same location simultaneously. The Wikipedia page for race condition has a nice description (and an example) of how this can be a problem. There are mechanisms when using threads to implement synchronization, and to implement mutual exclusion (mutex variables) so that shared variables can be locked by one thread and then released, preventing collisions by other threads. These mechanisms ensure threads must “take turns” when accessing protected data.
POSIX Threads (Pthreads)

POSIX Threads (Pthreads for short) is a standard for programming with threads, and defines a set of C types, functions and constants.

More generally, threads are a way that a program can spawn concurrent units of processing that can then be delegated by the operating system to multiple processing cores. Clearly the advantage of a multithreaded program (one that uses multiple threads that are assigned to multiple processing cores) is that you can achieve big speedups, as all cores of your CPU (and all CPUs if you have more than one) are used at the same time.
[…]
OpenMP

OpenMP is an API that implements a multi-threaded, shared memory form of parallelism. It uses a set of compiler directives (statements that you add to your C code) that are incorporated at compile-time to generate a multi-threaded version of your code. You can think of Pthreads (above) as doing multi-threaded programming “by hand”, and OpenMP as a slightly more automated, higher-level API to make your program multithreaded. OpenMP takes care of many of the low-level details that you would normally have to implement yourself, if you were using Pthreads from the ground up.
[…]
MPI

The Message Passing Interface (MPI) is a standard defining core syntax and semantics of library routines that can be used to implement parallel programming in C (and in other languages as well). There are several implementations of MPI such as Open MPI, MPICH2 and LAM/MPI.

In the context of this tutorial, you can think of MPI, in terms of its complexity, scope and control, as sitting in between programming with Pthreads, and using a high-level API such as OpenMP.

The MPI interface allows you to manage allocation, communication, and synchronization of a set of processes that are mapped onto multiple nodes, where each node can be a core within a single CPU, or CPUs within a single machine, or even across multiple machines (as long as they are networked together).

One context where MPI shines in particular is the ability to easily take advantage not just of multiple cores on a single machine, but to run programs on clusters of several machines. Even if you don’t have a dedicated cluster, you could still write a program using MPI that could run your program in parallel, across any collection of computers, as long as they are networked together. Just make sure to ask permission before you load up your lab-mate’s computer’s CPU(s) with your computational tasks!

And there’s even some more on using the GPU (graphics processor) for general purpose computing.

Looks like a great place for folks who are interested to start down the path of parallel computing. (and / or explains some of my babbling here ;-)

GISS Good Programming Guide

Who knew… they generally have good advice, but leave out a lot of stuff I learned in my FORTRAN class about ‘defensive programming’ and how to avoid various kinds of insidious errors. Still, it’s nice to see that they are making an effort.

For those not big into FORTRAN programming practices, you can take a look at the general kinds of comments about how things can vary from one machine to another (i.e. not work right if you don’t use a different set of code…) and contemplate just how many ways in all that complexity you could have a small error that just screws it all up in an insidious not so obvious way… and how much effort it would take to validate everything… and how much of that has NOT been done…

This ‘cut/paste’ will likely lose all the pretty formatting, but it’s just to document what’s at the other end of the link for historical preservation against changes. Hit the link to read it in a nicer format.

The GISS GCM (and all attendant offshoots) has developed into a large distributed effort. This guide is an effort to help integrate GISS-wide good coding practices that improve the efficiency of the code, make it more transparent and hopefully (in the long term) lead to some degree of homogenization.

This guide will be split into a number of sections. Firstly, we will highlight some of the common examples of ‘bad’ code and indicate some better alternatives. These should become common sense programming habits. The next section deals with the ways of getting rid of unnecessary GO TO statements, and discusses more structured approaches. We then outline some of the more useful elements of FORTRAN 90 which can be used to enhance the readability of the code. The first appendix describes tools that are under-utilised but can be very helpful. The other appendix contains some of the issues that arise in porting the model code to the SGI machines. Examples are given where we feel the correct behavior is not obvious.

This is not intended to be a comprehensive guide to FORTRAN (or to the GISS GCM). It is intended only to highlight some of the more common problems that occur in the model and should be seen as a useful reference for the programmers and scientists in the building.

Please let us know about any features that could be added to this document. They will appear in the next version (contact gavin@giss.nasa.gov).

What to definitely avoid

* Do not calculate invariant expressions at every time step. Put those calculations whose answer never changes in an IFIRST section and calculate them once only at the start of every run. Note that on the SGI machines, code segments containing an IFIRST section must be compiled with the -static option in order to save the local variables. However, unnecessary compilation with -static can add a lot of overhead, slowing down the code. For instance, QUSBxxx routines do not require any saving of local variables and hence need not be compiled with this option. As a better alternative, all needed local variables should be declared in a SAVE statement and the -static option never used. Care must be taken to make sure that all such variables have been identified before switching over.

* Avoid calculating expressions that do not depend on the loop variable inside the loop. Move such independent code outside the loop.

* Do not divide! Division takes many times longer than multiplication and so if you are dividing by the same denominator more than once, calculate the reciprocal and multiply by that. Sometimes the compiler can do this for you, but this cannot be relied upon.

* Organise your loops and arrays so that the innermost loop is for the first index. For instance, in an I,J,L loop over T(I,J,L). L should be the outermost loop, followed by J, followed by I. Doing it any other way is highly inefficient. If your subroutine processes things by (I,J) grid box, and performs calculations in the vertical (such as MSTCNV, CONDSE etc), write the internal arrays with L as the first index.

The most egregious example of bad looping occurs right at the start of subroutine DYNAM,

On the face of it, there’s nothing wrong. It works fine. However, if rewritten properly the model actually runs noticeably faster. What happened here was that the code probably started out correct, just processing UX and UT. Then someone must have added QT and decided that they could save some coding by rearranging. (This is theory; the oldest code that could be found was from 1985, and that has the error. Unfortunately, MB112M9.S can’t be fixed because there are too many .U files to check for conflicts.) Anyhow, a couple of years ago Jean went through all the dynamics routines for the 31-layer model and rewrote all the inside-out loops. It ran 15-18% faster when finished.

As a general rule, common blocks should not be initialised like this since it is not transparent. See the example in the FORTRAN 90 section for the preferred method.

* Calls to mathematical functions (log, sin, cos, etc.) should be minimised. These are expensive. If repeated calculations are needed, calculate them once and then save the result.

* Use constants for DO loops (not variables). Prior to the PARAMETER statement (in FORTRAN 77), IM, JM, LM, and related quantities were in common blocks. Now, IM, JM, and LM are defined as parameters, and replaced by placeholders (IM0, JM0, LM0) in the common block. When loops use constants, the loops are set up by the compiler, instead of on the fly at execution time. Quantities in common blocks are treated as variables, not constants, by the compiler, so there is overhead setting up the loop. So DO loops will run faster with IM and JM as loop limits, rather than IM0 and JM0.

However, there are still people using JMM1 and LMM1 (variables). Instead, JM-1 and LM-1 should be used (which could not be done pre-FORTRAN 77). These are constants and are treated as such by the compiler. As a general point, all constants should be declared in a parameter statement.

Further information on optimization is available in the “XL Fortran: Optimization Guide” handbook or the IRIX Fortran manual.

Unstructured habits

Much of the GCM code (and some of the programmers!) dates back to the punch-card days of yore. In contrast to the current fashion for 1960’s nostalgia, we believe that the code really should reflect some of the advances made by FORTRAN 77 (and maybe even FORTRAN 90!). Many of the most confusing examples are driven by a desire to write compact code. While this was an issue with punch-cards, it is no longer so important. However, if the desire for compact code is still present, we recommend you use the new features of FORTRAN 90 which allow extremely compact code to be written in clear and unambiguous ways (see next section).

* GOTO bad, IF (..) THEN good. Prior to the introduction of BLOCK IF structures (and now DO…WHILE loops) huge numbers of GOTO statements were needed. However, there is a tendency for GOTO structures to become very spaghetti-like, and it becomes very difficult to follow the logic of the code. Replacing GOTO with any of the other structures makes it much clearer for the reader (and for the compiler).

* Initialisation across a common block using only one array is bad habit. If the common block changes or is rearranged, this can cause serious confusion. See the example in the FORTRAN 90 section for a clearer way of doing this efficiently. (Also see last comment in this section)

* Obsolete code should be deleted. Use a new version number for new code (do not just call it ‘new’ or ‘latest’). If you need a record, you can always look at the previous version stored on Ipcc2 or Ra. Commenting it out only leads to confusion. Leaving it in, and not using it, is inefficient and confusing to the next generation of programmers.

* Over-use of equivalence statements. While equivalence statements are useful for writing compact code, their use should really be restricted to operations that are applied uniformly over the equivalenced arrays (such as initialisation, input/output, etc.)

* IF … GOTO branches out of a DO loop. While this is supported by our compilers and is technically correct, it is not a recommended feature of standard FORTRAN. Problems have occurred with compilers/optimizers that did not treat it correctly. In particular, if a loop variable is used outside the loop subsequent to such a branch, this sometimes gave incorrect results. This practice should be avoided if possible. More generally, these branches seriously inhibit proper optimization (since loop structures cannot be changed). Use the DO…WHILE construction instead (see next section).

* Do not use active code lines as continue statements. (This makes Jean CRAZY!) Constructs like this

This is a pain in the neck!!! It also makes it harder to see what’s going on when you do a diff on two files, because it looks like two things were changed, not one. Please use CONTINUE or END DO statements. (In fact with DO … END DO, you can dispense with line numbers entirely).

* COMMON block cautions. Two rules of clean programming that are frequently violated in the GCM are i) out-of-bounds array referencing (as alluded to above), and ii) declaring a common block to be different sizes in different routines. While neither of these is strictly illegal, they do cause problems with optimisers/compilers. For instance, on the SGI the compiler option -OPT:reorg_common is on by default. This pads out common blocks to make them more efficient (by making sure arrays are on the same page of memory, for instance). This is now turned off in setup. Other problems can also occur. For instance, when the compiler encounters common blocks of different sizes it will select the largest, except when the common block is initialised via a BLOCK DATA section, even if there are larger references elsewhere. If we reduce our dependence on these kinds of violations, future problems are likely to be minimised.

Common blocks that are used in sub-modules (such as the PBL or tracer codes) are sometimes required in MAIN, INPUT, etc. It is much more straightforward to put these in separate (named) files which are then INCLUDE-d in the code. That way revisions and changes can be quickly accommodated.

New structures from FORTRAN 90

Many of the new features in FORTRAN 90 significantly reduce the amount of coding needed and make the code much more readable. There is much more to FORTRAN 90 than we can do justice to here, but we particularly want to highlight the array processing facilities. To use FORTRAN 90 constructs, you need only change f77 to f90 in your compilation scripts. There is one major caveat: the GCM II’ as a whole does not run if compiled with f90. Some parts of it (the radiation?) cannot be compiled, but others have included f90 features with no problems. Gary’s model can be run with f90. Please ensure that any routines you modify will compile with f90 prior to including any new constructs. Please report any problems (and solutions!) you uncover.

* Array arithmetic. In FORTRAN 90, arrays can be used directly in an expression (as long as it is conformable) without having to loop over the indexes. The example discussed above can be compactly written as below, with the compiler deciding the most efficient way to loop over the variables.

Another example is when you wish to set only a limited number of indexes, or ranges of an index. For this, : denotes the entire range of an index, while 2:7, for instance, denotes just the range 2 to 7 (inclusive). For instance, commonly the polar boxes are set to be the same and equal to the value at I=1. This could be written:

P(2:IM,JM) = P(1,JM)

* New constructs: CASE and DO…WHILE. The CASE construct allows you to branch to any number of sub-sections. It is similar to an arithmetic IF or a computed GOTO, but is much more flexible. For instance,

The DO…WHILE construct can replace convoluted constructions involving IF and GOTO statements. In particular, it should definitely replace constructs that include IF statement branches out of a DO loop. For example,

DO WHILE (Q(1) […]

[…]

4. Compile options:
f77 […] (-> xyz.o) or
f77 -64 -static -O3 -c xyz.f (-> xyz.o)
for more speed:
f77 -64 -mips4 -static -O3 -OPT:fold_arith_limit=1409
-O2 is almost as fast and safer than -O3 (optimization level)
link options:
-mips4 -lfastm (uses library of fast math routines)
5. Differences in FORTRAN (missing/obsolete features etc):
* READ(…,NUM=nbytes) is unavailable
* BLOCK DATA have to be named
* ‘open’ cannot be the name of a COMMON BLOCK
* T,F cannot be used to initialize logical variables, unless
PARAMETER (T=.TRUE.,F=.FALSE.) is added
* Use STOP rather than RETURN in MAIN.
* The function ERFC only takes real*4 args, use DERFC for real*8 args.
* The compiler warns if you use CALL SUBR(A,..) where A is a variable whereas A is an array in the subroutine (using of course only A(1))
* If the arrays and their dimensions are passed to a subroutine, the dimensions have to be passed also in each entry that passes the arrays (the original HNTRP does not work on the SGIs)
* The following construction does NOT work (with or w/o -static)

Data Sources

Not much to say about it, really. The site “is what it is”, but looks to have collected in one place a nice set of pointers to sources of both data and processing (i.e. model codes). Oh, and it also has links to the “pasteurized homogenized data food products” as well… ;-)

In Conclusion

So between those two you could make a passable start at getting the GCM and other Climate Codes, along with the data, and getting them running on the parallel system of your choice / budget. Be it a COW (Collection Of Workstations) dynamically assembled via boot-from-CD-parallel-cluster-Linux or via an NVIDIA board (and a lot of coding…)

While I personally think the models are complete tosh, based on mistaken assumptions about cause and effect, and ignoring way too much the key drivers of clouds, cosmic rays, solar UV variation, lunar tidal ocean modulated oscillations, and other natural variation: they might be a suitable base for adding in those things (while taking the role of CO2 way way down in them…)

13 Responses to Tech Bits Grab Bag

You are serious about the intricacies of coding, and I am not. Did some work on some simple problems in the old FoxPro with good results, and in a spreadsheet or two. Anyway, the post brings to mind the 1970’s era GIGO, which always translated to Garbage in Gospel Out. Not much has changed in some areas, though Amazon seems to have it conquered.

For a few hours before Christmas Eve I “helped” my son debug the game he was creating with the “Unreal” gaming engine. It had a very nice GUI front end for the programming that allowed you to create the needed files and keep track of the threading. Not much command line stuff, mostly drag and drop and thread connects from block to block. Very easy to keep track of things if you are neat about your layout…pg

Verizon finally p***ed me off to the point that I jumped ship to the other “Evil Empire” known as ATT. My new cellphone company happily adopted all my cellphones but they could not hack my Verizon Ellipsis 7 tablet.

It goes against the grain to throw stuff away so I tried to use the tablet to display the image from my USB microscope. When that failed I sent it to my sister in England so that she can do the following via her Wi-Fi connection:

Browse the “Web” using Firefox and Chrome.
Use Chrome to access Netflix for $9 per month.
Skype. VoIP video-phone.
IHeartRadio. Try my local radio station, WMMB to listen to Rush Limbaugh, 8 p.m. (UK time).
GPS. This does not need a Wi-Fi connection. You don’t need a keyboard either. Just press the microphone button and speak your destination.

Even though my Verizon service was terminated in September they are still billing me, so I have filed a complaint in the Brevard County Small Claims Court. You can bet Verizon will send a $300/hour attorney to squash me like a bug.

The good news is that I have arranged to have the CEO of Verizon Wireless, Thomas G. Stratton served by the Somerset County sheriff in New Jersey. The best $33 I spent this year.

I am hoping some of you can make suggestions for improving my chances at the pre-trial hearing on February 9.

What is more interesting than the nominal subject of the above article on Switzerland is the subtext of the propaganda. RT is a wholly owned propaganda organ of Russia, and what pitch it is selling at any given time is a good window into the long term intentions of Russia. The subtext of the article is preparing the ground for economic war on the west by undermining the value of the US Dollar through uncoupling it from oil sales denominated in Dollars. Granted the US Dollar does not need much help in losing its value as the Fed is doing a fine job of making it worthless in real terms, but the general public has not noticed that fact yet.
Here we have RT actively selling that situation to all who think it is a news organization rather than a propaganda and disinformation organization. I suspect something is afoot, and we need to get a stick and scrape it off before it starts to stink up the house. I doubt that will happen though, as Congress appears to have no sense of smell and has no clue, living only in the present and in how their actions influence their next election cycle.

Just an FYI that I’ve been desperately short slept for about a week and been catching up the last two days. I’ll get back to posting soon, but there are some family and personal maintenance issues to work through. Nothing horrid, just PITA.

For example, out of the last 2 weeks, I’ve had maybe 5 full nights sleep, 2 of those just the last two. The other nights just a couple of hours of naps. I’m only now starting to think well again. Also sleeping 15 hours out of the last 24 didn’t leave much time for posting…

But the bulk of issues are past now, so back to normal soon.

(Why so little sleep? It’s a long story and only 1/3 the new grandchild… maybe later..)

@EMSmith; It would appear that your dance card has been over subscribed. A bit more down time to recharge is necessary. As we get wiser (older) we find that we are a bit faster at exhausting a charge and a bit slower to recharge. Kind of like an older battery. I haven’t come up with a way to solve the problem although a vitamin regime helps.
Larry has been entertaining us with interesting links but he doesn’t talk much. We can survive for a while longer, maybe even learn to write a bit of our own.
An interesting New Year to all that lurk here…pg

Not inclined to ramble on all that much but feel some of you might find some of these links interesting diversions until the Chief gets back into his normal highly interesting observations about all sorts of things. I know I have missed reading new posts here but do not begrudge our host taking some time off to take care of more important issues.

@Larry; This “controls on cash” is really bugging me. That the possession of more than $500 cash might trigger confiscation as prima facie evidence of criminal activities is evidence of the approach of the time of the Beast, when all economic activities require permission and a number from the government, the “Great Beast” that consumes everything!
Interesting aside the Number 666 is the number of Islam. It is the number of verses in their holy book. The recitation of all of them every day is supposed to grant special powers to the recitator.
Government always tries to grow until it consumes all liberty and wealth. It is the nature of Bureaucrats and Politicians to lust after more power and resources, to aggrandize themselves until they destroy everything and the people revolt or just walk away and everyone starves to death. In every case, 300 years is about the length of time that is necessary to go from no real control, to Maximum public manifestations and collapse back to no control. The only solution is to “destroy” them first. It is them or us and we don’t need them! Normal people can live their lives without the “Help from the Government” bureaucrats…pg

An interesting article on one of the groups that are deliberately “Breaking the system” to cause the fundamental change that Obama has promised: http://www.discoverthenetworks.org/groupProfile.asp?grpid=7522.
Only by understanding the causes can we create solutions. There are several groups that conspire to break the system so that they can provide their New Way to paradise. Slavery to their rule…pg
