Tag: research

I finally decided to move to ggplot2 after years of using gnuplot. I love that gnuplot is simple and straightforward, though its syntax can be a pain sometimes. All the plots in my previous LaTeX docs were generated with gnuplot, using its nice latex terminal for publication (this is still unmatched by ggplot2; more on this in a minute).

One thing that motivates me to use ggplot is Hadley’s book. The idea of a grammar of graphics looks very sweet. Another reason is that by moving into the R ecosystem I can use Sweave to produce dynamic versions of some previous docs (new docs will be written in org-mode, so that’s not necessary for them). For this, another option is org-mode (babel) + matplotlib + LaTeX. I love the elegant idea of writing the whole thing in org, and there are plenty of examples to check out, see here and here. Sweave, on the other hand, is simple but has had some long-standing problems (some of which are addressed by knitr). As for matplotlib, it does the job all right, but in my opinion the plots are not as nice looking as those from ggplot.

However, gnuplot and matplotlib have one great advantage over ggplot: LaTeX text rendering in plots. In matplotlib this is supported by activating the latex option (the text.usetex setting), similar to using the latex terminal in gnuplot. In ggplot there is no such thing out of the box. Workarounds include tikzDevice (the project seems to have stalled) or extrafont, but neither is satisfactory to me. That’s why org-mode gives a straightforward (though not fully satisfactory) solution: it lets me use gnuplot when needed, at least for a while, until a better solution appears.

I was going to say that MS Word by itself is from hell, at least for technical writing. Recently I had to convert my TeX writing into Word. Everything was tolerable, including the conversion of the equations since MathType recognizes TeX, until the end, when I was converting the references and citations. There is a “manage source” option in Word, but it only works with EndNote. For the record, that is bullshit: one product tries to force its users to adopt another equally pathetic commercial product? What an ally!

Then I decided to do it myself.

I regretted it in five minutes.

As a BibTeX user, I cannot even describe the frustration of looking at the fancy menus that Word’s reference management “ribbon” (yeah, they really named it that!) offers. I don’t think any LaTeX user who hasn’t lost his mind would take on the task of converting 100+ references manually. For those who are familiar with this kind of problem: bib2endnote by Trent Apted didn’t work; somehow the generated XML was rejected by the EndNote parser in Word.

So what did I do? I copied the text of the bibliography and pasted it into the versatile Word, since I won’t need to edit it in the future anyway: it’s for whoever asks for the Word version, no matter how pathetic it is.
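The copy-and-paste route could in principle be scripted straight from the .bib file. What follows is only a minimal Python sketch under loud assumptions: it is not bib2endnote and not the procedure described above, it handles only flat fields of the form `key = {value}`, `key = "value"` or `key = 1999` (no nested braces, no @string macros), and the sample entries, the function names `parse_entries` and `format_entry`, and the output style are all invented for illustration.

```python
import re

# Header of an entry, e.g. "@article{doe2010,".
ENTRY_RE = re.compile(r'@(\w+)\s*\{\s*([^,\s]+)\s*,')
# Flat fields only: key = {value}, key = "value", or key = 1999.
# Assumption: values contain no nested braces.
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)"|(\d+))')


def parse_entries(bibtex_text):
    """Return a list of dicts with entry type, cite key and flat fields."""
    headers = list(ENTRY_RE.finditer(bibtex_text))
    entries = []
    for i, header in enumerate(headers):
        # The body of an entry runs up to the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(bibtex_text)
        body = bibtex_text[header.end():end]
        fields = {}
        for key, braced, quoted, number in FIELD_RE.findall(body):
            fields[key.lower()] = braced or quoted or number
        entries.append({
            "entry_type": header.group(1).lower(),
            "cite_key": header.group(2),
            "fields": fields,
        })
    return entries


def format_entry(entry, number):
    """Format one entry as a numbered plain-text line (invented style)."""
    fields = entry["fields"]
    parts = [fields.get("author", "Unknown author"),
             fields.get("title", "Untitled")]
    venue = fields.get("journal") or fields.get("booktitle") or fields.get("publisher")
    if venue:
        parts.append(venue)
    if "year" in fields:
        parts.append(fields["year"])
    return "[{}] {}.".format(number, ". ".join(parts))


# Made-up sample entries, for illustration only.
SAMPLE = """
@article{doe2010,
  author = {Doe, Jane and Roe, Richard},
  title = {A hypothetical paper},
  journal = {Journal of Examples},
  year = {2010}
}
@book{roe1999,
  author = "Roe, Richard",
  title = "An invented book",
  publisher = "Example Press",
  year = 1999
}
"""

lines = [format_entry(e, n) for n, e in enumerate(parse_entries(SAMPLE), start=1)]
print("\n".join(lines))
```

The printed lines can be pasted into Word as plain text. A real bibliography with nested braces or accented-character macros would need a proper BibTeX parser rather than these regular expressions.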

An interesting editorial in the Journal of Fluids Engineering by Malcolm J. Andrews is worth showing to people working on numerical modeling of flows. It contains some guidelines for submitting numerical results to JFE, such as:

Providing complex figures may be nice for a presentation to an audience or sponsor, but are often not scientific or quantitative unless great care is taken in their presentation/discussion or they illustrate a novel aspect of the work, and rarely provide much detailed insight into the problem under consideration. The usual x-y plots are often more illuminating, but also require the author to spend more time thinking about what is important and why which helps make the article more archival.

Simply reporting one parameter when, in fact, there are multiple parameters that self-interact suggests that the author does not understand the diagnostic, or its proper use, and also the basic elements of the flow itself e.g., simply reporting pressure, and not associated velocity fields, might indicate a lack of basic understanding. The article should include a detailed description of the results, their consequences, and their importance i.e., simply stating values or shapes does not warrant archival.

Nondimensional parameters serve not only to collapse data but they demonstrate an understanding of the basic parameters that control the processes of interest and form the basis of generality that can underlie resulting archival value formulas. Not expressing results in nondimensional form substantially weakens the archival value, suggests that the author does not understand the fundamental flow physics, and also suggests that the results have no generality or archival value.

It is crucial to provide the applicable parameter ranges for the commercial software or diagnostic and ensure that they are met in the current application e.g., this might mean answering the question about an appropriate use of a turbulence model, the Reynolds number range of the experiment, or the Stokes relaxation time of particle in the flow relative to the time scale of interest in the flow.
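To make the last two guidelines concrete, here is a minimal Python sketch of reporting results through nondimensional parameters and checking them against a stated range of validity. Everything in it is an assumption for illustration and none of it comes from the editorial: the function names, the sample fluid and particle properties, and the range limits, which should be replaced by whatever the model or diagnostic documentation actually states.

```python
def reynolds_number(density, velocity, length, dynamic_viscosity):
    """Re = rho * U * L / mu, the ratio of inertial to viscous forces."""
    return density * velocity * length / dynamic_viscosity


def stokes_number(particle_density, particle_diameter, dynamic_viscosity, flow_time_scale):
    """Stk = tau_p / tau_f, with Stokes relaxation time tau_p = rho_p * d_p**2 / (18 * mu)."""
    relaxation_time = particle_density * particle_diameter ** 2 / (18.0 * dynamic_viscosity)
    return relaxation_time / flow_time_scale


def in_validated_range(value, low, high):
    """True if a parameter lies inside the range the model or diagnostic was validated for."""
    return low <= value <= high


# Illustrative values only: water in a 5 cm pipe at 2 m/s, seeded with 50 micron particles.
rho, mu = 1000.0, 1.0e-3          # fluid density [kg/m^3], dynamic viscosity [Pa s]
U, L = 2.0, 0.05                  # bulk velocity [m/s], pipe diameter [m]

re_pipe = reynolds_number(rho, U, L, mu)
stk = stokes_number(particle_density=2500.0, particle_diameter=50.0e-6,
                    dynamic_viscosity=mu, flow_time_scale=L / U)

# Hypothetical validity range for a turbulence model; not a real specification.
RE_MIN, RE_MAX = 4.0e3, 1.0e6

print("Re  = {:.3g} (within assumed model range: {})".format(
    re_pipe, in_validated_range(re_pipe, RE_MIN, RE_MAX)))
print("Stk = {:.3g} (particles follow the flow only if Stk << 1)".format(stk))
```

With these sample numbers the Reynolds number comes out at roughly 1e5 and the Stokes number is on the order of 0.01. Reporting the two numbers together with the assumed validity range is the kind of statement the guidelines ask for.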

In the academic community a code writer is allowed to be sloppy, sometimes in the name of exploration, but apparently that is not the whole story. It turns out I am not the only one suffering from sloppy and uninspiring code in computational science: a recent article in Nature says that I am just a tiny part of a deep trouble.

As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software, say computer scientists. At best, poorly written programs cause researchers such as Harry to waste valuable time and energy. But the coding problems can sometimes cause substantial harm, and have forced some scientists to retract papers.

Well, the CFD code I am working on is just such a case: almost no comments or notes from the writer whatsoever, let alone any verification of its implementation. And I obviously agree with one comment in the article:

“There are terrifying statistics showing that almost all of what scientists know about coding is self-taught,” says Wilson. “They just don’t know how bad they are.”

Well, there is nothing wrong with “self-taught coding”; actually, I believe that when it comes to writing code, teaching yourself is almost the only way to do it. In “Programming in Emacs Lisp”, the writer describes how a friend of his learns a new language:

I prefer to learn from reference manuals. I “dive into” each paragraph, and “come up for air” between paragraphs.

When I get to the end of a paragraph, I assume that that subject is done, finished, that I know everything I need (with the possible exception of the case when the next paragraph starts talking about it in more detail). I expect that a well written reference manual will not have a lot of redundancy, and that it will have excellent pointers to the (one) place where the information I want is.

I believe that’s the approach many computational scientists adopt. What should be taught by others, on the other hand, is coding style, which is indeed what programmers learn in school. I was surprised to find out that many people around me have no idea about it, and some of them are even from the computer (no, NOT computational) science community. The article gives five tips for “amateur” coding:

Version control

Tracking raw material

Write testable code

Test it

Share it
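As an illustration of the “write testable code” and “test it” tips, here is a minimal Python sketch. The function and test names are invented for this example and have nothing to do with the CFD code or the Nature article. The idea is simply to keep a numerical kernel as a small pure function and check it against a case with a known answer.

```python
import math


def central_difference(f, x, h=1.0e-5):
    """Second-order central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def test_matches_analytic_derivative_of_sine():
    # d/dx sin(x) = cos(x); compare against the analytic result.
    x = 0.7
    assert abs(central_difference(math.sin, x) - math.cos(x)) < 1.0e-8


def test_exact_for_quadratic():
    # The central difference is exact for quadratics, up to rounding error.
    assert abs(central_difference(lambda x: x * x, 3.0) - 6.0) < 1.0e-6
```

The test functions use plain assert statements, so they can be called directly or collected by a test runner such as pytest. Because the kernel takes its inputs as arguments and has no hidden state, it can be checked in isolation before it is wired into a larger solver.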

Leaving the last tip (“Share it”) aside, I am at level 4, and unfortunately whoever wrote the code skipped nearly all of the previous 3 levels, and I am paying for that. Git comes in pretty handy for version control, especially since someone already wrote a post on using it in research. I tip my hat to No. 6 of “10 reasons to use Git for Research”:

Keep track of your grad students.

Suspect your grad students are slacking? Check the commit logs! And now I prepare for hate mail from grad students. However, I think that if I had this form of accountability, it would have made me more productive. Of course, you don’t need Git for this, any version control system would do. Of all the systems I’ve used, Git’s presentation of changes is the user-friendliest.

Well, that’s pretty… evil.
