I’ve been interested for a while in mining ADS (NASA’s Astrophysics Data System, an online repository of bibliographic records). Using the ADS developer API, it is quite simple to download bibliographical records as JSON data and do some analysis on a sample of astronomical publications.

The full, detailed analysis (with some caveats) is available here. The main plot derived from the reduced dataset tracks the number of months between successive first-author papers (i.e. between the first and the second, the second and the third, etc.) to test the hypothesis that the rate of publishing papers increases as the author becomes more experienced and entrenched.

On average, the lag between first-author papers decreases steadily from approximately a year and a half (18 months) between the first and the second paper, flattening to approximately 7-8 months by the tenth paper published.
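The lag computation itself is simple. The following is a minimal JavaScript sketch of it, not the code used for the analysis. It assumes that each author's first-author papers have already been reduced to a list of "YYYY-MM" publication dates; the function names and the sample dates are made up for illustration.

```javascript
// Months elapsed between two "YYYY-MM" dates (assumed input format).
function monthsBetween(earlier, later) {
  const [y1, m1] = earlier.split('-').map(Number);
  const [y2, m2] = later.split('-').map(Number);
  return (y2 - y1) * 12 + (m2 - m1);
}

// Lags between successive papers of one author: [1st->2nd, 2nd->3rd, ...].
function successiveLags(pubDates) {
  const sorted = pubDates.slice().sort(); // "YYYY-MM" sorts chronologically as text
  const lags = [];
  for (let i = 1; i < sorted.length; i++) {
    lags.push(monthsBetween(sorted[i - 1], sorted[i]));
  }
  return lags;
}

// Mean lag at each position, averaged over the authors who have that many papers.
function meanLagByPosition(authors) {
  const sums = [];
  const counts = [];
  for (const pubDates of authors) {
    successiveLags(pubDates).forEach((lag, k) => {
      sums[k] = (sums[k] || 0) + lag;
      counts[k] = (counts[k] || 0) + 1;
    });
  }
  return sums.map((sum, k) => sum / counts[k]);
}

// Made-up dates for two hypothetical authors.
const sample = [
  ['2010-01', '2011-07', '2012-03'],
  ['2009-05', '2010-05'],
];
console.log(meanLagByPosition(sample)); // [ 15, 8 ]
```

Plotting the resulting array against the paper index gives the kind of curve described above.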

latex2exp is a new R package that parses and converts LaTeX math formulas into R’s plotmath expressions. Plotmath expressions are used to enter mathematical formulas and symbols to be rendered as text, axis labels, etc. throughout R’s plotting system. I find plotmath expressions to be quite opaque and fiddly; LaTeX is a de facto standard for mathematical expressions, so this package might be useful to others as well. You can check it out on GitHub.

“Supported” LaTeX

Only a subset of LaTeX is supported, and not 100% correctly. Greek symbols (\alpha, \beta, etc.) and the usual operators (+, -, etc.) are supported. Additionally, the following symbols and operators should be supported:

Emacs Redux made me aware of a nifty new feature of Emacs 24.4 (the upcoming release; I am currently using one of the nightlies). Emacs 24.4 includes a new mode called prettify-symbols-mode, which replaces tokens within code with more compact symbols (typically Unicode characters like λ).

I am using the mode to replace the “function” token in JavaScript — which is used all over the place, since it is the syntax to create functions, modules and closures — with the single character λ. This results in significantly more lightweight expressions:

The mode is smart enough to only replace tokens (it doesn’t replace function within a string, e.g., var a = "This is a function"; will appear unmodified). To activate the mode, use (global-prettify-symbols-mode 1) in one of your init files, and add new symbols to the mode hooks:

Ever been annoyed with the Emacs bell? Aside from disabling it, you can make it slightly less annoying by using a visual error indicator in place of an audible bell. There’s a Visible Bell setting included by default, but I find it quite aesthetically displeasing (at least on my nightly Mac build). I like this snippet a lot better (from EmacsWiki):

Over the past week or so, I’ve been browsing WordPress themes over the web. Now, I’ve never been completely satisfied with WordPress. While its dashboard is great when you need to get a blog started, I’ve been finding it increasingly stifling — themes are opaque, the post editor is limiting (needs more Emacs!), and I have a general distaste for the large soup of tags generated for each page.

Today I foolishly tried a new theme, thinking that I could easily revert to my highly mangled Twenty-Fourteen theme at any time. Alas, it appears all the customizations were gone when I tried to do exactly that. Luckily, the new theme (Wilson by Anders Noren) is quite to my liking, and I only had to tweak a few things. I also learnt to create child themes in order to avoid losing my settings in the future.

Still, I feel like it is high time for me to move away from WordPress and its pre-made themes, to a simple static website with my own versioned assets. Jekyll looks like the clear winner — it would be well integrated into my current workflow of simple static HTML and CSS, and it would be easily uploaded on GitHub (automatic backup FTW!). Although I don’t have that many pages up, the idea of migrating all the pages and the posts is slightly daunting, so the migration of the blog will probably occur over a few weeks.

In the meantime, any brokenness on the website is due to the theme switching and will be resolved shortly.

Yesterday, I attended a very handy webcast by Jeroen Janssens called Data Science at the Command Line (a book is on its way). While I do most of my data manipulation from R, it is undeniably convenient to be able to run some simple tasks interactively from the command line, or as part of a shell script or Makefile.

The presentation touched on several command-line tools that I either wasn’t aware of, or had forgotten about. If you missed the webcast, this website has a helpful list of commands, both well-known and obscure. Below are a few that I had not known about and that might be useful to others.

parallel

parallel is a shell command to execute a series of commands in batch, over several CPUs on a local machine or over several computers (using a combination of ssh and rsync to connect and transfer files). I used to use a combination of Xgrid and some homegrown shell scripts to achieve that, but Xgrid is unfortunately no more (RIP!). parallel seems like a way to quickly get a small subset of Xgrid’s functionality.

asciinema

I saw Jeroen use this tool during the webcast to record his terminal session. Asciinema looks very very cool and potentially helpful to create terminal-based tutorials. Install via pip.

csvkit & jq

csvkit is a suite of command line tools to deal with CSV files, but works quite well for tab-separated data as well (which I deal with often). Particularly useful so far: csvlook (nicely formatted table in the terminal), csvstat (column statistics) and csvsql (SQL queries on CSV files). jq can be used to manipulate JSON data, potentially piped into csvkit to create simple text tables.

I am very happy to announce that Systemic has been awarded the LIFT grant. This means that, together with my colleagues Joel Green and Randi Ludwig, I will be able to work full-time — instead of as a side project — on improving and expanding Systemic, and creating new educational apps like Systemic Live and Super Planet Crash, over the next year!

I am also publishing a new release of Systemic 2 (2.17), which addresses some bugs and improves the documentation for installation on Linux. You can download it now.

Below are some of the changes:

– NEW: ktable function for listing the fit values as a table, suitable for exporting to TeX or HTML
– NEW: Bayesian Information Criterion menu item
– NEW: Quadratic trend term
– CHANGED: Periodograms report the normalized power in the range 0 < p < 1, where power = 1 is a perfect fit. (This is a bug corrected in 2.172.)
– FIXED: bug where the semi-amplitude would be calculated incorrectly for massive bodies. (credit: Trifon Trifonov)
– FIXED: bug in simplex that would crash the application if the minimizer encountered a NaN value.
– FIXED: bug in the GUI that would crash the application in case of excessive text output.
– FIXED: naming of columns in the matrix returned by kdata().
– FIXED: bug where the radial velocity curve or periodogram would look excessively jagged.
– FIXED: bug in kperiodogram.boot where the function could crash.
– FIXED: bug in kperiodogram.boot where the function would only calculate the ‘full’ periodogram (instead of the periodogram of residuals) for certain inputs.
– FIXED: bug in the GUI periodogram routine, where you could receive an error for certain power spectra.
– FIXED: Kernel plot using plot() respects the chosen xlim.
– FIXED: MCMC would crash in certain situations when set up from the menu.
– FIXED: 1LO-crossval returns the signed sum of logs, instead of the absolute value.
– FIXED: clarified the installation instructions (Readme.txt) for Linux. (credit: Franz Feldtkeller)
– FIXED: you can now choose a path for R that is not /usr/bin/R by selecting Help -> Set path to R…
– FIXED: F-test menu item uses the current kernel instead of the kernel named “k”.
– Various bug-fixes in the plotting routines.

When Giants Collide — WGC for short — is one of the “fun” projects I am working on. Once finished, it will be a small in-browser simulator where you can collide giant planets together (with some degree of realism). You can see my progress on my GitHub repo and the series of blog posts under this category.

In this little demo app, you can run an N-body simulation in your browser where you make two spheres made of point masses “collide”. You can tweak various parameters (collision speed, impact parameter, distance and number of particles) to change the outcome of the simulation.

Underneath it all: TreeSPH.js

The app above is powered by the portion of WGC’s code that computes the gravitational force between a set of point particles (a gravitational N-body system). The gravitational force is computed using the Barnes-Hut tree gravity algorithm, and the coordinates of the points are evolved using a third-order embedded Runge-Kutta algorithm. The code is available in the GitHub repo for WGC.
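To give a flavor of the tree algorithm, here is a minimal Barnes-Hut sketch. It is not WGC's code: it works in 2D with a quadtree to stay short, uses only the monopole (center-of-mass) term, sets G = 1, adds a small softening length, and assumes positive masses at distinct positions inside the unit square. All names are illustrative.

```javascript
const G = 1;       // gravitational constant in code units (assumption)
const EPS2 = 1e-6; // softening length squared (assumption), keeps 1/r^2 finite

// A node covers the square [x, x + size) x [y, y + size).
function makeNode(x, y, size) {
  return { x, y, size, mass: 0, cx: 0, cy: 0, body: null, children: null };
}

// Child quadrant of `node` that contains body `b`.
function childFor(node, b) {
  const half = node.size / 2;
  const index = (b.x >= node.x + half ? 1 : 0) + (b.y >= node.y + half ? 2 : 0);
  return node.children[index];
}

// Insert a body into the tree rooted at `node`.
function insert(node, b) {
  if (node.children === null && node.body !== null) {
    // Occupied leaf: split into four quadrants and push the resident body down.
    const half = node.size / 2;
    node.children = [
      makeNode(node.x, node.y, half),
      makeNode(node.x + half, node.y, half),
      makeNode(node.x, node.y + half, half),
      makeNode(node.x + half, node.y + half, half),
    ];
    const resident = node.body;
    node.body = null;
    insert(childFor(node, resident), resident);
  }
  if (node.children === null) {
    node.body = b; // empty leaf
  } else {
    insert(childFor(node, b), b);
  }
  // Update this node's total mass and center of mass.
  const total = node.mass + b.m;
  node.cx = (node.cx * node.mass + b.x * b.m) / total;
  node.cy = (node.cy * node.mass + b.y * b.m) / total;
  node.mass = total;
}

function buildTree(bodies) {
  const root = makeNode(0, 0, 1); // assumes all positions lie in [0, 1) x [0, 1)
  for (const b of bodies) insert(root, b);
  return root;
}

// Acceleration on body `b`. A node is treated as a single point mass when
// size / distance < theta, and is opened (its children visited) otherwise.
function acceleration(node, b, theta, out = { ax: 0, ay: 0 }) {
  if (node.mass === 0 || node.body === b) return out; // empty node, or b itself
  const dx = node.cx - b.x;
  const dy = node.cy - b.y;
  const r2 = dx * dx + dy * dy + EPS2;
  if (node.children === null || node.size * node.size < theta * theta * r2) {
    const f = (G * node.mass) / (r2 * Math.sqrt(r2));
    out.ax += f * dx;
    out.ay += f * dy;
  } else {
    for (const child of node.children) acceleration(child, b, theta, out);
  }
  return out;
}

const demoBodies = [
  { x: 0.10, y: 0.20, m: 1 },
  { x: 0.80, y: 0.30, m: 2 },
  { x: 0.55, y: 0.90, m: 1 },
];
const demoTree = buildTree(demoBodies);
console.log(acceleration(demoTree, demoBodies[0], 0.5));
```

With theta = 0 every node is opened and the result reduces to the direct sum; larger values of theta trade accuracy for speed.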

I am now working on writing the hydrodynamical part (via the SPH algorithm), which will let me simulate the collision between two gaseous spheres. The resulting library will be called treesph.js.

treesph.js will be an open-source JavaScript library able to power small-scale hydrodynamical simulations — either in the browser (through web workers), or within a JavaScript environment (e.g. Node.js). It comes with:

– a library to set up initial equilibrium conditions (e.g. Lane-Emden spheres, or N-body spheres with isotropic velocity dispersion);
– a canvas-based library to plot and animate the simulation snapshots;
– a fast library for operating on vectors and matrices that minimizes allocations and copying, and other math routines.

Performance notes

While playing with the app, you may be wondering (a) why there is a “buffering” stage before you can see the evolution of the system, and (b) why so few particles?

Buffering: it’s all about delayed gratification

The app animates the particle motion at 30 frames per second. This requirement places a hard constraint: if you want to compute the particle motion at each frame request, then the computation must take less than 1/30th of a second (about 33 ms), otherwise the webpage will freeze as the JavaScript engine tries to catch up with the accumulated frame requests. For a reasonable number of particles (say, 100 or more; see below) and the time steps required by the above app, the computation takes far longer than that.
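The arithmetic behind the frame budget can be written down in a few lines. This is only an illustration with hypothetical function names; the 1500 ms step time is a made-up figure, not a measurement from the app.

```javascript
// Time available to compute one frame, in milliseconds.
function frameBudgetMs(fps) {
  return 1000 / fps;
}

// True if a step that takes `stepMs` can be computed within a single frame.
function fitsFrameBudget(stepMs, fps) {
  return stepMs <= frameBudgetMs(fps);
}

// Frame rate achieved if every frame has to wait for one step of `stepMs`.
function achievableFps(stepMs) {
  return 1000 / stepMs;
}

console.log(frameBudgetMs(30).toFixed(1));   // 33.3
console.log(fitsFrameBudget(1500, 30));      // false
console.log(achievableFps(1500).toFixed(2)); // 0.67
```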

This issue can be ameliorated by running the numerical computation in a separate thread (a web worker), and drawing frames on the main thread as soon as they are computed. This is still not optimal: while it solves the UI freezing issue, the particle motion will appear very jerky as it will be animated at (typically) less than a frame per second!

In order to solve this issue, I created a small JavaScript library (streamingcontroller.js; available in the same GitHub repo, documentation upcoming). Streamingcontroller.js first estimates the expected wall time — the time in seconds — needed to complete the simulation. Then, within the web worker thread, it “buffers” the simulation snapshots by adding them to a pool of snapshots. Once the buffer is big enough that the simulation can be run in real time without hiccups, the library starts streaming the snapshots back to the main thread where the animation is drawn. In the main thread, a second buffer receives the snapshots; the second buffer is then emptied at 30 frames per second.
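How big is "big enough"? The sketch below is one simple way to estimate it, and is not taken from streamingcontroller.js. It assumes a constant production rate, and that snapshot k is produced at time k / produceFps and shown at time t0 + k / playFps, where t0 is the moment playback starts. Playback never stalls if the buffer holds at least N (1 - produceFps / playFps) snapshots when it starts. The function and parameter names are hypothetical.

```javascript
// Estimated wall time (seconds) to compute all snapshots at a constant rate.
function estimateWallTime(totalFrames, produceFps) {
  return totalFrames / produceFps;
}

// Number of snapshots to accumulate before starting playback so that the
// animation never overtakes the simulation (under the model described above).
function framesToBuffer(totalFrames, produceFps, playFps) {
  if (produceFps >= playFps) return 0; // the simulation keeps up on its own
  return Math.ceil((totalFrames * (playFps - produceFps)) / playFps);
}

// Example: 30 seconds of animation at 30 fps, computed at 10 snapshots per second.
const totalFrames = 900;
console.log(estimateWallTime(totalFrames, 10)); // 90
console.log(framesToBuffer(totalFrames, 10, 30)); // 600
```

In this example, buffering 600 snapshots takes 60 seconds; the remaining 300 are computed during the 30 seconds of playback, so the simulation and the animation finish together.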

More particles, pretty please?

The default setting of the app is to animate 250 particles (125 per sphere). Why so few, when the typical number of particles quoted for N-body simulations routinely exceeds millions (or even billions!) of particles?

There are three bottlenecks at work. The first is obvious: the code isn’t fully optimized and profiled yet, and I am certain there is room for improvement. I am writing a small math library of common mathematical routines called math.js (also in the same GitHub repo) which will be fully optimized for V8.

The second is also obvious: simulations with lots of particles are usually run at full-speed, on multiple cores, and in the background. These simulations can save their snapshots, to be plotted and animated at the end of the run. An online app (or game) with real-time requirements (or, say, a <1 minute buffering time) doesn’t have this kind of luxury!

The third is the worst hurdle, and it is inherent to the nature of JavaScript: JavaScript is slow. I am not a JavaScript guru by any means, but I do have a good amount of experience writing performant numerical code in a variety of languages (mostly C). While JavaScript is typically fast enough for most tasks on the web, it is slow on personal computers and even slower on mobile platforms for physically-motivated, accurate simulations. In its present form, it is not well-suited to run these kinds of numerical tasks as quickly as the underlying hardware allows. Although JavaScript interpreters have been improving by leaps and bounds, and careful code can exploit some of these optimizations, they are hitting a wall of diminishing returns. Since JavaScript is the only runtime available on browsers, it is the ultimate bottleneck.

You can check out the other demos using code from WGC on this webpage.

In one of my last posts (An interactive Barnes-Hut tree) I talked briefly about one of the “fun” projects I’m working on, When Giants Collide (work in progress, GitHub repo), and promised myself to blog about its development as I went along. I just finished refining the algorithm for building the tree and calculating the gravitational force.

The small app above is a benchmark pitting the Barnes-Hut algorithm for computing gravity (an O(N log(N)) algorithm) against a brute-force direct summation (an O(N^2) algorithm). It calculates the gravitational field of a random collection of particles using both methods for N = 256 to N = 16,384; a lower amount of time spent indicates a faster algorithm. The time used to compute the gravitational force is averaged over 12 iterations to minimize fluctuations. Results are plotted in real time.
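For reference, the brute-force side of such a benchmark can be sketched as follows. This is not the benchmark's actual code: it assumes G = 1, a small softening length, and uniformly random positions, and it stops at N = 1,024 so that it runs in a moment. The names are illustrative.

```javascript
// Brute-force accelerations in 3D: every pair of bodies interacts, O(N^2).
function directSum(bodies, eps2 = 1e-6) {
  const acc = bodies.map(() => [0, 0, 0]);
  for (let i = 0; i < bodies.length; i++) {
    for (let j = i + 1; j < bodies.length; j++) {
      const a = bodies[i];
      const b = bodies[j];
      const dx = b.x - a.x;
      const dy = b.y - a.y;
      const dz = b.z - a.z;
      const r2 = dx * dx + dy * dy + dz * dz + eps2;
      const inv = 1 / (r2 * Math.sqrt(r2)); // G = 1 (assumption)
      // Equal and opposite forces: update both bodies of the pair at once.
      acc[i][0] += b.m * inv * dx;
      acc[i][1] += b.m * inv * dy;
      acc[i][2] += b.m * inv * dz;
      acc[j][0] -= a.m * inv * dx;
      acc[j][1] -= a.m * inv * dy;
      acc[j][2] -= a.m * inv * dz;
    }
  }
  return acc;
}

// Random unit-mass bodies in the unit cube.
function randomBodies(n) {
  const bodies = [];
  for (let i = 0; i < n; i++) {
    bodies.push({ x: Math.random(), y: Math.random(), z: Math.random(), m: 1 });
  }
  return bodies;
}

// Mean wall time of fn() over `iterations` runs, in milliseconds.
function meanTimeMs(fn, iterations) {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) fn();
  return (Date.now() - start) / iterations;
}

for (const n of [256, 512, 1024]) {
  const bodies = randomBodies(n);
  const ms = meanTimeMs(() => directSum(bodies), 12);
  console.log('N = ' + n + ': ' + ms.toFixed(2) + ' ms');
}
```

Doubling N should roughly quadruple the time for the direct sum, whereas a tree code should grow only slightly faster than linearly.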

Lastly, it calculates an overall “score” for the JavaScript interpreter by only running the Barnes-Hut algorithm for N = 16,384. You can see a table of scores for a few different browsers and devices I have access to (lower is better). If you’d like, send me your score!

Some observations about JavaScript optimization

Chrome turned out to be the fastest browser at this particular benchmark. Surprisingly, with a previous version of the same code Chrome was actually the slowest browser on my MacBook, almost 6x as slow as Safari! That was quite unexpected, as in my (limited) experience building web apps Chrome tends to edge out other browsers in terms of JavaScript execution speed.

So I waded a little bit more into my code to understand what was making it so inefficient. This Google optimization guide and this post on HTML5Rocks (specifically talking about optimizing for V8, the just-in-time compiler embedded in Chrome) proved very useful. What I learned:

Use the idiomatic JavaScript style for creating classes (using prototypes, new, straightforward constructors etc.) instead of using an object factory and closures.

Avoid creating closures, when possible.

Use Node.js to profile the application and identify functions that are not getting optimized (using --trace-opt).
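To make the first two points concrete, here is a small illustration of the two styles. It is a toy example, not code from WGC; the names are made up.

```javascript
// Style 1: object factory with closures. The state lives in captured
// variables, and every call allocates a fresh function object.
function makeVector(x, y) {
  return {
    norm: function () {
      return Math.sqrt(x * x + y * y);
    },
  };
}

// Style 2: plain constructor plus prototype. The state lives in properties,
// and all instances share one method through the prototype.
function Vector(x, y) {
  this.x = x;
  this.y = y;
}
Vector.prototype.norm = function () {
  return Math.sqrt(this.x * this.x + this.y * this.y);
};

const a = makeVector(3, 4);
const b = makeVector(6, 8);
const c = new Vector(3, 4);
const d = new Vector(6, 8);

console.log(a.norm(), c.norm()); // 5 5
console.log(a.norm === b.norm);  // false: one function object per instance
console.log(c.norm === d.norm);  // true: a single function shared via the prototype
```

Saved under a hypothetical name such as vectors.js, a longer-running script written in either style can be run with node --trace-opt vectors.js to see which functions V8 reports as optimized.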

Both Safari and Firefox had good baseline scores even before these optimizations. I found it quite surprising that V8 was much more fastidious about my code than the other JavaScript engines.

Another finding was how much slower alternative browsers (e.g. Chrome, Mercury) are on iOS. Alternative browsers use the same engine as Safari, but they don’t have access to Nitro’s Just-In-Time compilation — this means that they will be quite a bit slower than Safari on a computationally-intensive benchmark. How much slower? On my iPhone 5S, almost a factor of 10!

Web workers are awesome

The benchmark runs in a different thread, so that the page itself remains responsive. This is accomplished using Web Workers, a relatively new technology that allows the page to spin off threads to do computation-heavy work. It’s quite well supported, and I found it pretty easy to learn (aside from some surprising quirks). I plan on spinning off some of the tasks in Systemic Live, which currently either block the interface or use timers, into Web Workers (it’ll be quite a bit of work, so don’t hold your breath).
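The basic pattern looks roughly like the sketch below. It is a generic illustration rather than the benchmark's code: heavyTask is a stand-in computation, and worker.js is a hypothetical file name under which this same script would be served.

```javascript
// Stand-in for a computation-heavy task: partial sum of 1/k^2.
function heavyTask(n) {
  let sum = 0;
  for (let k = 1; k <= n; k++) sum += 1 / (k * k);
  return sum;
}

// Worker side: runs only when this file is loaded as a Web Worker,
// where the global function importScripts is defined.
if (typeof importScripts === 'function') {
  self.onmessage = function (event) {
    const n = event.data.n;
    self.postMessage({ n: n, result: heavyTask(n) });
  };
}

// Page side: spawn the worker, send it a job, and receive the result in a
// callback while the main thread stays free to handle the UI.
function runInWorker(n, onResult) {
  const worker = new Worker('worker.js'); // hypothetical file name
  worker.onmessage = function (event) {
    onResult(event.data.result);
    worker.terminate();
  };
  worker.postMessage({ n: n });
}
```

On the page, something like runInWorker(16384, function (result) { console.log(result); }) would start the job; messages are copied between the two threads, so the worker cannot touch the page's DOM or variables directly.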