During my PhD I used WestGrid clusters to do some of my computations. Often I needed to interact with a database. It took me a while to get the whole thing working, so I thought I would share the script with you. One nice feature is that it tries to find an open port to run the database on, since you never know whether something else is already using a given port on the node you were assigned.
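To illustrate the open-port idea on its own, here is a minimal Python sketch. It is not the script mentioned above; the function name `find_open_port` and the default host are my own assumptions. It asks the operating system for a free port by binding to port 0.

```python
import socket


def find_open_port(host="127.0.0.1"):
    """Return a TCP port that was free on `host` at the time of the call.

    `find_open_port` is a hypothetical helper, not part of the original script.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the operating system pick an unused port.
        sock.bind((host, 0))
        # getsockname() returns (address, port) for an IPv4 socket.
        return sock.getsockname()[1]


port = find_open_port()
print(f"The database could be started on port {port}")
```

Keep in mind that the port is released again when the function returns, so another process could grab it before the database starts. On a shared node it is worth retrying with a new port if the database fails to bind.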

I recently started working on a paper for the MSR-data track and started wondering what data format people would prefer.

Personally, I mostly use five types of data representation:

XML

PostgreSQL database tables

MySQL database tables

CSV

R-workspace

I attached a poll so that you can vote. Of course, I am aware that the list is not exhaustive, so if you feel strongly about a favorite data format that you use whenever possible, let me know in the comments section.
On a related note, if you feel that such a question has no general answer and depends heavily on the data you are dealing with, let’s stay practical and consider the data I want to submit to the MSR-data track.

I plan to submit call graphs of a Java program, one created for every single commit, with the methods that were changed in that commit marked in the call graph.

I recently needed to store all comments from all work items of a project I was hosting on jazz.net/hub, the predecessor of hub.jazz.net. I had always wanted to write something similar to git for interacting with Jazz, and I started working on it on GitHub (project link).

I first tried to emulate the interaction as shown in their GitHub integrator, which is overly complicated. Then I stumbled across a library by Kenneth Reitz for doing HTTP requests.
Using that library, the whole authentication and interaction becomes very easy:
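As a rough sketch of what this can look like with Requests: the helper names, the login URL you pass in, and the form field names `j_username` and `j_password` are assumptions for illustration and should be checked against the server you are talking to. The point is that a `requests.Session` keeps the authentication cookies, so later calls are already logged in.

```python
def open_session():
    """Create a Requests session that remembers cookies between calls."""
    # Requests is third-party (pip install requests). It is imported here so
    # the helpers below can also be exercised with a stand-in session object.
    import requests
    return requests.Session()


def login(session, login_url, user, password):
    """Post the credentials as a form; the session stores the auth cookies.

    The field names below are an assumption, not taken from the Jazz docs.
    """
    response = session.post(
        login_url,
        data={"j_username": user, "j_password": password},
    )
    response.raise_for_status()  # fail loudly on HTTP 4xx/5xx
    return session


def fetch_json(session, url):
    """GET a resource with the logged-in session and decode the JSON body."""
    response = session.get(url, headers={"Accept": "application/json"})
    response.raise_for_status()
    return response.json()
```

A typical use would be `session = login(open_session(), login_url, user, password)` followed by `fetch_json(session, work_item_url)` for each work item, where both URLs are placeholders you have to fill in yourself.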

Besides the countless examples of discrimination against people whose ancestors are not from the region where a religion originated, and all the animals that are considered either dirty or holy without due cause, I was very intrigued by the story about Mother Teresa. If you want to know how an advance by Kodak in film rolls helped make Mother Teresa holy, just google it or read the book; it is plain hilarious and frightening at the same time.

Interestingly enough, Christopher Hitchens holds Dr. Martin Luther King Jr. in high regard, not for his involvement with the church but for his courage in standing up against segregation, which is nothing more than a remnant of church-approved (and, in the Bible, actually demanded) slavery.

If you are up for it and a bit sceptical about religion, I recommend reading the book; it contains some pretty fascinating stories.

For our Concurrency class (which is more of a High Performance Computing class), Liam Kiemele and I are implementing a Monte-Carlo simulation that can deal with an insane number of iterations (relative to the computational complexity of each iteration) as well as huge amounts of input data used to build empirical distribution functions. This is a course project where everyone needs exposure to programming concurrent/parallelizable applications, and conveniently this application has two parts that need to be parallelized, but let me get back to that.

First of all, why did we choose to look into Monte-Carlo simulations? Over the course of the class we had several guest speakers. Some talked about their research experience in concurrency/high performance computing; others brought their problems to us in search of help. Neptune Canada was among those looking for help. They are working on a system for near-field tsunami detection, which requires Monte-Carlo simulation to estimate the likelihood and height of a possible tsunami. Bottom line: we might see someone actually use our program!

Some of their problems arise from their plan to potentially use millions (or even billions) of input variables. This means they will need two things: (1) more simulation runs and (2) a way to create massive amounts of random input based on the distribution of each individual input variable. Liam is currently taking care of (1) and I will be implementing (2). If you are interested in more details, you can visit our project on GitHub.

Let me ramble on a bit about the challenges of generating the random numbers. Generating a large amount of numbers from a well-known distribution such as a Gaussian is easy, since the random function can be expressed in a very compact way. The issue is that the distributions underlying most of the variables are not known; only a massive number of observations is available for each variable. Therefore we will need to build an empirical distribution function for each input variable. That means holding as much data as possible for each variable in memory, which makes it necessary to distribute the data over a set of nodes to allow efficient generation of random numbers.
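To make the idea concrete, here is a small single-node sketch of sampling from an empirical distribution in Python. It is an illustration under my own assumptions, not our project code: the class name `EmpiricalDistribution` is made up, and the sampler draws a uniform position along the sorted observations and interpolates linearly between neighbouring values, which is one simple way to invert an empirical distribution function.

```python
import bisect
import random


class EmpiricalDistribution:
    """Empirical distribution built from raw observations (hypothetical helper)."""

    def __init__(self, observations):
        if not observations:
            raise ValueError("need at least one observation")
        # Sorting once makes both the CDF lookup and the sampling cheap.
        self._sorted = sorted(observations)

    def cdf(self, x):
        """Fraction of observations that are less than or equal to x."""
        return bisect.bisect_right(self._sorted, x) / len(self._sorted)

    def sample(self, rng):
        """Draw one value by inverting a piecewise-linear version of the CDF."""
        n = len(self._sorted)
        if n == 1:
            return self._sorted[0]
        # Uniform position along the sorted observations, in [0, n - 1).
        position = rng.random() * (n - 1)
        lower = min(int(position), n - 2)
        fraction = position - lower
        left, right = self._sorted[lower], self._sorted[lower + 1]
        # Interpolate between the two neighbouring observations.
        return left + fraction * (right - left)


rng = random.Random(42)  # fixed seed so runs are repeatable
wave_heights = EmpiricalDistribution([0.4, 1.3, 0.9, 2.1, 0.7])  # made-up data
draws = [wave_heights.sample(rng) for _ in range(5)]
print(draws)
```

The memory problem described above shows up in `self._sorted`: every variable keeps all of its observations, so with millions of variables the sorted arrays have to be spread across nodes.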

Later I will talk more about the random number generation strategy we chose for our Monte-Carlo simulation, as well as how we structured the whole system.

I recently reviewed a couple of papers for my supervisor, and I must say it is always valuable to go through this process as a grad-student for … reasons:

What is publishable? I know that as a grad-student it is very important to publish, for various reasons, but I still find myself trying to figure out what constitutes a unit of publishable work. Looking at conference publications gives me only one side of the picture, but through reviewing I am getting a better understanding of what is good enough and what is not.

Getting exposure to one aspect of what a faculty member does. I guess every grad-student contemplates a career in academia at some point, and reviewing is one aspect of it.

Evil Tip
I get very excited when I review papers that cite my work, and I am pretty sure others feel the same. Therefore, to increase your chances of getting into a conference, cite not only the PC members’ work but especially their students’ work.

Edit: Chris Corley (@excsc) pointed out that it is also useful for every grad student’s CV to be acknowledged as a co-reviewer. And I totally agree!