My previous two entries explained how I am attempting to port the Python code used to create the graph in the DZone article to “R”.

The author of that article published the data that he used to generate the graph. The data consists of several files. As a first step, I coded this “R” script to parse one file and generate a boxplot from it. I will improve this by combining all the files, parsing them and generating a combined boxplot.
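As a rough illustration of the parse-then-plot step, here is a minimal sketch. The layout of the published files is not described in this entry, so the two-column format (a label and a numeric value per line), the sample values and the names sample_lines and parse_measurements are all assumptions made for illustration, not the actual data or script.

```r
# Hypothetical sample standing in for one data file: "label value" per line.
# The real files may use a different layout; adjust col.names accordingly.
sample_lines <- c(
  "series_a 12.5",
  "series_a 14.1",
  "series_a 13.2",
  "series_b 20.4",
  "series_b 19.8",
  "series_b 22.0"
)

# Parse whitespace-separated lines into a data frame (hypothetical helper).
parse_measurements <- function(lines) {
  read.table(text = lines,
             col.names = c("series", "value"),
             stringsAsFactors = FALSE)
}

measurements <- parse_measurements(sample_lines)

# Draw one box per series; boxplot() also returns the statistics it plotted.
box_stats <- boxplot(value ~ series, data = measurements,
                     main = "BoxPlot from one file",
                     ylab = "Measured value")
```

To read a real file instead, read.table(file = "...") with the same col.names would replace the text argument, assuming the file really has this shape.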

BoxPlot from one file

I believe our measurements are uncertain and we need to show the errors in our capacity measurement plots. I suspect that we are making fundamental mistakes when we gather performance statistics and draw graphs, which is all the more reason for showing these uncertainties. Our management and clients should not be misled by the lack of skills of our capacity planners.

This code and the graph are used to learn one aspect of showing such errors. I have yet to investigate the types of errors and their statistical significance.

I have chosen some values from these guidelines in Charlie Hunt’s book. This is the latest ‘R’ code; the last blog entry has the old code.

I am using some of these general rules just as a foundation for further calculation. Generally our capacity planning teams do not have any baseline. I have not investigated the actual justification for some of these figures.

I know that I am publishing this ‘R’ code in a hurry. The comments in the code are missing, but I plan to add more explanations later on.

I obtain a garbage collection log from a production JVM and isolate the ‘Full GC’ lines. The goal is to draw graphs of utilization, find the mean and recommend a size for the memory pools. I refer to Charlie Hunt’s book on Java Performance.
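The steps described here can be sketched as follows. This is only an outline under stated assumptions: the log lines are invented and follow a ParallelGC-style layout (real logs vary with the JVM version and flags), the helper occupancy_after_kb is hypothetical, and sizing_multiplier is a placeholder to be replaced by whichever guideline value is chosen from the book.

```r
# Invented log lines in a ParallelGC-style layout; not taken from a real JVM.
gc_log <- c(
  "10.123: [GC [PSYoungGen: 4096K->512K(8192K)] 20000K->16500K(49152K), 0.0123 secs]",
  "25.456: [Full GC [PSYoungGen: 512K->0K(8192K)] [ParOldGen: 30000K->16000K(40960K)] 30512K->16000K(49152K) [PSPermGen: 9000K->9000K(16384K)], 0.2345 secs]",
  "61.789: [Full GC [PSYoungGen: 600K->0K(8192K)] [ParOldGen: 35000K->18000K(40960K)] 35600K->18000K(49152K) [PSPermGen: 9500K->9400K(16384K)], 0.2501 secs]",
  "97.012: [Full GC [PSYoungGen: 550K->0K(8192K)] [ParOldGen: 33000K->17000K(40960K)] 33550K->17000K(49152K) [PSPermGen: 9600K->9500K(16384K)], 0.2410 secs]"
)

# Keep only the 'Full GC' lines.
full_gc <- grep("Full GC", gc_log, fixed = TRUE, value = TRUE)

# Hypothetical helper: occupancy after collection (the figure following "->")
# for the named memory pool, in KB.
occupancy_after_kb <- function(lines, pool) {
  pattern <- paste0(pool, ": [0-9]+K->([0-9]+)K")
  matched <- regmatches(lines, regexpr(pattern, lines))
  as.numeric(sub(pattern, "\\1", matched))
}

old_after_kb <- occupancy_after_kb(full_gc, "ParOldGen")
mean_old_kb <- mean(old_after_kb)

# Placeholder multiplier, an assumption for illustration only.
sizing_multiplier <- 3
suggested_old_kb <- sizing_multiplier * mean_old_kb

# Plot the occupancy after each Full GC and mark the mean.
plot(old_after_kb / 1024, type = "b",
     xlab = "Full GC event", ylab = "Old generation after Full GC (MB)")
abline(h = mean_old_kb / 1024, lty = 2)
```

The same helper can be pointed at another pool name that appears in the log, for example "PSPermGen" in this invented sample.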