I'm loading a very large data set into R from Java. I have written a Java program that calls R using rJava's JRI. The program is packaged as an executable jar file and is run from the terminal (Linux). The data is in the region of 50 columns by 13.7 million rows. R alone can handle this without a problem. However, when I run it from the Java program, I get a Java OutOfMemoryError (heap space).

The thing is, when I run it with half the rows it works, yet R should only be sending the names of the variables (50 in total) back to Java, regardless of how many rows there are. This is the code I'm using:

re.eval("names(data <- read.csv(file=\"data.csv\", head=TRUE, sep=\",\"))");

My understanding is that the re.eval function evaluates an expression in R and sends the result back to Java. Is there any way to evaluate the expression and not have the result returned to Java?

2 Answers

One way to call R without having anything come back to Java would be to run R as an external process. Since it looks like that is roughly what you are doing anyway, having the OS execute the call to R, rather than the library inside Java, might avoid the out-of-memory error.
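A minimal sketch of the external-process approach, assuming Rscript is installed and on the PATH. The class and method names (RExternalCall, buildCommand, run) are hypothetical and only for illustration. Each call starts a fresh R process, so nothing loaded here stays in memory for later calls; this is not a drop-in replacement for a long-lived Rengine session.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RExternalCall {

    // Builds the command line for evaluating one R expression with Rscript.
    // Assumption: "Rscript" is on the PATH.
    static List<String> buildCommand(String rExpression) {
        List<String> command = new ArrayList<>();
        command.add("Rscript");
        command.add("-e");
        command.add(rExpression);
        return command;
    }

    // Runs the expression in a separate R process and returns its exit code.
    // R's output goes straight to this program's stdout/stderr, so the
    // result is never materialised as an object on the Java heap.
    static int run(String rExpression) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(rExpression));
        builder.inheritIO();
        Process process = builder.start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Same load as in the question; only the column names are printed.
        String expr = "data <- read.csv(file=\"data.csv\", header=TRUE, sep=\",\"); "
                + "cat(names(data), sep=\"\\n\")";
        int exitCode = run(expr);
        System.out.println("Rscript exited with code " + exitCode);
    }
}
```

Passing the command as a list of arguments avoids shell quoting problems with the quotes inside the R expression.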

Would that keep the Rengine running in the background? After I load the data in, the user can interact with the program: picking out variables, graphing them, manipulating them, etc. I need this functionality to remain. Also, do you know why it is running out of memory? Why does it work with half the number of rows if it is only returning 50 names?
– Aran Brady, Dec 2 '11 at 15:10

I'm not particularly familiar with the package you are using, but my guess is that, inside the Java library, it is loading all the data from the CSV file and simply running out of room on the heap (meaning it is the outgoing data that causes issues, not the inbound data). You could try increasing the heap space that the JVM is running with: viralpatel.net/blogs/2009/01/…
– cdeszaq, Dec 2 '11 at 15:14

Have you tried adjusting the JVM heap size by starting the executable with options?

Like:

java -Xmx1024m -Xms1024m -jar myJar.jar

You can adjust the memory values, obviously, but the option -Xmx sets the maximum heap size for the JVM and -Xms sets the initial size.
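To check that the options actually took effect, a small sketch like the following prints the heap limits the running JVM reports. The class name HeapCheck and the helper toMiB are hypothetical. The reported maximum can differ slightly from the -Xmx value depending on the JVM and garbage collector.

```java
public class HeapCheck {

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): upper bound the heap may grow to (governed by -Xmx).
        System.out.println("Max heap (MiB): " + toMiB(rt.maxMemory()));
        // totalMemory(): heap currently reserved by the JVM (starts near -Xms).
        System.out.println("Committed heap (MiB): " + toMiB(rt.totalMemory()));
        // freeMemory(): unused part of the currently reserved heap.
        System.out.println("Free in committed heap (MiB): " + toMiB(rt.freeMemory()));
    }
}
```

Running it with the same flags as the real program, for example java -Xmx1024m -Xms1024m HeapCheck, shows whether the limits are being picked up.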

This may help if you are processing a lot of data that you actually need to retrieve. Otherwise, an approach where you don't get any data back (as suggested by cdeszaq) would be better suited to your case.