This is the error I get when I try to run 'java RegexFormat myfile.csv':
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at RegexFormat.ReadFile(regexformat.java:105)
at RegexFormat.main(regexformat.java:70)

I tried running with -Xmx512m and got this error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:133)
at java.lang.StringCoding.decode(StringCoding.java:173)
at java.lang.StringCoding.decode(StringCoding.java:185)
at java.lang.String.<init>(String.java:571)
at java.lang.String.<init>(String.java:594)
at RegexFormat.main(regexformat.java:73)

I don't believe the file can be broken up. The purpose of the program is to automate the formatting: the user just selects a file and the program formats it (replacing 8 strings, e.g. " & " -> " and ", "Northern" -> "Southern"; nothing that complex).

Are you saying that:
1) there is only one record in the file, and you have to read the whole record before you can process it?
2) you don't have all of the editing instructions/formatting rules until the whole file has been read?

I guess I was unsure what you meant by broken up. If the program can parse and process the file line by line, that would work. I have all the formatting rules already; it's about 8 find-and-replace strings. I'm not sure how I would process each line, build a new string, and write that to a file while going line by line...

The algorithm could be as simple as:
loop through all lines in the file:
    read a line from file1
    edit the line as per the rules
    write the new line to file2
end loop
close both files
rename/delete input file1
rename file2 to ...
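
The steps above can be sketched in Java roughly as follows. This is only a minimal sketch under some assumptions: the class name LineByLineFormat, the two-entry RULES table, and the ".tmp" suffix for the second file are made up for illustration, and it uses java.nio.file (Java 7 or later). Only one line is held in memory at a time, so the heap size no longer depends on the file size.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class LineByLineFormat {
    // Hypothetical rule table: {find, replace} pairs from the examples in this thread.
    static final String[][] RULES = {
        {" & ", " and "},
        {"Northern", "Southern"},
    };

    // Edit one line as per the rules. String.replace is a literal (non-regex) replace.
    static String editLine(String line) {
        for (String[] rule : RULES) {
            line = line.replace(rule[0], rule[1]);
        }
        return line;
    }

    // Read file1 line by line, write edited lines to file2, then rename file2 over file1.
    static void formatFile(Path input, Charset cs) throws IOException {
        Path temp = input.resolveSibling(input.getFileName() + ".tmp");
        try (BufferedReader in = Files.newBufferedReader(input, cs);
             BufferedWriter out = Files.newBufferedWriter(temp, cs)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(editLine(line));
                out.newLine(); // writes the platform line separator
            }
        } // both files are closed here
        Files.move(temp, input, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // The charset is an assumption; pass whatever the input file really uses.
        formatFile(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
    }
}
```

Run it as 'java LineByLineFormat myfile.csv'. Reading and writing with the same explicit charset keeps the output in the same encoding as the input.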

By "broken up" I mean working on a small portion of the data at a time. In Java, when dealing with text files, the most natural way to do that is to read line by line. Norm has already explained what the algorithm looks like.

I just ran into another problem processing line by line: it's VERY slow. I have the program temporarily appending each line to a string (buildMe).
The program has been running for over 10 minutes now and I don't believe it's done processing... I don't know if this will speed up when I change the code to write directly to a file instead of appending to a temporary string.

That concatenation above will be slow. It would be a bit faster if you used a StringBuffer to build the string in. But there is no reason to keep the processed string in memory. Write it to a file.
Is there a reason you decided NOT to write the converted line directly to a file?

will speed up when i change code to write directly to a file

Yes it will be a lot faster. Just a little slower than the time to copy the file.

1. It still runs very slowly; it takes at least 15 minutes to process this file. Any tips on optimizing the speed? The for loop runs 8 different patterns against each line.

2. The output file is much larger than expected. When I manually formatted a CSV file with Notepad, the result differed from the original by only ~3 MB, but the file produced by this code is nearly double the initial size.

resulting file after running this code is nearly double the initial size.

Can you look at the output file and see what is in it?
Why do you use a DataOutputStream & writeChars vs PrintStream & println?

Write a test program to see where the time is lost.
Use a smaller input file. Use System.currentTimeMillis() to capture the times for the loop. Compare the times for a version of the program with the replacements against the times for a version without the replacements (i.e. just a file copy).
Also compare with another version that just reads the input file.
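
A minimal sketch of that kind of timing test is below. The names LoopTimer, copyOnly and withReplacements are made up for illustration, and the test data is generated in memory instead of being read from a file, so it only separates the cost of the replacements from the cost of the loop itself; file reading and writing would need to be timed the same way.

```java
import java.util.ArrayList;
import java.util.List;

class LoopTimer {
    // Baseline: visit every line without editing it.
    static long copyOnly(List<String> lines) {
        long chars = 0;
        for (String line : lines) {
            chars += line.length();
        }
        return chars;
    }

    // Same loop, but with two of the replacements applied through regex replaceAll.
    static long withReplacements(List<String> lines) {
        long chars = 0;
        for (String line : lines) {
            String edited = line.replaceAll(" & ", " and ").replaceAll("Northern", "Southern");
            chars += edited.length();
        }
        return chars;
    }

    // Wall-clock time of one task in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.currentTimeMillis();
        task.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for a smaller input file.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            lines.add("Northern Supply & Co," + i);
        }
        System.out.println("loop only:         " + timeMillis(() -> copyOnly(lines)) + " ms");
        System.out.println("with replacements: " + timeMillis(() -> withReplacements(lines)) + " ms");
    }
}
```

Run each measurement a few times; the first pass is usually slower while the JVM warms up.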

Create an array of Matchers outside the loop and use it vs calling the matcher() method every time inside the loop.
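
One way this could look is sketched below. A Matcher is bound to the text it was created for, so it still has to be pointed at each new line inside the loop; Matcher.reset(CharSequence) does that without compiling the pattern again. The class name PrecompiledRules and the two-entry RULES table are assumptions for illustration, and the find strings are treated as literals via Pattern.quote.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrecompiledRules {
    // Hypothetical rule table: {find, replace} pairs from the examples in this thread.
    static final String[][] RULES = {
        {" & ", " and "},
        {"Northern", "Southern"},
    };

    static final Matcher[] MATCHERS = new Matcher[RULES.length];
    static final String[] REPLACEMENTS = new String[RULES.length];

    static {
        // Compile every pattern once, outside the per-line loop.
        for (int i = 0; i < RULES.length; i++) {
            MATCHERS[i] = Pattern.compile(Pattern.quote(RULES[i][0])).matcher("");
            // quoteReplacement keeps '$' and '\' in the replacement text literal.
            REPLACEMENTS[i] = Matcher.quoteReplacement(RULES[i][1]);
        }
    }

    // Apply all rules to one line, reusing the matchers. Not thread-safe.
    static String apply(String line) {
        for (int i = 0; i < MATCHERS.length; i++) {
            line = MATCHERS[i].reset(line).replaceAll(REPLACEMENTS[i]);
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(apply("Northern Supply & Co"));
    }
}
```

Whether this saves much time depends on how the original loop built its patterns, so it is worth timing before and after.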

I just found out that my original file was encoded in Western (ISO-8859-1), but the resulting file is encoded in UTF-16... Is there a way I can force it to be written in ISO-8859-1 again? I tried 'javac -encoding utf8' to see if the output would come out as UTF-8, but the program still produced the file as UTF-16.
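
A likely explanation, given the DataOutputStream and writeChars mentioned earlier: writeChars writes every char as two bytes, which looks like UTF-16 and would also account for the file nearly doubling in size. The -encoding option of javac only tells the compiler how to read the source files; it has no effect on what the program writes at run time. The sketch below compares the two approaches in memory. The class and method names are made up for illustration; for a real file, the same OutputStreamWriter would wrap a FileOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // What DataOutputStream.writeChars produces: two bytes per char.
    static byte[] viaWriteChars(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeChars(s);
        }
        return bytes.toByteArray();
    }

    // A Writer with an explicit charset encodes the text the way you ask.
    static byte[] viaWriter(String s, Charset cs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, cs)) {
            out.write(s);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String line = "caf\u00e9,1"; // 6 chars, one of them non-ASCII
        System.out.println("writeChars: " + viaWriteChars(line).length + " bytes");
        System.out.println("ISO-8859-1: " + viaWriter(line, StandardCharsets.ISO_8859_1).length + " bytes");
    }
}
```

For the 6-character sample line, writeChars produces 12 bytes and the ISO-8859-1 writer produces 6.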

Yes you are right. It needs to be inside the loop.
As an exercise I wrote some code to time how long regex took vs using indexOf and substring. The older methods were from 2 to 3 times faster. That was being case sensitive. With case insensitive it was a little less than twice as fast.
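
For reference, a case-sensitive replace built only on indexOf and a StringBuilder could look like the sketch below. It is not the code that was timed above, just one possible shape of it, and the class name IndexOfReplace is made up. For literal strings, the built-in String.replace(CharSequence, CharSequence) gives the same result.

```java
class IndexOfReplace {
    // Replace every occurrence of a literal string without using regex.
    static String replaceAll(String text, String find, String replacement) {
        if (find.isEmpty()) {
            return text; // nothing sensible to search for
        }
        int hit = text.indexOf(find);
        if (hit < 0) {
            return text; // common case: no match, no new string built
        }
        StringBuilder sb = new StringBuilder(text.length());
        int from = 0;
        while (hit >= 0) {
            // Copy the text before the match, then the replacement.
            sb.append(text, from, hit).append(replacement);
            from = hit + find.length();
            hit = text.indexOf(find, from);
        }
        sb.append(text, from, text.length()); // the tail after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceAll("Northern Supply & Co", " & ", " and "));
    }
}
```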