Reading In File Data One Line At A Time Using ColdFusion's CFLoop Tag Or Java's LineNumberReader

Last week on Twitter, someone asked about reading in files that were too big to fit in the RAM allocated to the JVM. In response, I suggested that the developer try using the file line reader functionality built into ColdFusion 8's CFLoop tag. After this discussion ended, someone else asked me to blog about this new CFLoop functionality as they had never heard of it before. As such, I figured I'd put together this quick ColdFusion demo.

As of ColdFusion 8, there are two new CFLoop attributes related to file parsing:

File - The expanded path of the file to read.

Characters - The number of characters to read from the file with each iteration.

While the File attribute is required for file reading, the Characters attribute is not. If the Characters attribute is omitted, ColdFusion defaults to reading in the file one line at a time (as defined by standard line delimiters - \r, \n, and \r\n). In this case (characters omitted), the Index variable of the loop will contain the line data, minus the line delimiters. If the Characters attribute is provided, the Index variable of the loop will contain a chunk of up to that many characters (including any line delimiters that fall within the chunk).

To see both of these scenarios in action (by-line and by-characters), let's take a look at the following ColdFusion demo:

<!---
	We are going to be reading in a file, line by line, so first,
	let's create a file to read. Define the path to the file we
	are going to populate.
--->
<cfset filePath = expandPath( "./data.txt" ) />

<!---
	Delete the file if it exists so that we don't keep populating
	the same document.
--->
<cfif fileExists( filePath )>

	<cfset fileDelete( filePath ) />

</cfif>

<!--- Write some data to the file. --->
<cfloop
	index="i"
	from="1"
	to="10"
	step="1">

	<cffile
		action="append"
		file="#filePath#"
		output="This is line #i# in this file."
		addnewline="true"
		/>

</cfloop>

<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->

<cfoutput>

	<!---
		Now, we are going to read the file in line-by-line using
		ColdFusion 8's new CFLoop behavior. The File attribute
		tells ColdFusion what file to read in, the Index attribute
		defines the variable into which ColdFusion will put the
		parsed text line.
	--->
	<cfloop
		index="line"
		file="#filePath#">

		Line: #line#<br />

	</cfloop>

	<br />

	<!---
		CFLoop also allows for a Characters attribute. If we omit
		this attribute (as above), ColdFusion reads the file
		line-by-line. If we use the Characters attribute, however,
		ColdFusion will read the file a chunk at a time based on the
		number of characters defined.

		Here, we are going to read the file in 50 characters at
		a time.
	--->
	<cfloop
		index="chunk"
		file="#filePath#"
		characters="50">

		50 Char Chunk: #chunk#<br />

	</cfloop>

</cfoutput>

The first part of this demo simply creates and populates the file that we are going to read in. Then, I use two CFLoop tags - one with just the File attribute and one with both the File and Characters attributes. When we run the above code, we get the following page output:

Line: This is line 1 in this file.
Line: This is line 2 in this file.
Line: This is line 3 in this file.
Line: This is line 4 in this file.
Line: This is line 5 in this file.
Line: This is line 6 in this file.
Line: This is line 7 in this file.
Line: This is line 8 in this file.
Line: This is line 9 in this file.
Line: This is line 10 in this file.

50 Char Chunk: This is line 1 in this file. This is line 2 in thi
50 Char Chunk: s file. This is line 3 in this file. This is line
50 Char Chunk: 4 in this file. This is line 5 in this file. This
50 Char Chunk: is line 6 in this file. This is line 7 in this fil
50 Char Chunk: e. This is line 8 in this file. This is line 9 in
50 Char Chunk: this file. This is line 10 in this file.

As you can see, when we provide the File attribute but omit the Characters attribute, ColdFusion will read the file in one line at a time. When we include the Characters attribute, ColdFusion will read the file in one fixed-size chunk of characters at a time (50 characters in this demo).

NOTE: While it is not represented in the rendered output, by-line reading does not include line delimiters; by-characters reading, on the other hand, does include line delimiters.

It's awesome how easy ColdFusion makes some of this functionality. And, while I can't be sure, I would guess that ColdFusion is using Java's LineNumberReader under the covers. The LineNumberReader class provides both by-line and by-characters parsing which makes it ideal for this new combination of CFLoop attributes.

If you are not using ColdFusion 8+ yet, you can still get this kind of functionality by dipping down into the Java layer and invoking the LineNumberReader class directly. ColdFusion provides a clean, simple abstraction for this functionality, so you'll see that using the LineNumberReader directly is quite a bit more complicated.

In the following demo, I am going to replicate the previous CFLoop output using the LineNumberReader class:

Again, the first part of the demo simply creates and populates the file that we are going to be reading. Once that is done, I then use the readLine() method for by-line parsing and the read() method for by-characters parsing. While the readLine() method is fairly straightforward, the read() method requires us to use a strongly-typed character array buffer which, as you can see, greatly increases the complexity of the code.

When we run the above ColdFusion and Java code, we get the following page output:

Line: This is line 1 in this file.
Line: This is line 2 in this file.
Line: This is line 3 in this file.
Line: This is line 4 in this file.
Line: This is line 5 in this file.
Line: This is line 6 in this file.
Line: This is line 7 in this file.
Line: This is line 8 in this file.
Line: This is line 9 in this file.
Line: This is line 10 in this file.

50 Char Chunk: This is line 1 in this file. This is line 2 in thi
50 Char Chunk: s file. This is line 3 in this file. This is line
50 Char Chunk: 4 in this file. This is line 5 in this file. This
50 Char Chunk: is line 6 in this file. This is line 7 in this fil
50 Char Chunk: e. This is line 8 in this file. This is line 9 in
50 Char Chunk: this file. This is line 10 in this file.

As you can see, this output is exactly the same as the output generated by the CFLoop-only demonstration.

In the above code, you'll notice that the LineNumberReader composes a BufferedReader instance. It's the BufferedReader that really makes this approach (and most likely the CFLoop approach) so efficient. I don't want to talk too much about how buffered readers work, as I'm not really a Java developer; but, they optimize the way the character data is read into memory so as to minimize both disk I/O and overall memory consumption.
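To make the readLine() and read() calls discussed above a bit more concrete, here is a minimal sketch in plain Java rather than ColdFusion. It is an illustration only, not the demo code from this post: the class name LineReaderSketch and the methods readByLine() and readByChunk() are hypothetical, and it assumes the file's encoding matches the platform default charset. One thing worth knowing is that java.io.LineNumberReader extends BufferedReader, so the buffering comes along without any extra wrapping.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are for illustration only.
class LineReaderSketch {

    // By-line parsing: readLine() returns each line WITHOUT its line
    // delimiter, and returns null once the end of the file is reached.
    static List<String> readByLine(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the file) for us.
        // FileReader(String) uses the platform default charset (assumption).
        try (LineNumberReader reader = new LineNumberReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // By-characters parsing: read() fills a char[] buffer and returns the
    // number of characters actually read, or -1 at the end of the file.
    // Line delimiters are left in place inside each chunk.
    static List<String> readByChunk(String path, int chunkSize) throws IOException {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        List<String> chunks = new ArrayList<>();
        char[] buffer = new char[chunkSize];
        try (LineNumberReader reader = new LineNumberReader(new FileReader(path))) {
            int count;
            while ((count = reader.read(buffer, 0, buffer.length)) != -1) {
                // Only the first "count" characters of the buffer are valid;
                // a chunk (typically the last one) may be shorter than chunkSize.
                chunks.add(new String(buffer, 0, count));
            }
        }
        return chunks;
    }
}
```

Calling readByChunk( path, 50 ) on the data.txt file from the first demo should give chunks of at most 50 characters that still contain the line delimiters, while readByLine( path ) returns the lines with the delimiters stripped. Only one line or one chunk is pulled out of the reader per iteration; the lists here exist just to make the sketch easy to inspect, and a real large-file job would process each piece as it is read instead of collecting them.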

The CFLoop tag is really one of the most amazing tags in ColdFusion. Between for-loops, query-loops, array-loops, list-loops, file-loops, and conditional-loops there's very little that the CFLoop tag can't do. It makes looping so easy, in fact, that you probably never even think about how much work this ColdFusion tag is actually abstracting. It's like they say - Great design should be invisible. Anyway, I hope this helps clarify how this part of the CFLoop tag works.

Reader Comments

Hey Ben, I just had a quick thought while reading your post: what about reading the whole file into memory (using cffile action=read) and then using list functions, or maybe better, a regular expression, to split the content string into the chunks that we need? For the first case, the content can be split into list items using the newline as the delimiter. For the second case, we can use string manipulation to get 50 chars at a time (or regexp somehow, gotta think more).

Do you think that will work? Or would it be a lot slower? Like I said, I just thought of it and haven't tried it yet. Well, lying on my bed with my iPad right now, so can't test it out :-)

If you can read the file into memory at one time, it will definitely be faster - the buffering is going to add overhead to the performance. I have definitely seen people deal with CSV files in this way - reading in the file, then converting it to an array using listToArray() in which the line delimiters are used as list-delimiters.

If you cannot read a file into memory, however, the buffered reader approach is going to be slower, but critical for overall performance (ie. not eating up all your RAM and causing overflow problems).

@Peter,

This just made me laugh out loud:

>> "The problem with "reading the whole file into memory" is that you're reading the whole file into memory."

@Peter, @Ben hahaha, sorry, stupid me, looks like I missed the point of the article :-) I always read the whole file in. I think my applications are just not big enough for me to encounter that overflow scenario :-p

Thanks for the article Ben. Next time my app hangs when it reads a file, I know why and will think of you :-p

I suppose you could just call readLine() a given number of times until you get to the portion you want. How do you identify the portion you are targeting?

@Henry,

Yeah, good point. My server was hemming and hawing this morning so I didn't have a lot of time to do the most in-depth exploration.

@Vinh,

No problem at all :)

@Robert,

Looks very interesting. I've played around a bit with large XML files and I know there are a number of event-driven parses; I've not had a great success with using them inside of ColdFusion. I'll take a look at this one as it seems to be doing something a bit different.

If I understand your method correctly, you use indexOf() to find the position of the newline character (or whatever char), mark it as the start, then find the next position of the same char, mark it as end and then use mid() to get the portion. Is that correct? If so, why not use list with the newline (or whatever char) as delimiter?

I had this problem last week (reading in a large file) and found the cfloop solution to work nicely. I ran into a timeout issue then though with the cfloop. I managed to solve this by adding my own timer to the page that checks how long the import has been running (read in a line, quick time check, read in a line, time check, etc). Once it hit a preset time, it breaks out of the loop, pushes the user back to the "import" page, I run a javascript redirect (thus CF is now taking a break :) ) which pushes the user back to the CF page to continue the import at whatever line the import left off at. I set the CF Admin timeout to 10 seconds, and my time check to 8 seconds and it was still running (correctly) after 15 minutes. I just make sure to show the user a message to let them know what is happening, but it was a quick way to get around the timeout issues.
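The time-boxed import described in this comment could be sketched roughly as follows. This is plain Java rather than ColdFusion, and the class ResumableImportSketch, the method importSlice(), and its parameters are all hypothetical; it only illustrates the idea of processing lines until a time budget runs out and then reporting where to resume. It also assumes the file is UTF-8 and does not change between requests.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper class; the names are for illustration only.
class ResumableImportSketch {

    // Processes lines that come after the first "linesAlreadyDone" lines,
    // checking the clock after each one. Returns the total number of lines
    // handled so far when the time budget is used up, or -1 once the end
    // of the file has been reached.
    static int importSlice(
            Path file,
            int linesAlreadyDone,
            long budgetMillis,
            Consumer<String> handler) throws IOException {

        long start = System.nanoTime();
        long budgetNanos = TimeUnit.MILLISECONDS.toNanos(budgetMillis);

        try (LineNumberReader reader = new LineNumberReader(
                Files.newBufferedReader(file, StandardCharsets.UTF_8))) {

            // Skip the lines that earlier requests have already imported.
            while (reader.getLineNumber() < linesAlreadyDone
                    && reader.readLine() != null) {
                // Nothing to do; we are just advancing the reader.
            }

            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
                // Read in a line, quick time check, read in a line, ...
                if (System.nanoTime() - start >= budgetNanos) {
                    return reader.getLineNumber();
                }
            }
        }
        return -1;
    }
}
```

A caller would hold on to the returned line count (in the session or the redirect URL, say), send the browser back around, and pass the count in again on the next request until the method returns -1. Skipping still re-reads the earlier lines on every request, so for very large files the later slices get less of the budget for actual work.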

@Ben, Thanks! It came down to necessity at the time. It's crazy what you come up with when it's late, you're running out of time and need to have a solution :) I had tried at first to use the method you described in a blog post a while back about extending the timeout so you can perform different tasks, but depending on the file size, this didn't seem feasible (and I'm sure would've caused fits on the server). The only way I could see getting around the timeout was to leave the CF execution and pass the reins back to the browser, then start the CF process up again. As long as you know the timeout in your administrator and set the timer to less than that value, it should run nicely.

Yeah, I've had trouble with my previous suggestion as well. In fact, when I recently changed it, I started getting a *ton* of error emails. At first, I thought maybe I was just getting new errors; what I think was happening though, was that my previous concept was just failing and the emails were never getting sent.

I have a file that contains data in pipe (|) delimited format, and the file contains 50 rows. From each line, I want the 17th pipe-delimited value. How do I change that 17th column value and store it back into the same line in the same file?

How do I read one line from an input file, do some processing on it, and write the resulting data back to the same line in the same file? Suppose the line contains data in delimited format and I want to change the 17th delimited value; how do I put the modified data back in the same location (the 17th position) in the same line of the same file in ColdFusion?

Thanks, the blog post was very useful! I had to analyze very large files. I couldn't read them as a whole because of their size. But you probably should consider closing the readers in the end. ;) (CF < 8 solution)

Thanks Ben for your very interesting articles on handling large files. As I am not a programmer, I am quite unsure whether my issue is directly related to your description or not. My problem is that I've got a PDF file generated by Adobe ColdFusion which contains several languages. In fact, we only ever display one of these languages at a time, but with each language we add to the PDF it takes longer to open (> 30 sec). According to my interpretation of the underlying XML structure, the file should be parsed quite quickly and only the lines of the active language have to be considered, but somehow we can't find a solution for that. Can you give us a hint on how to solve this issue?
