Tuesday, September 23, 2008

Reading and writing blob data to/from your database can be a resource hog, because it normally involves holding the entire stream in memory. Holding 10kb, 20kb or even 100kb in memory might not be an issue, but as you start allowing larger file uploads, your application starts to feel the stress and can easily run out of resources.

In particular, say you have a relatively small file (2mb) that you're allowing users to upload from your pages. User A uploads; it's only 2mb, so you have not exceeded your memory allocation yet. However, as more users upload simultaneously, you're allocating more memory at a time: with 5 simultaneous users that's 2*5 = 10mb, and with 100 users it's 2*100 = 200mb.

This was all too stressful on your webserver, or at least it was until ASP.NET 2.0. Since then, ASP.NET supports setting a disk threshold (the requestLengthDiskThreshold attribute) that lets you specify how much of the request data is buffered in server memory. So, with the default threshold of 256 (the attribute is expressed in kilobytes), once the request grows larger than that, ASP.NET begins writing the stream to disk instead of keeping it in memory. This is just great, because I remember back in the day this was quite annoying. Glad it's fixed. It means we need to do nothing as far as the upload from the client to the webserver is concerned.
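For reference, this is configured on the httpRuntime element in web.config. A minimal sketch (the values here are illustrative; note that both attributes are expressed in kilobytes):

```xml
<configuration>
  <system.web>
    <!-- maxRequestLength: largest allowed upload, in KB (here ~10mb) -->
    <!-- requestLengthDiskThreshold: how much of the request to buffer
         in memory, in KB, before ASP.NET spills the rest to disk -->
    <httpRuntime maxRequestLength="10240" requestLengthDiskThreshold="256" />
  </system.web>
</configuration>
```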

Still, if we are storing the stream in SQL Server, we have a whole new scenario, and the question remains: how do we stream this data to SQL Server while conserving memory? Should we stream the entire data as one lump? Won't that allocate memory on our database server before writing to the database field? This is what I want to address. Basically, what we want to do is save our file stream to SQL Server in smaller chunks; that way our database server will allocate only enough memory for that little chunk before writing it to the database field, conserving memory at the cost of write speed (no free lunch, I'm afraid).

However, the price to pay in latency for streaming chunks of data to SQL Server may well be worth it, because the alternative is to tax your database server for whatever memory is available and possibly crash it (I assume it may crash when it runs out of memory. I have not tried personally. You may try that yourself if you're feeling curious. Do not forget to share your findings if you do :P)

Take SQL Server Express edition as an example, which has an imposed limit of 1gb of memory. That's not a whole lot, is it? Streaming the data to SQL Server in chunks is actually not as taxing as it may seem. True, the write performance is degraded, but you can regulate it by increasing/decreasing the size of your chunk to find a compromise.

To quote the SQL Server documentation on the .WRITE clause:
Because the .WRITE clause cannot be used to modify a NULL column, the column is first populated with temporary data. This data is then replaced with the correct data by using the .WRITE clause.

Now, that's not cool! We can work around it nicely though; in effect, it does not even feel like we're working around anything. I've basically followed Microsoft's own documentation located at http://msdn.microsoft.com/en-us/library/aa719771.aspx, which has the proper code; however, instead of .WRITE, it uses an older method that is not compatible with the newer varbinary(max), varchar(max) and nvarchar(max) datatypes supported by SQL Server 2005, as opposed to the older, not recommended types: image, text, and ntext.

Following is an extract of the code I use to stream chunks of data to SQL Server using the .WRITE method. The purpose of the sample code is to show how you can go about writing your own, of course. I have just ripped this piece of code out of my DAL with a few minor adjustments, so a straight copy/paste of the code below may or may not yield much depending on your level of expertise. I am just way too lazy to write a completely self-contained example, so bear with me. As it is, this article has gotten longer than I wanted it to be.
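First, the stored procedure that appends one chunk. This is a sketch: the table dbo.Files, its varbinary(max) column "data" and the parameter names are placeholders for whatever your own schema uses.

```sql
-- Appends one chunk to the varbinary(max) "data" column.
-- On the first chunk (@offset = 0) the column is seeded with 0x0,
-- because the .WRITE clause cannot be used on a NULL value.
CREATE PROCEDURE dbo.AppendChunk
    @id     int,
    @chunk  varbinary(max),
    @offset bigint
AS
BEGIN
    IF (@offset = 0)
        UPDATE dbo.Files SET data = 0x0 WHERE id = @id;

    -- data.WRITE(expression, @Offset, @Length): replaces @Length bytes
    -- starting at @Offset. Passing NULL for @Length truncates at @Offset
    -- and appends the chunk, which is what we want for sequential writes.
    UPDATE dbo.Files
    SET data.WRITE(@chunk, @offset, NULL)
    WHERE id = @id;
END
```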

Note above how we introduce the value 0x0 (null) into the data field, as per the article on MSDN? "data" is the name of the field that will be holding our data stream, and it is a varbinary type. The reason we introduce the value 0x0 is that .WRITE cannot write into a NULL field. This is the workaround I was talking about earlier, and it does not affect the appended chunks in any way.

As you can see, it all boils down to this. Quite simple indeed. The .WRITE method takes as arguments your chunk, the offset and the length. Not much for me to add here. Oh, one thing you may also want to note is that .WRITE is called on the field it needs to write to, so the syntax takes the form fieldName.WRITE(...); in my case the field name is "data", as you can see in the stored proc above.
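Calling the proc from the DAL then looks something like the following sketch; connectionString and the 10kb chunk size are assumptions you would adapt to your own setup. Note how one connection is opened and reused for all the chunks of a file.

```csharp
// Streams an uploaded file to SQL Server in chunks via the
// AppendChunk stored procedure sketched above.
public void SaveFile(int id, Stream input)
{
    const int chunkSize = 10240; // 10kb; tune this to trade memory for speed
    byte[] buffer = new byte[chunkSize];

    using (SqlConnection conn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand("dbo.AppendChunk", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@id", SqlDbType.Int).Value = id;
        SqlParameter chunk = cmd.Parameters.Add("@chunk", SqlDbType.VarBinary, -1);
        SqlParameter offset = cmd.Parameters.Add("@offset", SqlDbType.BigInt);

        conn.Open(); // one connection, reused for every chunk

        long position = 0;
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Copy only the bytes actually read from the stream.
            byte[] data = new byte[read];
            Array.Copy(buffer, data, read);

            chunk.Value = data;
            offset.Value = position;
            cmd.ExecuteNonQuery();
            position += read;
        }
    }
}
```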

We have now successfully stored our file stream in chunks. Next we want to see how to retrieve the file stream in a similar manner, so that we do not hog resources on our webserver when binary content is requested, which requires us to stream it out from the database server to the webserver, where our code streams it out again to the client. Firstly, I hope you are already familiar with HttpHandlers; if not, you can look them up on Google.

An HttpHandler/Module is general knowledge, so I'm not going to get into it. I've mapped somefile.ashx to my HttpHandler; that way, every request that comes in will be handled by my custom handler, where I request the file stream from the database and then stream it out again in chunks to the client. My HttpHandler looks like this:
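A simplified sketch follows; the Dal.GetFileReader helper stands in for my DAL method, and the column order (filename, length, data) is an assumption of this example.

```csharp
public class FileHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        int id = int.Parse(context.Request.QueryString["id"]);
        const int chunkSize = 102400; // 100kb read buffer
        byte[] buffer = new byte[chunkSize];

        // The reader is opened with CommandBehavior.SequentialAccess,
        // so the blob column is never buffered whole in memory.
        using (IDataReader reader = Dal.GetFileReader(id))
        {
            if (!reader.Read())
            {
                context.Response.StatusCode = 404;
                return;
            }

            string fileName = reader.GetString(0);
            long length = reader.GetInt64(1);

            context.Response.ContentType = "application/octet-stream";
            context.Response.AddHeader("Content-Disposition",
                "attachment; filename=\"" + fileName + "\"");
            context.Response.AddHeader("Content-Length", length.ToString());

            // Read the blob column in chunks and flush each one to the client.
            long offset = 0;
            long read;
            while ((read = reader.GetBytes(2, offset, buffer, 0, buffer.Length)) > 0)
            {
                context.Response.OutputStream.Write(buffer, 0, (int)read);
                context.Response.Flush();
                offset += read;
            }
        }
    }
}
```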

A few things I will explain quickly from the above code: we retrieve an IDataReader from our DAL instead of retrieving a stream. The reason is that I want to read the file stream in chunks from the DataReader, and every chunk I read from the database I want to write out to the client as I receive it, without storing the chunks in memory on the webserver where my application resides.

Another thing you may want to notice is the usage of the HTTP headers Content-Disposition and Content-Length. Basically, it was important for me to send the file out under its original filename when the user chooses to save it, and Content-Disposition helped there. Content-Length, meanwhile, ensures the correct file size on the receiving end (the client); without it I was getting errors unzipping a file, for example, where WinZip complained that the file was corrupt :-)

So that's pretty much it. And oh, the method in my DAL that returns the IDataReader follows:
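Here is a sketch of it (schema and connectionString again illustrative). The important bits are CommandBehavior.SequentialAccess, which lets us pull the varbinary column through GetBytes without the reader buffering the entire row, and CommandBehavior.CloseConnection, so that disposing the reader also closes the connection.

```csharp
// Returns a reader positioned over (filename, length, data).
// SequentialAccess streams the varbinary column via GetBytes
// instead of loading the whole value into memory.
public static IDataReader GetFileReader(int id)
{
    SqlConnection conn = new SqlConnection(connectionString);
    SqlCommand cmd = new SqlCommand(
        "SELECT filename, DATALENGTH(data), data FROM dbo.Files WHERE id = @id",
        conn);
    cmd.Parameters.Add("@id", SqlDbType.Int).Value = id;

    conn.Open();
    // CloseConnection: the connection is released when the caller
    // disposes the reader (as the HttpHandler's using block does).
    return cmd.ExecuteReader(
        CommandBehavior.SequentialAccess | CommandBehavior.CloseConnection);
}
```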

With that, I think we have succeeded in conserving resources on both our database server and our webserver. One last thing: when you've got this all working, experiment by changing the buffer size and regulate it to a write speed that is acceptable by your standards. From my tests on my dev machine, I was getting a download speed of 500-517KB/sec with a buffer size of 100kb (102400 bytes), while with the same buffer set to 10kb (10240 bytes) I was getting 50-60KB/sec.

These tests are of course based on what I see as I request a file handled through my custom HttpHandler above. I reckon these values are more or less the same, or a good approximation, when streaming the same data to SQL Server in chunks, so use that as a starting point to regulate the desired speed/latency required by your app based on how much resources your webserver/database server has. Upload speeds are usually restricted to a certain amount by most ISPs, which means you can set a much smaller chunk size when streaming from your application to SQL Server, and increase the chunk size when streaming from your webserver to the client (since your requesting clients' download speeds are significantly higher). It will also depend on the bandwidth available to your webserver and other factors, of course, but that's off topic.

Update, 23 September 2008:

One last thing I forgot to mention, but which my friend Filip Duyck brought up: since we are sending the data in chunks to SQL Server (whilst reusing the same connection), each chunk is still an additional hit on our database server. So consider that too while you evaluate how big you want your chunks to be.