We have implemented a process using C# and ReportViewer that creates PDFs via the LocalReport.Render method. When we have to process a big file, say more than 20,000 records, the process fails with an OutOfMemoryException. I know there may be a lot of causes for this error, but I want to know whether the code below is okay or whether you can give me a better solution. I have about 40 case statements, each calling a different .rdlc file depending on the type; I did not include all of them for lack of space.

USP_Select_Batch_Address_To_Sort is a DataSet defined in the solution that executes a stored procedure accepting three parameters (in this case batch, run, and sequence number).

If it doesn't work, it doesn't fit Code Review. Have you tested the code you provided?
– Malachi, Aug 20 '14 at 18:16

Based on how your question was posed, I am under the impression that your code is working, but running out of memory on large cases; I've edited your question to reflect this and have accordingly voted not to close it. I am not familiar with C#, so if this is not the case, then your question is unfortunately off-topic and will be closed as such; we review only working code here on Code Review.
– Schism, Aug 20 '14 at 19:24

1 Answer

I probably can't solve your memory issue outright, but I will say that you are loading a large DataSet into memory from ADO.NET and then never disposing of it. The resources associated with the DataSet are not released just because it goes out of scope (it is not guaranteed to be disposed or garbage-collected at that point), which could be a contributing factor.
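As a rough illustration, here is the disposal pattern with plain ADO.NET types (the variable names, connection string, and parameter names are assumptions on my part; your generated table adapter also implements IDisposable, so the same using pattern applies to it):

// requires: using System.Data; using System.Data.SqlClient;
using (var connection = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter("USP_Select_Batch_Address_To_Sort", connection))
using (var batchData = new DataSet())
{
    adapter.SelectCommand.CommandType = CommandType.StoredProcedure;
    adapter.SelectCommand.Parameters.AddWithValue("@Batch", batchNumber);       // assumed parameter names
    adapter.SelectCommand.Parameters.AddWithValue("@Run", runNumber);
    adapter.SelectCommand.Parameters.AddWithValue("@Sequence", sequenceNumber);

    adapter.Fill(batchData);

    // ... bind batchData to the LocalReport and render here ...
} // Dispose is called on all three objects even if rendering throws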

Also worth noting is that a DataSet performs worse than, say, an IEnumerable. I know there is a reason for this, but I can't put my finger on it and I don't want to do you a disservice by explaining the wrong one; you can look here for a question comparing the performance of the two.

You could convert your DataSet to an IEnumerable after retrieving it from SQL using Cast<T>, and then invoke LINQ comprehensions and operators on the IEnumerable. This will be much faster, and it will also be lazily evaluated: you can build up a query on the IEnumerable and return it from your method, and it will not be iterated until you actually need to access the elements. Rejoice! Less computational power needed and less memory needed.
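For example, something like this projects the rows lazily (AddressRecord and the column names are hypothetical; substitute whatever your stored procedure actually returns):

// requires: using System.Data; using System.Linq;
IEnumerable<AddressRecord> addresses =
    batchData.Tables[0]
             .Rows
             .Cast<DataRow>()                 // DataRowCollection -> IEnumerable<DataRow>
             .Select(row => new AddressRecord
             {
                 Name    = row.Field<string>("Name"),      // assumed column names
                 Street  = row.Field<string>("Street"),
                 ZipCode = row.Field<string>("ZipCode")
             });

// Nothing has been enumerated yet -- the projection runs only when something
// (the report data source, a foreach, ToList(), etc.) iterates over addresses.

Note that row.Field<T> comes from the System.Data.DataSetExtensions assembly.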

This also has the added benefit of saving you from quite literally copying one DataTable into another (which is heavy enough as it is).

And another tip: don't make this method return void. Your method is doing too much right now. You should split it up into multiple methods, like so:

// Where StronglyTypedRepresentation is a type that holds all of the fields
// you expect to receive back from the stored procedure
// (i.e. the columns filled by uspSelectBatchAddressToSortTableAdapter)
ReportViewer GeneratePdf(IEnumerable<StronglyTypedRepresentation> data) {}
IEnumerable<StronglyTypedRepresentation> MapSqlResult(DataSet data) {}
void SavePdf(ReportViewer reportViewer) {}

Remember that each method (and each type) should do only one single thing.

So, to summarize:

Split your methods up into more granular chunks

Convert your initial data from ADO.NET to an IEnumerable using Cast<T>. LINQ comprehensions are almost always (citation needed; I only have anecdotal evidence to offer) faster. You can always convert the subset you need to dispatch to the stored procedure back into a DataTable if necessary (and if you need to send everything back to a stored procedure, then I would suggest doing that computation on the database side, as it is optimized for batch operations)

Consider that maybe your files are too big; if this is the case, find some way of reading them in chunks or splitting the files up before reading them (a rough sketch of chunked processing follows below)
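To make that last point concrete, here is a minimal sketch of processing the records in chunks. It assumes you have the lazily evaluated addresses sequence and the GeneratePdf method from the decomposition above (both are assumptions), and the chunk size of 1000 is arbitrary:

// requires: using System.Collections.Generic;
static IEnumerable<List<T>> InChunks<T>(IEnumerable<T> source, int size)
{
    var chunk = new List<T>(size);
    foreach (var item in source)
    {
        chunk.Add(item);
        if (chunk.Count == size)
        {
            yield return chunk;           // hand back a full chunk
            chunk = new List<T>(size);    // and start a fresh one
        }
    }
    if (chunk.Count > 0)
        yield return chunk;               // whatever is left over
}

// In the calling code: render one report per chunk instead of
// holding all ~20,000 records in memory at once.
foreach (var chunk in InChunks(addresses, 1000))
{
    GeneratePdf(chunk);
}

This way only one chunk of rows is materialized at a time, so the working set stays small no matter how many records the batch contains.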

@Dan: Thanks for the response; it's pretty informative. You definitely made some valid points and I will try to implement them. Just to be on the same page, which dataset are you talking about here? If you notice my code, I use two. One gets all the records to obtain the count and also drive a loop. The second dataset is used to bind to the report; it uses a table adapter and accepts parameters, so at any point in time it will return just one record.
– Andy, Aug 20 '14 at 20:31

The data set I was talking about was the one that you fill with the contents of the adapter. I meant that you use Cast on the table of the data set, not the data set itself. Surely the table has more than one record when your OP states it has ~20k? :)
– Dan Pantry, Aug 20 '14 at 20:33