Our web app is hosted on a virtual machine with 8 vCPUs. We have an intensive data operation that runs on a nightly schedule (a console app launched by Windows Task Scheduler), which I'd like to parallelize somehow. The operation iterates many times over many data sets to calculate different statistics. Currently, when it runs, Task Manager shows that its CPU usage never goes above 13% (about 1/8 of the machine, i.e. roughly one core's worth).

Here is code from one of the methods that gets called (the web app is a large questionnaire):

...but nothing I do pushes CPU usage above 13% and the time taken to execute the method stays pretty much the same. What trick am I missing here? Parallel programming is new to me so I'm trying to make use of PLINQ/TPL as simply as possible.

DbContext is not thread-safe, so you'll have to create a separate one per task/thread you want to use. At best, sharing one will slow you down due to locking; at worst, it just won't work reliably.
– Joachim Isaksson, Apr 13 '14 at 10:02
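A minimal sketch of the one-context-per-worker idea, using the Parallel.ForEach overload with localInit/localFinally so each worker gets its own private object. To keep it runnable without a database, a plain List<T> (also not thread-safe) stands in for the per-worker DbContext, and the data sets are made-up arrays; the comments mark where a real context would be created and disposed. The context class name in the comments is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Made-up input: data set i holds the numbers 1..i+1 (stand-ins for real data sets).
var dataSets = Enumerable.Range(1, 100)
    .Select(n => Enumerable.Range(1, n).Select(x => (double)x).ToArray())
    .ToList();

var results = new List<(int Index, double Sum)>();
var gate = new object();

Parallel.ForEach(
    Enumerable.Range(0, dataSets.Count),
    // localInit: runs once per worker. A real job would create that worker's
    // own DbContext here (e.g. a hypothetical new QuestionnaireContext()).
    () => new List<(int Index, double Sum)>(),
    // body: touches only the worker-private list, so no lock is needed here.
    (i, loopState, local) =>
    {
        local.Add((i, dataSets[i].Sum()));
        return local;
    },
    // localFinally: runs once per worker; merge into the shared list under a lock.
    // A per-worker DbContext would be disposed here.
    local =>
    {
        lock (gate) results.AddRange(local);
    });

results.Sort((a, b) => a.Index.CompareTo(b.Index));
Console.WriteLine($"{results.Count} data sets processed");
```

If one context per item is simpler, creating it in a using block inside the loop body also works; localInit just spreads the setup cost over all the items a worker handles. Either way, parallelism mainly helps the CPU-bound part of the job, not time spent waiting on database round trips.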

It's probably a performance problem because it hits the database for every year, section and question, which is a lot of round trips. You should prefer preloading everything into memory with a single query and then working with the in-memory data.
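A minimal sketch of "load once, then work in memory", assuming a hypothetical answer row with Year, SectionId, QuestionId and Score fields (the real schema isn't shown). The key step is indexing the preloaded rows once with ToLookup: calling answers.Where(...) inside the nested loops still scans the whole list on every iteration, so preloading alone can end up no faster, or even slower.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for the result of ONE query, e.g. db.Answers.ToList().
var answers = new List<(int Year, int SectionId, int QuestionId, int Score)>
{
    (2013, 1, 10, 4), (2013, 1, 10, 2), (2013, 1, 11, 5),
    (2014, 1, 10, 3), (2014, 2, 20, 1), (2014, 2, 20, 5),
};

// Build the index once (a single pass). Each lookup afterwards is a hash lookup,
// instead of a full scan of the list per (year, section, question).
var byKey = answers.ToLookup(a => (a.Year, a.SectionId, a.QuestionId));

// A missing key yields an empty sequence, so guard before averaging.
double AverageScore(int year, int sectionId, int questionId)
{
    var rows = byKey[(year, sectionId, questionId)];
    return rows.Any() ? rows.Average(r => r.Score) : 0.0;
}

Console.WriteLine(AverageScore(2013, 1, 10)); // (4 + 2) / 2
Console.WriteLine(AverageScore(2014, 2, 20)); // (1 + 5) / 2
```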

Also, I forgot to mention: before you even try any kind of performance optimization, you should profile your code. That way, you know whether your problem is I/O-bound or algorithmic, which will dictate how you should optimize the code.
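Before (or alongside) a real profiler, a crude Stopwatch split between the loading phase and the computing phase already hints at which kind of problem it is: if loading dominates, the job is mostly waiting on the database; if computing dominates, it is CPU-bound. LoadData and Compute below are hypothetical placeholders for the real queries and statistics code, and this is no substitute for a profiler's per-method breakdown.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

var sw = Stopwatch.StartNew();
var data = LoadData();          // hypothetical stand-in for the DB queries
var loadTime = sw.Elapsed;

sw.Restart();
var stats = Compute(data);      // hypothetical stand-in for the statistics work
var computeTime = sw.Elapsed;

Console.WriteLine($"load: {loadTime.TotalMilliseconds:F1} ms, compute: {computeTime.TotalMilliseconds:F1} ms");

// Placeholders so the sketch runs on its own.
int[] LoadData() => Enumerable.Range(1, 1_000_000).ToArray();
long Compute(int[] xs) => xs.Sum(x => (long)x);
```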

Thanks for the tip. I tried this: I pulled the database sets into memory and used collections instead of the db. Weirdly, the method took even longer to execute this way. RAM usage increased considerably, as expected, but there was no performance gain. I'm also going to try the ANTS profiler, thanks.
– user982119, Apr 13 '14 at 10:50

@user982119 That is weird. Can you post the modified code?
– Euphoric, Apr 13 '14 at 11:19

Now that I look at your code properly, it seems it can all be solved with one simple query.
– Euphoric, Apr 13 '14 at 11:30

I used "List<Answer> answers = db.Answers.ToList();" and replaced "var answer = db.Answers.Where..." with "var answer = answers.Where...". I did the same for the other entities as well. What simple query is that? Could you give me a hint, at least?
– user982119, Apr 13 '14 at 11:59

@user982119 And where is that first query? Outside the questions loop? The emphasis should be on minimizing the number of queries against the DB, not on pulling stuff into memory; that is secondary. Also, all three loops and the dictionary could be written as a single query that joins multiple tables and does a GROUP BY, and that whole query would run on the DB. No need to do anything in memory.
– Euphoric, Apr 13 '14 at 12:09
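A rough sketch of the single-query approach, assuming hypothetical Question (Id, SectionId) and Answer (QuestionId, Year, Score) tables, since the real schema and loops aren't shown. It runs against in-memory lists so it can be executed as-is; the same join/group-by shape written against the DbSets (db.Answers, db.Questions) is the kind of query Entity Framework can translate into one SQL statement with a JOIN and GROUP BY, so only the aggregated rows come back.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tables; in the real app these would be DbSets on the DbContext.
var questions = new List<(int Id, int SectionId)> { (10, 1), (11, 1), (20, 2) };
var answers = new List<(int QuestionId, int Year, int Score)>
{
    (10, 2013, 4), (10, 2013, 2), (11, 2013, 5),
    (10, 2014, 3), (20, 2014, 1), (20, 2014, 5),
};

// One join + one group-by in place of the year/section/question loops and the dictionary.
var stats = (
    from a in answers
    join q in questions on a.QuestionId equals q.Id
    group a by new { a.Year, q.SectionId, a.QuestionId } into g
    select new
    {
        g.Key.Year,
        g.Key.SectionId,
        g.Key.QuestionId,
        Count = g.Count(),
        Average = g.Average(x => x.Score)
    })
    .OrderBy(s => s.Year).ThenBy(s => s.QuestionId)
    .ToList();

foreach (var s in stats)
    Console.WriteLine($"{s.Year} S{s.SectionId} Q{s.QuestionId}: n={s.Count}, avg={s.Average}");
```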