One of my machines picked up a rather long running work unit. The machine is a 2.4 GHz Q6600 with 4G of RAM running linux64 so should be powerful enough.

It is currently at 6 days of compute time and went high priority yesterday. The time to complete continues to increase and, if it is correct, the unit will no longer complete before the deadline. Two people have already aborted Workunit 157852, two people have had errors, and 9 of us are still grinding away at it.

I read the thread that says a 2 GHz athlon will finish all work in a couple of hours. If this is true then shouldn't the work automatically abort after a day and not tie up a core for a week when you guys give us buggy work?

Are all work units that have run more than 2 hours buggy and need to be aborted? It seems wasteful to tie up resources for so long when they would be more productive doing something else.

fractal wrote:I read the thread that says a 2 GHz athlon will finish all work in a couple of hours. If this is true then shouldn't the work automatically abort after a day and not tie up a core for a week when you guys give us buggy work?

Well, what thread are you actually talking about? I ask because I need to inform you that this information is completely incorrect. There are CMC work units out that require more than a week of full-time computing even on a machine as powerful as the one you are lucky to have, just as stated in our FAQ. Of course, I cannot guarantee that it will finish in time, because you did not give any information on the deadline, but normally this WU should be completable by your machine; this at least I can tell you. The progress bar is good but not strictly reliable: with the very long WUs we do observe a steady increase in the run time estimate while the machine continues to compute. Later on, the remaining run time should start decreasing, or the WU just suddenly finishes validly. Please check in your Linux system monitor whether that WU is a zombie task or not. If not, I would keep it running. I hope this helps a bit.

Michael.

P.S.: We do not send out any buggy WUs, and no WU requires manual abortion. There are also only very few, if any, reports of errors with this project. Quite frequently, people just do not have enough patience to wait for proper completion. In that case, it would help overall project performance to abort a WU early if it seems too long, because we can then send it out again quickly. Anything else will just delay the entire project's progress.
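For anyone who prefers a script to the system monitor, the zombie check mentioned above can be sketched in Python. This is only a rough sketch: it assumes a Linux machine with the usual /proc filesystem, the function names are made up for illustration, and you have to look up the PID of the science application yourself (for example with ps or top).

```python
from pathlib import Path


def parse_state(status_text):
    """Return the state letter from /proc/<pid>/status text, e.g. 'R', 'S' or 'Z'."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tZ (zombie)"; the letter is the second token.
            return line.split()[1]
    return None


def is_zombie(pid):
    """True if the process with this PID is currently in state 'Z' (zombie)."""
    text = Path(f"/proc/{pid}/status").read_text()
    return parse_state(text) == "Z"
```

A task in state R (running) that is still accumulating CPU time is not a zombie, which is the case where keeping the WU running makes sense.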

It is currently at 145 hrs of processing and time to complete has fluctuated between 10 and 12 hrs for the past two days. Report deadline is 5 hrs from now. You might want to take a look at that unit since, as I reported earlier, nobody has finished it and 64 bit linux cores that could be used to clean up the backlog of pending are tied up on it.

No, you are referring to something unrelated. Here we have a CMC WU and not a CMS set of the CSP type which is the topic of the thread you link to.

fractal wrote:It is currently at 145 hrs of processing and time to complete has fluctuated between 10 and 12 hrs for the past two days. Report deadline is 5 hrs from now. You might want to take a look at that unit since, as I reported earlier, nobody has finished it and 64 bit linux cores that could be used to clean up the backlog of pending are tied up on it.

Hmmm, that means you must have had that WU in your queue for quite a long time without working on it, because the CMC WUs usually have a 14-day deadline. So, if you invested 145 hrs and the final deadline is in 5 hrs, well, you can do the calculation yourself. I do not know whether it will be completed in time, but I can do a test run on a 955 BE to see what run time is expected. Maybe just keep it running if you do not mind. It might really help us fix a problem.

Michael.
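The calculation hinted at above can be written out as a small Python sketch. It assumes the WU really had the usual 14-day deadline and that the 145 hrs are wall-clock hours on one core; the function name is just for illustration.

```python
def idle_hours(deadline_days, cpu_hours, hours_left):
    """Hours between download and now that were NOT spent computing this WU."""
    window = deadline_days * 24          # total time allowed, in hours
    elapsed_wall = window - hours_left   # wall-clock hours since the WU was issued
    return elapsed_wall - cpu_hours


# 14-day deadline, 145 hrs computed, 5 hrs left until the deadline
print(idle_hours(14, 145, 5))  # 186
```

Under those assumptions the WU spent about 186 hrs (a bit under 8 days) waiting in the queue or sharing the core with other work.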

[edit]: Run time estimate on AMD 955 BE (3.0 GHz Quad): 85 hrs. I had this WU before; it took more than 145 hrs, then my machine was accidentally detached from BOINC - restart. I noticed that building this WU takes 2.2 GB of RAM! Could you please have a look whether your machine is swapping RAM to disk due to memory limits? Is it possible that this causes the delay? In that case, we would need to assign higher RAM requirements than we use at present. I checked that the forecast runs of this WU already use 800 MB of memory, so the original run will most likely require more. The problem is that the RAM requirement of the CMC WUs cannot be precomputed. Therefore we are in contact with the developers to have a RAM forecast function in a future version.
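To check for swapping as asked above, one option on Linux is to read the kernel's own numbers. The following is a minimal Python sketch, assuming the standard /proc/meminfo fields SwapTotal and SwapFree and the per-process VmSwap field in /proc/<pid>/status (the latter is missing on older kernels); the function names are hypothetical.

```python
def parse_kb_fields(text):
    """Parse lines of the form 'Key:   12345 kB' into a dict {key: value_in_kB}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines that carry a single number followed by the unit 'kB'.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def swap_used_kb(meminfo_text):
    """System-wide swap in use, from the contents of /proc/meminfo."""
    fields = parse_kb_fields(meminfo_text)
    return fields["SwapTotal"] - fields["SwapFree"]


def process_swap_kb(status_text):
    """Swap used by one process, from /proc/<pid>/status; None if not reported."""
    return parse_kb_fields(status_text).get("VmSwap")
```

Typical use would be swap_used_kb(open("/proc/meminfo").read()). Note that a non-zero value only shows that something has been swapped out at some point; to see whether the machine is actively swapping, watch whether the number keeps changing, or look at the si/so columns of vmstat.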

As the deadline is already blown, should I abort those or should I wait another ~340 hrs for them to complete? (I.e., will the server still accept them?)

-- edit --

Restarting them seems to have reset their figures. I guess I'll keep an eye on how they proceed (?)

That is strange. My 955 BE estimates a run time of 244 hrs for this WU (will most likely be a bit more) with a RAM usage of up to 1.2 GB (might be higher, will increase and even change up and down during computation). Could you please also check for swapping of RAM to HD?

boinc ran some units from another project and switched to other units of the same project leaving them in memory. This eventually ate all of memory. I have suspended all other projects until this unit completes.

It is weird. I have told boinc not to switch between applications, but it just won't listen. It just loads them all up in memory and alternates between them. Silly program.

No, please keep it running. If the machine is not swapping and no zombie task is detectable, I do not see a reason why the WU should be dead. It should start counting down soon. I just had that situation on my box.

And another very long unit (513689, cms_6S6[e]_Monodelphis-domestica-(gray-short-tailed-opossum)_CM000370.lin.EMBL_f_1268060823_33_0), running on a Q6600/2.4GHz, but only 2GB RAM.

It's currently at 63¾ hours @ 9.1%, so ~637 hours/26½ days to go at the current rate; the progress bar is clicking up 0.001% per tick. The deadline is 18/3 (just under 7 days); it's showing 125 hours to go, so BOINC hasn't put it on high priority yet.
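For reference, the ~637 hours figure is a straight-line extrapolation from the progress bar. A small Python sketch of that arithmetic, assuming progress stays linear (which, per the earlier posts, the long WUs do not guarantee); the function name is just illustrative:

```python
def remaining_hours(elapsed_hours, fraction_done):
    """Linear extrapolation: hours still needed if progress continues at the same rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total = elapsed_hours / fraction_done   # projected total run time
    return total - elapsed_hours


rem = remaining_hours(63.75, 0.091)   # 63¾ hours at 9.1%
print(round(rem), round(rem / 24, 1))  # 637 26.5
```

BOINC's own "to go" figure (125 hours here) is not a pure linear extrapolation like this one, which is presumably why the two numbers disagree so much.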

Al Dente wrote:And another very long unit (513689, cms_6S6[e]_Monodelphis-domestica-(gray-short-tailed-opossum)_CM000370.lin.EMBL_f_1268060823_33_0), running on a Q6600/2.4GHz, but only 2GB RAM.

It's currently at 63¾ hours @ 9.1%, so ~637 hours/26½ days to go at the current rate; the progress bar is clicking up 0.001% per tick. The deadline is 18/3 (just under 7 days); it's showing 125 hours to go, so BOINC hasn't put it on high priority yet.