Message boards : Number crunching : Very long run time.
Author | Message |
---|---|
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 34 |
I have this wu here at the moment. Its remaining time shows "---", like a finished wu, but the elapsed time is 40:09:09 and increasing. It purports to be 54.248% completed. This is the second issue I've reported to Rosetta today. I've suspended that wu and set No New Tasks pending replies. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Did you happen to notice if the task was actually getting any CPU time? There are cases where between BOINC Manager and the OS, the task does not actually get CPU time. It would also explain why the watch-dog hasn't detected the problem and wrapped up the WU. Rosetta Moderator: Mod.Sense |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 34 |
I hadn't but will enable it and watch. <edit> And I saw that the elapsed time reduced to 03:28:23, the time to completion reverted to a normal looking 03:16:28. </edit> Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 34 |
The wu has run to completion and reported a success. That, of course, does not alter the fact that the problem occurred. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,078 |
adrianxw wrote: "I hadn't but will enable it and watch." This means it wasn't actually crunching but the clock was still running; it was a 'hung unit'. This is a LONG STANDING BOINC problem that happens once in a while but never seems to happen when the 'experts' try to replicate it. Kind of like the noise you take the car to the mechanic for that he never hears: it is there, just not when the 'experts' look at it. Doing what you did, suspending the project, waiting a few seconds, and then resuming the project, often 'fixes' the problem. |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 34 |
I allowed new tasks and have not seen this again. I am, however, removing Rosetta from unattended systems. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,078 |
adrianxw wrote: "I allowed new tasks and have not seen this again. I am, however, removing Rosetta from unattended systems." That's what a lot of people end up doing. |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 34 |
This one also twitched the now-sensitized wu radar. It seemed stuck at 97.715%, elapsed going up, time to completion not moving. Stopped and restarted, and it seemed to start progressing again. I'll let it and the other wu I've got (unstarted) run, then think I'll take a "vacation" from Rosetta. The completed wu's disappear from the list too fast to do any comparison (I'm thinking wu type etc.). I have always regarded Rosetta as a steady, safe project, but it is bordering on the dubious area right now. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 34 |
That one finished fine after restarting it. Grossly low credit though, 31.90 for 36,259.55 seconds. There IS something wrong here. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
declis Send message Joined: 13 Jul 06 Posts: 1 Credit: 123,727 RAC: 0 |
|
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,078 |
The cryo units are screwing me again too: 25,773.17 115.91 20.00 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=533977210 and: 25,782.22 115.96 25.64 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=533939813 PATHETIC!! As well as the rb units: 25,782.21 115.96 33.99 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=533965328 I am normally getting around something like this: 9,695.84 43.61 53.33 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=533966921 BUT these darn rb and cryo units are STILL BAD, BAD, BAD!!! |
Polian Send message Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
Part (most?) of this is working out the kinks and helping develop new techniques by volunteering our PCs' resources. They have made breakthroughs as evidenced in the other subforum and the twitter feed, etc. If you think you're getting "screwed" and that it's "pathetic", then it's probably time to move on, bro. Life is far too short to get your blood pressure up and/or be constantly worried about a background process that others use (to great benefit, granted) to crunch numbers in. Make no mistake: it's helping, whether the units fail or not. From the failures come learning. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,078 |
Polian wrote: "Part (most?) of this is working out the kinks and helping develop new techniques by volunteering our PCs' resources. They have made breakthroughs as evidenced in the other subforum and the twitter feed, etc. If you think you're getting 'screwed' and that it's 'pathetic', then it's probably time to move on, bro. Life is far too short to get your blood pressure up and/or be constantly worried about a background process that others use (to great benefit, granted) to crunch numbers in. Make no mistake: it's helping, whether the units fail or not. From the failures come learning." Come 3.5 million and I will be gone. I have a goal that I am trying to meet, and until then I am here, come good news or bad. |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 34 |
>>> If you think you're getting "screwed" and that it's "pathetic", then it's probably time to move on, bro The fact remains, however, that a project that is causing "issues" for the cruncher pool WILL cause people to move on. There are a lot of good projects out there now, and if DB wants to, at least, stay where he is in terms of numbers of participants, something needs to be done. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
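The credit complaints in the thread are easier to compare as a rate. Here is a minimal Python sketch that converts the quoted figures to credit per hour. It assumes the three numbers in each of mikey's rows are run time in seconds, claimed credit, and granted credit. The thread never labels those columns, so that reading is a guess. The row labels are made up for illustration.

```python
"""Compare credit rates for the results quoted in the thread."""

SECONDS_PER_HOUR = 3600.0

# Figures copied from mikey's post, in the order they appear.
# ASSUMPTION: each row is (label, run time in seconds, claimed credit,
# granted credit). The thread does not label the columns, and the
# labels here are hypothetical names chosen for this sketch.
QUOTED_RESULTS = [
    ("cryo #1", 25773.17, 115.91, 20.00),
    ("cryo #2", 25782.22, 115.96, 25.64),
    ("rb", 25782.21, 115.96, 33.99),
    ("typical", 9695.84, 43.61, 53.33),
]


def credit_per_hour(credit: float, seconds: float) -> float:
    """Convert a credit total and a run time in seconds to credit per hour."""
    if seconds <= 0:
        raise ValueError("run time must be positive")
    return credit / seconds * SECONDS_PER_HOUR


def summarize(rows):
    """Return (label, claimed per hour, granted per hour) for each row."""
    return [
        (label, credit_per_hour(claimed, secs), credit_per_hour(granted, secs))
        for label, secs, claimed, granted in rows
    ]


if __name__ == "__main__":
    for label, claimed_rate, granted_rate in summarize(QUOTED_RESULTS):
        print(f"{label:8s} claimed {claimed_rate:6.2f}/h  granted {granted_rate:6.2f}/h")
    # adrianxw's result: 31.90 credits for 36,259.55 seconds
    # (no claimed figure was given for that one).
    print(f"adrianxw granted {credit_per_hour(31.90, 36259.55):6.2f}/h")
```

Under that column assumption, the claimed rate comes out nearly identical for all four rows, while the granted rate for the cryo and rb results is several times lower than for the typical result. That gap is what the posts are complaining about.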
©2024 University of Washington
https://www.bakerlab.org