Not sending out failing WUs over and over!

Message boards : Number crunching : Not sending out failing WUs over and over!

tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 14735 - Posted: 27 Apr 2006, 8:04:51 UTC
Last modified: 27 Apr 2006, 8:05:42 UTC

I made a proposal a week ago to set the maximum number of results to 1, in order to avoid sending failed WUs out to several hosts. Rhiju responded positively:

I like the idea below of not passing on bad jobs to another client when they fail -- so only 1 computer will have the problem, not 4. I'm running this idea by David Baker and David Kim now. Unlike other BOINC projects, it's not critical for every single workunit to get processed. It's way more important to keep bad workunits from causing trouble!


But no decision has been made so far. I realize we all hope that, with the new watchdog technology, stuck WUs will be aborted BUT reported as valid, so that they won't be sent out again. Nevertheless, as a safety net, I would still recommend not sending failed WUs out again (at least not automatically).
MAOJC

Joined: 19 Jan 06
Posts: 15
Credit: 2,727,567
RAC: 0
Message 14749 - Posted: 27 Apr 2006, 13:47:16 UTC

Actually, a nice thing would be to have Rosetta automatically send them to a queue on their own computers instead of "out into the wild". Some simple logic and a few desktop systems should be adequate to get a diagnostic loop running to correct the code issues.
Rhiju
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 14778 - Posted: 27 Apr 2006, 18:03:10 UTC - in response to Message 14735.  

Hi tralala:

We've decreased the number of users that get a workunit from four to two. Sorry, I forgot to post this in the R@H forums! Will do after the next release.
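To make the effect of that cap concrete, here is a minimal sketch. It is not BOINC's or Rosetta@home's actual scheduler code; the function name `hosts_used` and the parameter `max_attempts` are made up for illustration. It only counts how many hosts end up receiving a workunit when the server stops reissuing it after a fixed number of attempts.

```python
from typing import Iterable


def hosts_used(outcomes: Iterable[bool], max_attempts: int) -> int:
    """Return how many hosts receive a workunit before the server stops.

    outcomes yields True when a host completes the workunit and False when
    it errors out. max_attempts is a hypothetical cap on how many times the
    same workunit is issued.
    """
    used = 0
    for ok in outcomes:
        if used >= max_attempts:
            break  # cap reached: do not send the workunit out again
        used += 1  # the workunit goes out to one more host
        if ok:
            break  # a successful result ends the reissue loop
    return used


# A broken workunit fails on every host it is sent to.
always_fails = [False] * 10
for cap in (4, 2, 1):
    print(f"cap={cap}: {hosts_used(always_fails, cap)} host(s) affected")
```

Under these assumptions a workunit that always fails costs exactly as many hosts as the cap allows, so going from four to two halves the damage, and a cap of one limits it to a single host.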


tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 14780 - Posted: 27 Apr 2006, 18:12:03 UTC - in response to Message 14778.  

Hi tralala:
We've decreased the number of users that get a workunit from four to two. Sorry, I forgot to post this in the R@H forums! Will do after the next release.


IMHO that's a good decision. It doesn't apply to past WUs though, does it? I see a lot of recent error reports on WUs from March which are now on their fourth host. However, this will eventually correct itself.




©2024 University of Washington
https://www.bakerlab.org