Message boards : Number crunching : Not sending out failing WUs over and over!
Author | Message |
---|---|
tralala Send message Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
I made a proposal a week ago to set the maximum number of results to 1 in order to avoid sending out failed WUs to several hosts. Rhiju responded positively: "I like the idea below of not passing on bad jobs to another client when they fail -- so only 1 computer will have the problem, not 4. I'm running this idea by David Baker and David Kim now. Unlike other BOINC projects, it's not critical for every single workunit to get processed. It's way more important to keep bad workunits from causing trouble!" But no decision has been taken so far. I realize that we all hope that, with the new watchdog technology, stuck WUs will be aborted BUT reported as valid so that they won't be sent out again. Nevertheless, as a safety net I would still recommend not sending out failed WUs again (at least not automatically). |
MAOJC Send message Joined: 19 Jan 06 Posts: 15 Credit: 2,727,567 RAC: 0 |
Actually, a nice thing would be to have Rosetta automatically send them to a queue on their own computers instead of "out into the wild". Some simple logic and a few desktop systems should be adequate to get a diagnostic loop running to correct the code issues. |
Rhiju Volunteer moderator Send message Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Hi tralala: We've decreased the number of users that get a workunit from four to two. Sorry, I forgot to post this in the R@H forums! Will do after the next release. (In reply to: "I made a proposal a week ago to set maximum number of results to 1 in order to avoid sending out failed WUs to several hosts. Rhiju responded positively:") |
tralala Send message Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
(In reply to Rhiju's "Hi tralala:") IMHO that's a good decision. It doesn't apply to past WUs though, does it? I see a lot of recent error reports on WUs from March which are now at their fourth host. However, this will eventually correct itself. |
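
The effect of the cap discussed in this thread (four hosts, then two, with one proposed) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not BOINC or Rosetta@home code: the `WorkUnit` class, the `always_fails` flag, and the `dispatch` function are hypothetical names, and a "bad" WU is assumed to fail on every host it reaches.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    name: str
    always_fails: bool  # hypothetical flag: models a "bad" WU that errors on every host
    attempts: int = 0   # how many hosts have received this WU so far
    done: bool = False  # True once some host returned a successful result


def dispatch(wu: WorkUnit, max_attempts: int) -> int:
    """Send wu to one host after another until it succeeds or the cap is hit.

    Returns the number of hosts that ran into an error.
    """
    failed_hosts = 0
    while not wu.done and wu.attempts < max_attempts:
        wu.attempts += 1
        if wu.always_fails:
            failed_hosts += 1  # this host wasted its time on a bad WU
        else:
            wu.done = True     # a good WU finishes on the first host
    return failed_hosts


if __name__ == "__main__":
    for cap in (4, 2, 1):
        bad = WorkUnit("bad_wu", always_fails=True)
        print(f"cap={cap}: bad WU failed on {dispatch(bad, cap)} host(s)")
```

Under these assumptions a bad WU troubles exactly as many hosts as the cap allows, while a good WU is unaffected by the cap, which is the argument for lowering it.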
©2024 University of Washington
https://www.bakerlab.org