WUs freeze !!! computer 117981

Carlos_Pfitzner
Joined: 22 Dec 05
Posts: 71
Credit: 138,867
RAC: 0
Message 10646 - Posted: 10 Feb 2006, 22:39:58 UTC

I have a problem: some WUs
running on the subject computer freeze !!!

rosetta 4.80

Initially I just killed boinc,
waited until ps xu showed no leftover processes,
and restarted boinc again

Note that rosetta is the only project running
on this computer

top
shows 0.0% CPU use by rosetta
Killing and restarting boinc
only serves to make me lose more time doing nothing

The freeze occurs again

Today I lost more than 4 hours with the CPU IDLE !

Finally I discovered that after aborting that WU
via remote gui rpc
the next WU crunches normally !!

This is a big problem for my *unmonitored* server

With these freezes I can end up with a week of CPU IDLE
Otherwise, to monitor for freezes and abort the offending WUs,
I have to pay for a very costly $$$ dial-up connection

*Please, can't these WUs auto-abort ???
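
Until the application can abort itself, a small local watchdog could stand in on an unmonitored host. What follows is only a sketch under stated assumptions, not a tested tool: it assumes a BOINC command-line tool named `boinccmd` that supports `--get_tasks` and `--task URL NAME abort` (older clients such as 5.2.14 may name the tool and its options differently), and it assumes the task listing consists of `key: value` lines with the fields `name`, `project URL`, `active_task_state` and `current CPU time`. The function names are made up for illustration. A task counts as frozen when it is reported as executing in two snapshots but its CPU time has not advanced in between, which matches the 0.0% CPU symptom described above.

```python
import subprocess
import time


def parse_tasks(text):
    """Parse a task listing into a list of dicts.

    Assumption: each task is a run of 'key: value' lines that starts
    with a 'name:' line. Lines without ': ' are ignored.
    """
    tasks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if ": " not in line:
            continue
        key, value = line.split(": ", 1)
        if key == "name":
            current = {"name": value}
            tasks.append(current)
        elif current is not None:
            current[key] = value
    return tasks


def stalled_tasks(before, after, min_progress=1.0):
    """Names of tasks executing in both snapshots whose CPU time
    advanced by less than min_progress seconds between them."""
    earlier = {t["name"]: t for t in before}
    stalled = []
    for task in after:
        prev = earlier.get(task["name"])
        if prev is None:
            continue
        if task.get("active_task_state") != "EXECUTING":
            continue
        if prev.get("active_task_state") != "EXECUTING":
            continue
        gained = (float(task.get("current CPU time", 0))
                  - float(prev.get("current CPU time", 0)))
        if gained < min_progress:
            stalled.append(task["name"])
    return stalled


def get_tasks_text():
    # Assumed command; adjust to whatever your client version provides.
    result = subprocess.run(["boinccmd", "--get_tasks"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def abort_task(project_url, task_name):
    # Assumed command; adjust to whatever your client version provides.
    subprocess.run(["boinccmd", "--task", project_url, task_name, "abort"],
                   check=True)


def watch_once(interval_seconds=600):
    """Take two snapshots interval_seconds apart and abort frozen tasks."""
    before = parse_tasks(get_tasks_text())
    time.sleep(interval_seconds)
    after = parse_tasks(get_tasks_text())
    urls = {t["name"]: t.get("project URL", "") for t in after}
    for name in stalled_tasks(before, after):
        abort_task(urls[name], name)
```

Calling `watch_once()` from cron every hour or so would avoid the need for a dial-up session just to check on the machine. The interval should be long enough that a WU that is merely paused or checkpointing is not mistaken for a frozen one.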

See an example of a returned result:
Result ID 10549605
Name BARCODE_30_1ubi__299_25012_0
Workunit 8520285
Created 10 Feb 2006 11:13:57 UTC
Sent 10 Feb 2006 12:50:57 UTC
Received 10 Feb 2006 22:11:01 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -197 (0xffffff3b)
Computer ID 117981
Report deadline 17 Feb 2006 12:50:57 UTC
CPU time 551.93
stderr out <core_client_version>5.2.14</core_client_version>
<message>aborted by user
</message>
<stderr_txt>

</stderr_txt>


Validate state Invalid
Claimed credit 1.64110392447523
Granted credit 0
application version 4.80
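
The two forms of the exit status in the result above are the same number: `0xffffff3b` is just -197 viewed as an unsigned 32-bit value. The short sketch below shows the conversion in both directions; the helper names are made up for illustration. As far as I recall, -197 is the code BOINC uses for a task aborted through the GUI, which would match the "aborted by user" message, but treat that mapping as an assumption.

```python
def as_signed32(value):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def as_hex32(value):
    """Format a (possibly negative) integer as a 32-bit hex pattern."""
    return format(value & 0xFFFFFFFF, "#010x")


print(as_signed32(0xFFFFFF3B))  # -197
print(as_hex32(-197))           # 0xffffff3b
```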



milw0rm

Joined: 10 Dec 05
Posts: 22
Credit: 6,212,738
RAC: 0
Message 11705 - Posted: 6 Mar 2006, 9:23:24 UTC - in response to Message 10646.  

I utterly agree with this.
I have had so many different units spend many hours doing nothing because the Rosetta client does not auto-cancel a unit that is broken or that runs far longer than the average processing time. So I have to check my computers' output stats page every so often, find out which machine is not submitting, then go to that machine, abort the unit, and get it to download a new one, all thanks to a small oversight in the Rosetta programmers' thought process: "Surely nothing could go wrong, we don't need this". Unfortunately, it does! :(

Please fix :D
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 11710 - Posted: 6 Mar 2006, 12:56:07 UTC

Moved from Science forum



©2024 University of Washington
https://www.bakerlab.org