Message boards : Number crunching : Report stuck & aborted 5.01 WU here please - III
Author | Message |
---|---|
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
|
Tribaal Send message Joined: 6 Feb 06 Posts: 80 Credit: 2,754,607 RAC: 0 |
22.04.2006 19:39:50|rosetta@home|Unrecoverable error for result PROD_ABINITIO_1tul__454_145_0 ( - exit code -1073741811 (0xc000000d)) Hope this helps =( - trib' |
[DPC]Division_Brabant~OldButNotSoWise Send message Joined: 23 Jan 06 Posts: 42 Credit: 371,797 RAC: 0 |
|
mewbysea Send message Joined: 29 Jan 06 Posts: 17 Credit: 15,880,002 RAC: 2,727 |
Aborted 2 stuck wus: HBLR_1.0_1di2_420_4698 at 10:11 hours and 3.6941% see result id 17749702 (full atom relax, model 1, step 32974) HBLR_1.0_2tif_420_9229 at 8:59 hours and 4.996% See result id 17770512 (full atom relax, model 1, step 34201) Both were re-releases from 6 April (no results returned) |
Ian Send message Joined: 14 Apr 06 Posts: 29 Credit: 335,780 RAC: 731 |
Aborted this one after 25hrs, as per my other thread... https://boinc.bakerlab.org/rosetta/result.php?resultid=17774846 Ian Cundell, St Albans, UK |
Chilcotin Send message Joined: 5 Nov 05 Posts: 15 Credit: 16,969,500 RAC: 0 |
Workunit https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13401144 aborted after 27 hours. It was making progress but was only up to 12 % completed by the time I quit. |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
I have moved the discussion about the new abort feature to this thread. Moderator9 ROSETTA@home FAQ Moderator Contact |
Delk Send message Joined: 20 Feb 06 Posts: 25 Credit: 995,624 RAC: 0 |
This one stuck without progress after 7%: https://boinc.bakerlab.org/rosetta/result.php?resultid=17853207 WU name: NO_TERM_STRAND_1ogw_423_2866 checkpoint CPU time: 98378.230000 current CPU time: 98951.020000 fraction done: 0.077710 estimated CPU time remaining: 115357.613121 |
Rebel Alliance Send message Joined: 4 Nov 05 Posts: 50 Credit: 3,579,531 RAC: 0 |
Just aborted 4 work units from 4 different machines. The longest had been running close to 10 hours and was at 5%; the shortest had run 6 hours and was at 1%. #1 from 2700xp Result ID 17772227 Name HBLR_1.0_1mky_420_9630_1 Workunit 13428053 Created 20 Apr 2006 21:42:41 UTC Sent 21 Apr 2006 4:22:49 UTC Received 23 Apr 2006 5:53:20 UTC Server state Over Outcome Client error Client state Computing Exit status -197 (0xffffff3b) Computer ID 148992 Report deadline 5 May 2006 4:22:49 UTC CPU time 32013.537868 #2 From 1800 xp Result ID 17805638 Name NO_TERM_STRAND_1ogw_423_6947_2 Workunit 13496532 Created 21 Apr 2006 5:49:41 UTC Sent 21 Apr 2006 8:05:02 UTC Received 23 Apr 2006 5:52:38 UTC Server state Over Outcome Client error Client state Computing Exit status -197 (0xffffff3b) Computer ID 105489 Report deadline 5 May 2006 8:05:02 UTC CPU time 24477.506926 #3 from 2000 xp Result ID 17748958 Name FACONTACTS_RECENTER_NOFILTERS_1ig5A_448_551_1 Workunit 14550587 Created 20 Apr 2006 16:34:25 UTC Sent 20 Apr 2006 22:38:14 UTC Received 23 Apr 2006 5:51:22 UTC Server state Over Outcome Client error Client state Computing Exit status -197 (0xffffff3b) Computer ID 106748 Report deadline 4 May 2006 22:38:14 UTC CPU time 25011.984375 #4 from 2500 Xp Result ID 17786001 Name HBLR_1.0_1n0u_ROT_TRIALS_TRIE_449_5_0 Workunit 14630032 Created 21 Apr 2006 1:00:11 UTC Sent 21 Apr 2006 3:09:30 UTC Received 23 Apr 2006 5:50:36 UTC Server state Over Outcome Client error Client state Computing Exit status -197 (0xffffff3b) Computer ID 107679 Report deadline 5 May 2006 3:09:30 UTC CPU time 22721.8125 |
Lucky Angel~AES_koetje Send message Joined: 18 Mar 06 Posts: 4 Credit: 0 RAC: 0 |
22.04.2006 19:39:50|rosetta@home|Unrecoverable error for result PROD_ABINITIO_1tul__454_145_0 ( - exit code -1073741811 (0xc000000d)) I have seen this error code, exit code -1073741811 (0xc000000d), too often. I spent over an hour searching for a useful interpretation. Does anybody know the answer? |
Tallguy-13088 Send message Joined: 14 Dec 05 Posts: 9 Credit: 843,378 RAC: 0 |
Just as a quick update, these two work units are up in the 30+ hour range and seem to be progressing, albeit slowly (around 12% completion). Since I have read in another post that these are "more intense" models that are using new code, I don't have a problem running them as long as they make forward progress. What I am seeing is that the "slowdown" is definitely in the Full Atom Relax stage (Ab Initio seems to crank right along). Current estimates put both of these WUs in the 300-hour range. I guess we will all know how it works out in a couple of weeks if all continues to compute. |
universum Send message Joined: 22 Mar 06 Posts: 1 Credit: 42,464 RAC: 0 |
I have been running the same work unit for more than 17 hours now (usually one work unit takes 2-4 hours for me), and it seems like it restarts over and over on "model 1" and is stuck at a few percent. I was up at 3%-something, restarted the BOINC manager, and it started again from 1.00% and is now up at 1.6%. It's just not making any progress, and it doesn't abort automatically either. Something must be wrong. |
de Mecquenem Pascal Send message Joined: 11 Oct 05 Posts: 1 Credit: 1,366,202 RAC: 0 |
Had to abort this work today. Stuck at 8,04 % after 13 hours (Time to completion 13 hours). Closed and restarted Boinc : it was then stuck at 1,01 %. Graphics worked fine. Windows XP Home Edition Name HBLR_1.0_1n0u_420_7152_1 Workunit 13415665 Created 20 Apr 2006 19:27:24 UTC Sent 21 Apr 2006 1:29:25 UTC Received 23 Apr 2006 0:44:07 UTC Server state Over Outcome Client error Client state Computing Exit status -197 (0xffffff3b) Computer ID 176653 Report deadline 5 May 2006 1:29:25 UTC CPU time 51250.078125 stderr out <core_client_version>5.2.13</core_client_version> <message>aborted via GUI RPC </message> <stderr_txt> # cpu_run_time_pref: 7200 # random seed: 1543553 # cpu_run_time_pref: 7200 # random seed: 1543553 </stderr_txt> Validate state Invalid Claimed credit 138.782538165437 Granted credit 0 application version 5.01 |
The Cow Association Send message Joined: 15 Jan 06 Posts: 1 Credit: 145,104 RAC: 0 |
I have one job that has been running for 43 hours now, and the time to completion still shows 31 hours. The job is making progress and is at 29.47%. Do I get the normal amount of points for this, or is it better to abort the job? It is a HBLR_1.0_1b71_ROT_TRIALS_TRIE_449_30_0 job. |
Rhiju Volunteer moderator Send message Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Go ahead and abort the jobs that have been going for more than 10 hours -- we are seeing incompatibility of certain workunits with certain machines. (We're testing the fix over on ralph now.) You'll still get credit later in the week when we grant credit for claimed credit! And you'll get some workunits that should not get stuck. Thanks for posting. I have one job that has been running for 43 hours now, and the time to completion still shows 31 hours. |
TCU Computer Science Send message Joined: 7 Dec 05 Posts: 28 Credit: 12,861,977 RAC: 0 |
These 5.01 WUs were aborted today: 11.8 hrs https://boinc.bakerlab.org/rosetta/result.php?resultid=17929192 FACONTACTS_RECENTER_NOFILTERS_1ubi__448_846 34.7 hrs https://boinc.bakerlab.org/rosetta/result.php?resultid=17754714 HBLR_1.0_1hz6_420_5519 44.2 hrs https://boinc.bakerlab.org/rosetta/result.php?resultid=17786665 HBLR_1.0_1di2_ROT_TRIALS_TRIE_449_49 50.7 hrs https://boinc.bakerlab.org/rosetta/result.php?resultid=17762275 HBLR_1.0_1hz6_420_7237 49.3 hrs https://boinc.bakerlab.org/rosetta/result.php?resultid=17773010 FACONTACTS_RECENTER_NOFILTERS_1vls__448_927 27.5 hrs https://boinc.bakerlab.org/rosetta/result.php?resultid=17797075 NO_TERM_STRAND_1ogw_423_3285 |
Metal-Phantom~MetalMike Send message Joined: 8 Mar 06 Posts: 2 Credit: 2,052,366 RAC: 0 |
HBLR_1.0_1n0u_420_9804_2 After 9,05 hours and 3,6% it killed itself on my P-M 1.6 running WinXP <error_code>-161</error_code> https://boinc.bakerlab.org/rosetta/result.php?resultid=17871439 |
biodoc Send message Joined: 19 Feb 06 Posts: 14 Credit: 30,717,792 RAC: 0 |
I just aborted a WU (details below). It was running 8+ hours at 2% complete and I have a 2 hr runtime pref. set. The behavior of this WU was similar to the FACONTACT & HBLR_1.0 WUs that I've seen posted here & experienced myself as "long runners". They seem to run normally through the "model 1" process (# of steps in the 6 figure range) & instead of moving on to "model 2, step 1", they start over at model 1, step 1. Perhaps it could be described as a "model 1 loop" bug? Has anyone else seen this? I'm only running Rosetta & I have "leave in memory" checked as a pref. Could it be that the Accepted RMSD & Accepted energy parameters or "goals" are not met during the model 1 calculation, so the WU does not move on to a model 2 calculation & just starts the model 1 calculation over again? Result ID 18037907 Name FARELAX_NOFILTERS_1scjB_417_302_3 Workunit 13208951 |
Tallguy-13088 Send message Joined: 14 Dec 05 Posts: 9 Credit: 843,378 RAC: 0 |
Finally aborted these two work units. The links are as follows: HBLR_1.0_1dtj_420_4640 and HBLR_1.0_1n0u_420_9429_1 Other than the large compute times, nothing really "stands out" about these two. Just as a quick update, these two work units are up in the 30+ hour range and seem to be progressing albeit slowly (around 12% completion). Since I have read in another post that these are "more intense" models that are using new code, I don't have a problem running them as long as they make forward progress. |
Rebel Alliance Send message Joined: 4 Nov 05 Posts: 50 Credit: 3,579,531 RAC: 0 |
Three more aborted. The shortest had a running time of 26 hours; the longest was 36 hours. Plus one failed work unit. #1 barton 2700 Result ID 17806384 Name FACONTACTS_RECENTER_NOFILTERS_1pgx__448_969_1 Workunit 14579018 Created 21 Apr 2006 5:58:36 UTC Sent 21 Apr 2006 12:17:56 UTC Received 24 Apr 2006 16:13:08 UTC Server state Over Outcome Client error Client state Computing Exit status -197 (0xffffff3b) Computer ID 148992 Report deadline 5 May 2006 12:17:56 UTC CPU time 89068.004663 #2 3 gig P4 Result ID 17783537 Name FACONTACTS_RECENTER_NOFILTERS_1enh__448_738_1 Workunit 14563296 Created 21 Apr 2006 0:23:48 UTC Sent 21 Apr 2006 7:09:10 UTC Received 24 Apr 2006 16:16:05 UTC Server state Over Outcome Client error Client state Computing Exit status -197 (0xffffff3b) Computer ID 130403 Report deadline 5 May 2006 7:09:10 UTC CPU time 126339.9375 and on the same machine #3 Wasn't an aborted unit. It failed on its own. Result ID 17796282 Name NO_TERM_STRAND_1ogw_423_2065_1 Workunit 13457432 Created 21 Apr 2006 3:19:41 UTC Sent 21 Apr 2006 5:42:04 UTC Received 23 Apr 2006 5:06:12 UTC Server state Over Outcome Client error Client state Computing Exit status -1073741819 (0xc0000005) Computer ID 130403 Report deadline 5 May 2006 5:42:04 UTC CPU time 6977.347175 #4 2500 Barton Result ID 17809982 Name NO_TERM_STRAND_1ogw_423_8417_1 Workunit 13508136 Created 21 Apr 2006 6:57:56 UTC Sent 21 Apr 2006 8:04:02 UTC Received 24 Apr 2006 16:18:43 UTC Server state Over Outcome Client error Client state Computing Exit status -197 (0xffffff3b) Computer ID 155638 Report deadline 5 May 2006 8:04:02 UTC CPU time 131644.328125 |
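
Several posts in this thread quote exit statuses as a signed number next to a hex value, and one asks what -1073741811 (0xc000000d) means. The Python sketch below is only an illustration: it shows how the signed and hex forms relate (32-bit two's complement) and looks up a few names. The lookup tables are assumptions put together for this example, not project output: the two 0xC000... values are standard Windows NTSTATUS codes, and -197 is labeled from the "aborted via GUI RPC" stderr quoted above. The helper names are hypothetical. The second function is a naive linear projection of total run time from CPU time and fraction done; it is not how BOINC computes its own "time remaining" estimate.

```python
# Illustrative sketch only; tables and function names are assumptions for this example.

# Windows NTSTATUS values seen in this thread (keyed by unsigned 32-bit code).
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
}

# BOINC client status seen in this thread (keyed by signed code).
# -197 matches the "aborted via GUI RPC" stderr message quoted in the thread.
BOINC_NAMES = {
    -197: "ERR_ABORTED_VIA_GUI",
}


def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as unsigned (two's complement)."""
    return code & 0xFFFFFFFF


def describe_exit_code(code: int) -> str:
    """Return 'signed (0xhex): name' in the style used by the result pages."""
    unsigned = to_unsigned32(code)
    name = BOINC_NAMES.get(code) or NTSTATUS_NAMES.get(unsigned) or "unknown"
    return f"{code} (0x{unsigned:08x}): {name}"


def projected_total_hours(cpu_seconds: float, fraction_done: float) -> float:
    """Naive linear projection: total time = elapsed CPU time / fraction done."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return cpu_seconds / fraction_done / 3600.0


if __name__ == "__main__":
    for code in (-1073741811, -1073741819, -197):
        print(describe_exit_code(code))
    # Figures from Delk's post: current CPU time 98951.02 s, fraction done 0.077710.
    print(f"{projected_total_hours(98951.02, 0.077710):.0f} hours projected")
```

Under that linear assumption, a work unit that has used about 27 hours of CPU time at under 8% done projects to several hundred hours in total, which fits the advice above to abort anything running past 10 hours.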