Report Problems with Rosetta Version 5.07

Message boards : Number crunching : Report Problems with Rosetta Version 5.07

Rebirther
Joined: 17 Sep 05
Posts: 116
Credit: 41,315
RAC: 0
Message 15601 - Posted: 6 May 2006, 7:47:22 UTC - in response to Message 15597.  

Hi Rebirther and others with "1.04%" after 3 hours or so, please let them run until they go about 4 times your cpu run time preference. (If you haven't set a preference, our default is 3 hours, so let them run 12 hours.) If they're running longer, the jobs should be aborted by the watchdog, but please post here if not!

I have suspended the following WU: FA_CASP6_t198__470_5745_0
After 2:13h it is at only 1.04%. The steps are increasing very slowly.
Last entries in stdout.txt:
CYCLES::number is 1 x total_residue: 69
initializing full atom coordinates
BOINC :: [2006-05-04 11:46:11] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 7 :: num_decoys: 7 :: farlx_stage: 10
dump_fullatom_pdb: farlxcheck
starting score 357.328156 rms 4.70180273
starting full atom minimization
[T/F OPT]Default FALSE value for [-infinite_loop]

Should I keep running it or abort it? I don't know how long it will take. Normally one WU takes 3h. RAM usage is 200 MB now.



Don't worry, Rhiju, I have finished this larger WU in 3h ;)
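The rule of thumb quoted above (let a task run to about 4 times the CPU run time preference, with a default preference of 3 hours) can be sketched in a few lines. This is only an illustration of the arithmetic in the post, not the project's watchdog code; the function names and the exact factor of 4 are assumptions.

```python
from typing import Optional

# Values taken from the post above; the real watchdog logic may differ.
DEFAULT_RUN_TIME_PREF_HOURS = 3.0  # project default when no preference is set
WATCHDOG_FACTOR = 4.0              # "about 4 times" the preference (approximate)


def watchdog_limit_hours(run_time_pref_hours: Optional[float] = None) -> float:
    """Approximate CPU time after which the watchdog is expected to abort a task."""
    if run_time_pref_hours is None:
        run_time_pref_hours = DEFAULT_RUN_TIME_PREF_HOURS
    if run_time_pref_hours <= 0:
        raise ValueError("run time preference must be positive")
    return run_time_pref_hours * WATCHDOG_FACTOR


def worth_reporting(cpu_hours: float, run_time_pref_hours: Optional[float] = None) -> bool:
    """True if a task has run past the approximate watchdog limit."""
    return cpu_hours > watchdog_limit_hours(run_time_pref_hours)


if __name__ == "__main__":
    print(watchdog_limit_hours())        # default preference: 12.0
    print(worth_reporting(2 + 13 / 60))  # the 2:13h task above: False
```

With the default preference, a task sitting at 1.04% after a little over 2 hours is still well inside the 12-hour window, which matches the advice to let it run.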

mmn
Joined: 11 Mar 06
Posts: 1
Credit: 23,115
RAC: 0
Message 15603 - Posted: 6 May 2006, 11:11:02 UTC

MODEL 1 STEP 0
cpu time : 7 min

created 6 May 2006 7:14:58 UTC
name AB_CASP6_t216__486_401

Knorr
Joined: 18 Feb 06
Posts: 21
Credit: 373,953
RAC: 0
Message 15604 - Posted: 6 May 2006, 11:26:15 UTC

Got an exit code 0x1 on this WU:

HBLR_1.0_1n0u_ROT_TRIALS_TRIE_CHECKPOINTS_482_4843_0

Just came out of the blue. Not while rescheduling etc.

Jon C Melusky
Joined: 29 Nov 05
Posts: 12
Credit: 208,931
RAC: 1,214
Message 15613 - Posted: 6 May 2006, 17:08:18 UTC

>>>Hi Jon: thanks for posting. We definitely don't want Rosetta to be dysfunctional on your PC! Can you possibly post here a link to your failed workunits? In the boinc manager, you can hit "Your results" and it will give you the links.

We are now beginning a big push on our test server ralph to track down the final set of bugs in rosetta@home. The app there is getting more debugging machinery added every few days. So if any users out there are seeing repeated failures on rosetta@home (there don't seem to be many -- our error rates are low), please consider attaching your computer to ralph!>>>

Hi Rhiju,

Here is the link to my failed work units. It looks like I have had one Rosetta success since April 22nd. I am running XP Home on an HP Presario 6000 with 384 MB of RAM. Rosetta gets 20%, like all my projects do.

I don't know what ralph is. I am attached to 4 of the other main BOINC projects. They run fine. Well, lately they have. (^:

https://boinc.bakerlab.org/rosetta/results.php?userid=23144

cheers,

Jonathan

Snake Doctor
Joined: 17 Sep 05
Posts: 182
Credit: 6,401,938
RAC: 0
Message 15631 - Posted: 6 May 2006, 22:07:50 UTC
Last modified: 6 May 2006, 22:10:07 UTC

Well, I don't get many errors, but I had two within just a few moments of each other. Mac G4 dual, 1 GB of memory. It looked like BOINC tried to start running them before they had finished downloading.

here and here

the errors were both -
<core_client_version>5.4.9</core_client_version>
<message>
Couldn't start or resume: -146
</message>

Bones
Joined: 16 Sep 05
Posts: 3
Credit: 713,317
RAC: 0
Message 15633 - Posted: 7 May 2006, 1:36:51 UTC

resultid=19261039.

This one hasn't failed yet, but it acted strangely: it was at 0.00% complete after 2 hours and the CPU usage was only 2% (normally 100%), even though the WU was supposedly running. I restarted BOINC (5.2.13), the progress jumped to 66.01%, and it now appears to be running OK. Not sure if this problem is the Rosetta app or BOINC causing the issue, but thought I'd let you know anyway.

Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 15635 - Posted: 7 May 2006, 3:15:03 UTC - in response to Message 15633.  

Not sure if this problem is the Rosetta app or BOINC causing the issue, but thought I'd let you know anyway.


If it happens again, you might take a look at the Task Manager to see what's taking up the rest of the CPU usage. Maybe some other task was running?

Nightbird
Joined: 17 Sep 05
Posts: 70
Credit: 32,418
RAC: 0
Message 15640 - Posted: 7 May 2006, 8:05:41 UTC
Last modified: 7 May 2006, 8:08:58 UTC

After I rebooted my machine, I got: (0x1) - exit code 1 (0x1)

FACONTACTS_NOFILTERS_1vie__441_93_1

stderr out <core_client_version>4.32</core_client_version>
<message>Fonction incorrecte. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 3203608
# cpu_run_time_pref: 21600

</stderr_txt>


Validate state Invalid

https://boinc.bakerlab.org/rosetta/result.php?resultid=18716104






R/B
Joined: 8 Dec 05
Posts: 195
Credit: 28,095
RAC: 0
Message 15644 - Posted: 7 May 2006, 9:14:50 UTC

Result ID 19329923
Name JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_472_6930_0
Workunit 16022107
Created 6 May 2006 3:45:09 UTC
Sent 6 May 2006 7:44:35 UTC
Received 7 May 2006 8:15:56 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status 0 (0x0)
Computer ID 92884
Report deadline 20 May 2006 7:44:35 UTC
CPU time 19425.453125
stderr out <core_client_version>5.2.13</core_client_version>
<stderr_txt>


Snake Doctor
Joined: 17 Sep 05
Posts: 182
Credit: 6,401,938
RAC: 0
Message 15649 - Posted: 7 May 2006, 15:30:02 UTC
Last modified: 7 May 2006, 15:32:37 UTC

All three on Mac OS 10.4.6, Dual G4, 1 GB memory. BOINC Ver 5.4.9, Rosetta 5.07.

AB_CASP6_t272__486_1242_0 - this WU was killed by the watchdog.
HBLR_1.0_1dtj_RDFLAGS_473_8871_0 - this WU failed almost on arrival.
HBLR_1.0_1mky_ROT_TRIALS_TRIE_CHECKPOINTS_482_7412_0 - this WU failed almost on arrival.

Jon C Melusky
Joined: 29 Nov 05
Posts: 12
Credit: 208,931
RAC: 1,214
Message 15657 - Posted: 7 May 2006, 17:02:50 UTC - in response to Message 15616.  

...I don't know what ralph is. I am attached to 4 of the other main BOINC projects. They run fine. Well, lately they have. (^:

https://boinc.bakerlab.org/rosetta/results.php?userid=23144

cheers,

Jonathan

Ralph is the Alpha test project for Rosetta. It is located Here.

Also this will help find any errors on the application.


Thank you for the link to Ralph. Sadly, my system only has 384 MB of RAM and the minimum requirement for Ralph is 512 MB of RAM.

I guess I have no choice but to scale back Rosetta to 5% instead of 20%.

Jonathan

Jon C Melusky
Joined: 29 Nov 05
Posts: 12
Credit: 208,931
RAC: 1,214
Message 15678 - Posted: 8 May 2006, 8:59:48 UTC - in response to Message 15669.  

Sadly, my system only has 384 MB of RAM and the minimum requirement for Ralph is 512 MB of RAM.

I guess I have no choice but to scale back Rosetta to 5% instead of 20%.

Jonathan


Actually the basic requirements are the same for both Ralph and Rosetta.

Well, all I know is that Rosetta worked perfectly from 29 Nov 2005 to early April 2006 with only 384 MB of RAM, so I don't know why it used to work so well below the basic requirements. Was it 512 MB of RAM back in Nov 2005? Should I not have been allowed to attach to Rosetta with 384 MB of RAM? Should I try Ralph with 384 MB of RAM?

Please advise.

Jonathan

anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 15681 - Posted: 8 May 2006, 11:36:04 UTC - in response to Message 15678.  
Last modified: 8 May 2006, 11:36:54 UTC

Well, all I know is that Rosetta worked perfectly from 29 Nov 2005 to early April 2006 with only 384 MB of RAM, so I don't know why it used to work so well below the basic requirements. Was it 512 MB of RAM back in Nov 2005? Should I not have been allowed to attach to Rosetta with 384 MB of RAM? Should I try Ralph with 384 MB of RAM?

Please advise.
Jonathan


Do try joining Ralph.

There are computers there with less than 512 MB of memory.

Anders n



Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 15682 - Posted: 8 May 2006, 11:45:50 UTC
Last modified: 8 May 2006, 11:50:22 UTC

Like my Celeron 500 with Win98 and 256 MB of RAM. If I'm not mistaken, those are "minimum recommended" specs, not "minimum required".

TioSuper
Joined: 2 May 06
Posts: 17
Credit: 164
RAC: 0
Message 15684 - Posted: 8 May 2006, 12:09:46 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=19492107 resulted in one of the now-infamous 107-type errors.

kevint
Joined: 8 Oct 05
Posts: 84
Credit: 2,530,451
RAC: 0
Message 15685 - Posted: 8 May 2006, 14:41:52 UTC
Last modified: 8 May 2006, 14:45:39 UTC

So I came in this morning and noticed that this machine had all its WUs (about 50 or so) aborted or errored out for no apparent reason.
This machine has been running very nicely for several months without a single hiccup. I have not changed anything on it for a long time. Did we get a batch of bad WUs?

5/8/2006 6:20:46 AM|rosetta@home|3 consecutive failures fetching scheduler list - deferring 604800 seconds
5/8/2006 6:24:06 AM|rosetta@home|Computation for result HBLR_1.0_1n0u_RDFLAGS_485_7128_0 finished
5/8/2006 6:24:06 AM|rosetta@home|Starting result AB_CASP6_JUMPING_STRAND2_STRAND5_t212_SAVE_ALL_OUT_488_3479_0 using rosetta version 507
5/8/2006 6:24:15 AM|rosetta@home|Unrecoverable error for result AB_CASP6_JUMPING_STRAND2_STRAND5_t212_SAVE_ALL_OUT_488_3479_0 ( - exit code -1073741819 (0xc0000005))
5/8/2006 6:24:15 AM|rosetta@home|3 consecutive failures fetching scheduler list - deferring 604800 seconds
5/8/2006 6:24:15 AM||Rescheduling CPU: application exited
5/8/2006 6:24:15 AM|rosetta@home|Computation for result AB_CASP6_JUMPING_STRAND2_STRAND5_t212_SAVE_ALL_OUT_488_3479_0 finished
5/8/2006 6:24:15 AM|rosetta@home|Starting result AB_CASP6_JUMPING__t242_SAVE_ALL_OUT_488_3479_0 using rosetta version 507
5/8/2006 6:24:26 AM|rosetta@home|Unrecoverable error for result AB_CASP6_JUMPING__t242_SAVE_ALL_OUT_488_3479_0 ( - exit code -1073741819 (0xc0000005))
5/8/2006 6:24:26 AM|rosetta@home|3 consecutive failures fetching scheduler list - deferring 604800 seconds
5/8/2006 6:24:26 AM||Rescheduling CPU: application exited
5/8/2006 6:24:26 AM|rosetta@home|Computation for result AB_CASP6_JUMPING__t242_SAVE_ALL_OUT_488_3479_0 finished
5/8/2006 6:24:26 AM|rosetta@home|Starting result JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_490_1401_0 using rosetta version 507
5/8/2006 6:24:53 AM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_490_1401_0 ( - exit code -1073741819 (0xc0000005))
5/8/2006 6:24:53 AM|rosetta@home|3 consecutive failures fetching scheduler list - deferring 604800 seconds
5/8/2006 6:24:53 AM||Rescheduling CPU: application exited
5/8/2006 6:24:53 AM|rosetta@home|Computation for result JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_490_1401_0 finished






SETI.USA
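For anyone sifting through a long message log like the one above, here is a rough sketch that pulls out the "Unrecoverable error" lines and tallies them by exit code. It assumes the pipe-separated `timestamp|project|message` layout seen in this post; the function names are made up for illustration and are not part of BOINC. As a side note, the 604800 seconds in the "deferring" lines works out to 7 days.

```python
import re
from collections import Counter

# Matches the "Unrecoverable error" messages in the form shown in the log above.
ERROR_RE = re.compile(
    r"Unrecoverable error for result (?P<name>\S+) "
    r"\( - exit code (?P<code>-?\d+) \((?P<hex>0x[0-9a-fA-F]+)\)\)"
)


def find_failures(log_text):
    """Return (result_name, exit_code) pairs for each unrecoverable error line."""
    failures = []
    for line in log_text.splitlines():
        fields = line.split("|", 2)  # timestamp | project | message
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        match = ERROR_RE.search(fields[2])
        if match:
            failures.append((match.group("name"), int(match.group("code"))))
    return failures


def count_by_exit_code(failures):
    """Tally failures per exit code."""
    return Counter(code for _name, code in failures)
```

Run over the excerpt in this post, it would report three failures, all with exit code -1073741819, which suggests one common cause rather than three unrelated problems.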


Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 0
Message 15687 - Posted: 8 May 2006, 16:24:12 UTC - in response to Message 15682.  

Like my Celeron 500 with Win98 and 256 MB of RAM. If I'm not mistaken, those are "minimum recommended" specs, not "minimum required".


My older son moved out on his own a few weeks ago. Took his laptop with him but left his old Dell Optiplex GX110. Said I could do what I wanted with it. It's got a 667 P3 with only 128 MB of memory. I'm running W2K on it. All I did was bump up the initial virtual memory allocation (I think it was from 192 MB to 256 MB) after it complained about running out of VM and it's been running R@H just fine.


-Charlie

Nite Owl
Joined: 2 Nov 05
Posts: 87
Credit: 3,019,449
RAC: 0
Message 15694 - Posted: 8 May 2006, 23:30:53 UTC
Last modified: 8 May 2006, 23:45:03 UTC

I just had a failure on a machine that I believe was caused by my viewing the graphics. It is an A64 X2 4400+ with 1 GB of memory and dual SLI graphics boards with 256 MB each.

Result ID 19507834
Name FA_CASP6_t212__470_13327_0
Workunit 16182368
Created 7 May 2006 20:29:38 UTC
Sent 8 May 2006 0:15:29 UTC
Received 8 May 2006 21:52:32 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status 1 (0x1)
Computer ID 201779
Report deadline 22 May 2006 0:15:29 UTC
CPU time 49699.375
stderr out <core_client_version>5.2.13</core_client_version>
<message>Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 1549694
# cpu_run_time_pref: 86400
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 86400
ERROR:: Exit at: .dock_structure.cc line:401

</stderr_txt>


Validate state Invalid
Claimed credit 355.538220736369
Granted credit 0
application version 5.07


Two other failures occurred on an Intel Pentium 4 HT, 3218 MHz, 1 GB memory, NVIDIA GeForce FX 5200 (128 MB) graphics, and a 30 GB hard drive. Both failures were attributed to "Maximum disk usage exceeded".

Result ID 19546463
Name JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_472_11569_0
Workunit 16217955
Created 8 May 2006 5:33:41 UTC
Sent 8 May 2006 9:21:18 UTC
Received 8 May 2006 21:25:16 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -177 (0xffffff4f)
Computer ID 142263
Report deadline 22 May 2006 9:21:18 UTC
CPU time 43029.984375
stderr out <core_client_version>5.2.13</core_client_version>
<message>Maximum disk usage exceeded
</message>
<stderr_txt>
# random seed: 1491312
# cpu_run_time_pref: 86400

</stderr_txt>


Validate state Invalid
Claimed credit 186.565840823042
Granted credit 0
application version 5.07

Join the Teddies@WCG
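The result pages in this thread show each exit status twice, as a signed decimal and as hex, e.g. -177 (0xffffff4f) here and -1073741819 (0xc0000005) in an earlier post. Both forms are the same 32-bit value. A small sketch of the conversion follows (the helper names are hypothetical). For reference, 0xC0000005 is the Windows status code for an access violation, while -177 is the status reported alongside "Maximum disk usage exceeded" above.

```python
def exit_code_to_hex(code: int) -> str:
    """Show a signed 32-bit exit code as the unsigned hex form used on the result pages."""
    return f"0x{code & 0xFFFFFFFF:x}"


def hex_to_exit_code(text: str) -> int:
    """Convert the hex form back to a signed 32-bit integer."""
    value = int(text, 16)
    # Values with the top bit set represent negative numbers in two's complement.
    return value - (1 << 32) if value >= (1 << 31) else value


if __name__ == "__main__":
    print(exit_code_to_hex(-177))          # 0xffffff4f
    print(exit_code_to_hex(-1073741819))   # 0xc0000005
    print(hex_to_exit_code("0xc0000005"))  # -1073741819
```

Searching for the hex form is often more fruitful than searching for the decimal one, since operating system status codes are usually documented in hex.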

kevint
Joined: 8 Oct 05
Posts: 84
Credit: 2,530,451
RAC: 0
Message 15696 - Posted: 8 May 2006, 23:49:38 UTC - in response to Message 15695.  

So I came in this morning and noticed that this machine had all its WUs (about 50 or so) aborted or errored out for no apparent reason.
This machine has been running very nicely for several months without a single hiccup. I have not changed anything on it for a long time. Did we get a batch of bad WUs...


Well, the larger workunits now running on the system may be a problem for you. You could try increasing virtual memory. But it looks like a file problem. It is possible that the BOINC file system has become corrupted somehow. Try resetting the project. If that does not work, then increase the virtual memory for the system (expect more disk activity if you do that).



Will do. Virtual memory may be an issue, however; it looks like I need to install a 2nd hard drive. I think I have a couple of old junkers lying around.

Thanks.
SETI.USA


Buffalo Bill
Joined: 25 Mar 06
Posts: 71
Credit: 1,630,458
RAC: 0
Message 15700 - Posted: 9 May 2006, 1:21:11 UTC
Last modified: 9 May 2006, 1:28:14 UTC

Is there any way to save this WU which is stuck at:

CPU Time: 05:37:38 Progress: 100% Status: Uploading

I've tried rebooting etc. but it won't progress to "ready to report".

https://boinc.bakerlab.org/rosetta/result.php?resultid=19494764

What should I do with it? Next WU is at 1 hour 30 min. and running.



©2025 University of Washington
https://www.bakerlab.org