Message boards : Number crunching : Report Problems with Rosetta Version 5.07
Author | Message |
---|---|
Astro · Joined: 2 Oct 05 · Posts: 987 · Credit: 500,253 · RAC: 0 |
[Quote: "I still keep getting units that say they are running when they are not. No error messages in the log."] Go to the Projects tab, find Rosetta, and follow that line over to the "Status" column. Does it say "suspended" there? You should also check the Work/Tasks tab to see whether that particular WU is suspended. Is Rosetta your only project? If not, are the other projects working OK? If Rosetta is your only project, right-click the B in the system tray and see whether BOINC is suspended or set to do work based on preferences. If it is set to work based on preferences, check your "General preferences" under "Your account" and see whether you have told it to stop work while the computer is in use, or at specific times. tony |
Philip Hood · Joined: 11 Feb 06 · Posts: 3 · Credit: 35,986 · RAC: 0 |
I suspended the work unit after I noticed it wasn't consuming any CPU time; I don't have time to babysit it right now. SETI and Predictor are also running on this machine and have no problems. Rosetta seems to have this problem every few work units. It used to be worse before 5.07. |
Astro · Joined: 2 Oct 05 · Posts: 987 · Credit: 500,253 · RAC: 0 |
[Quote: "I still keep getting units that say they are running when they are not. No error messages in the log."] I just reread your post. Do you mean it says "running" in the Status column of the Work/Tasks tab? If yes, have you viewed the graphics to see whether it is actually running? Are you a Win98/ME user? [edit] I see two Linux and one Win2000 puter; which puter is it? |
Philip Hood · Joined: 11 Feb 06 · Posts: 3 · Credit: 35,986 · RAC: 0 |
This is a Linux machine. I don't run the graphics on it, so I have no idea what they would look like. The situation was definitely that the status of the work unit was "running" and that no CPU time was being consumed. When work units get into this state, they hog all the computer time without accomplishing anything. |
Astro · Joined: 2 Oct 05 · Posts: 987 · Credit: 500,253 · RAC: 0 |
Sorry, Philip, I'm Linux-stupid and can't help you further, though I'd like to. tony |
tralala · Joined: 8 Apr 06 · Posts: 376 · Credit: 581,806 · RAC: 0 |
[Quoting Philip Hood's post above] If you can, restart BOINC. If there is still no CPU usage, abort the WU. You get credit, and the WU will be sent out to someone else. |
Dimitris Hatzopoulos · Joined: 5 Jan 06 · Posts: 336 · Credit: 80,939 · RAC: 0 |
[Quoting Philip Hood's post above] Can you run "ps" to see the status of the BOINC and Rosetta processes? Or use "top" to see whether it consumes CPU time? E.g. on my Linux box (notice the STAT column; RN = Running, Nice):

    ps u -U boinc
    USER    PID %CPU %MEM    VSZ   RSS TTY STAT START   TIME COMMAND
    boinc  2120  0.0  0.4   7396  3684 ?   S    Apr27   0:06 ./boinc_client
    boinc  8605 21.8  8.5 158868 63416 ?   RN   May02 404:22 rosetta_5.07_i686
    boinc  8606  0.0  8.5 158868 63416 ?   SN   May02   0:00 rosetta_5.07_i686
    boinc  8607  0.0  8.5 158868 63416 ?   SN   May02   0:00 rosetta_5.07_i686
    boinc  8608  0.0  8.5 158868 63416 ?   SN   May02   0:00 rosetta_5.07_i686

I had a similar problem to yours 3+ months ago on an under-spec'ed Linux PC with just 256 MB RAM, where I was running 6 different BOINC projects with leave-preempted-in-mem=Yes: BOINC would think Rosetta was running, but it wasn't. So BOINC wouldn't switch between projects, effectively "hanging". I never looked into it; I just reduced the number of BOINC projects to 3, and I've never had the problem again. Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
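The "ps" check above can be scripted. Because the symptom is a task that BOINC reports as running while its CPU time stays flat, one approach is to take two "ps u -U boinc" snapshots a minute or so apart and compare the cumulative TIME column of the Rosetta processes. The following is a minimal sketch under stated assumptions: Linux procps "ps" output with the header shown above, and Rosetta processes identifiable by "rosetta" in the COMMAND column. The function names are hypothetical, not part of BOINC. It sums CPU time across all Rosetta processes because, in the sample output, only one of the four accumulates time while the others stay at 0:00.

```python
import subprocess
from typing import Dict


def cpu_seconds(time_field: str) -> int:
    """Convert a ps TIME field ('404:22', '1:02:03' or '1-02:03:04') to seconds."""
    days = 0
    if "-" in time_field:
        day_part, time_field = time_field.split("-", 1)
        days = int(day_part)
    seconds = 0
    for part in time_field.split(":"):
        seconds = seconds * 60 + int(part)
    return days * 86400 + seconds


def rosetta_cpu_times(ps_output: str) -> Dict[int, int]:
    """Map PID -> cumulative CPU seconds for each rosetta process in `ps u` output."""
    lines = ps_output.strip().splitlines()
    if not lines:
        return {}
    columns = lines[0].split()
    pid_i = columns.index("PID")
    time_i = columns.index("TIME")
    cmd_i = columns.index("COMMAND")
    times: Dict[int, int] = {}
    for row in lines[1:]:
        # COMMAND is the last column and may contain spaces, so cap the splits.
        fields = row.split(None, len(columns) - 1)
        if len(fields) == len(columns) and "rosetta" in fields[cmd_i]:
            times[int(fields[pid_i])] = cpu_seconds(fields[time_i])
    return times


def rosetta_stalled(before: str, after: str) -> bool:
    """True if rosetta processes exist but their total CPU time has not grown."""
    t0, t1 = rosetta_cpu_times(before), rosetta_cpu_times(after)
    return bool(t1) and sum(t1.values()) <= sum(t0.values())


def snapshot(user: str = "boinc") -> str:
    """Capture `ps u -U <user>` output (Linux procps assumed)."""
    result = subprocess.run(["ps", "u", "-U", user], capture_output=True, text=True)
    return result.stdout
```

Typical use would be to call snapshot(), wait about a minute (e.g. time.sleep(60)), call snapshot() again, and pass both to rosetta_stalled(). A True result matches the "running but no CPU" state described above, at which point restarting BOINC and, failing that, aborting the WU are the options suggested in this thread.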
Rebel Alliance · Joined: 4 Nov 05 · Posts: 50 · Credit: 3,579,531 · RAC: 0 |
|
David Emigh · Joined: 13 Mar 06 · Posts: 158 · Credit: 417,178 · RAC: 0 |
OS = Linux 2.6.10
CPU = AMD Sempron 3000+
Memory = 1024 MB (64 MB shared video)
Failure rate: approximately 70%
With v5.01 of the Rosetta app, this rig ran clean, with near 100% completion. With v5.07, I'm lucky to get 1 result in 3 completed successfully. Any suggestions? |
tralala · Joined: 8 Apr 06 · Posts: 376 · Credit: 581,806 · RAC: 0 |
[Quote: "OS = Linux 2.6.10"] I looked at your host and I see as many errors for 5.01 as for 5.07. All the WUs that failed on your host completed successfully on another host. Almost all your errors have exit code 131; this may help the team figure out what's going on with your machine. |
Dimitris Hatzopoulos · Joined: 5 Jan 06 · Posts: 336 · Credit: 80,939 · RAC: 0 |
[Quote: "OS = Linux 2.6.10"] Looking at your host's log, you seem to get SIGSEGV errors. By the way, do you have Leave-in-mem-when-preempted=YES? (I would try this first.) It looks as if WUs are restarted several times. Which Linux distro are you using (FC5?)? Also see here for others having a similar problem. |
charmed · Joined: 2 Nov 05 · Posts: 11 · Credit: 1,780,440 · RAC: 0 |
This work unit failed as I was watching it: https://boinc.bakerlab.org/rosetta/result.php?resultid=19041999 Running Win XP on an Athlon 64 3200+ with 1 GB memory. |
David Emigh · Joined: 13 Mar 06 · Posts: 158 · Credit: 417,178 · RAC: 0 |
[Quoting tralala's reply above] Many thanks for your quick response =) I am especially grateful for the detective work that turned up the exit code. I will be sure to include it in further posts about this host. |
David Emigh · Joined: 13 Mar 06 · Posts: 158 · Credit: 417,178 · RAC: 0 |
[Quote: "Looking at your host's log, you seem to get SIGSEGV errors."] Awesome response!!! I set leave-in-mem-when-preempted to NO quite a while ago, when I saw a message in the technical news saying to do so. I will set it back to YES immediately. The Linux distribution I am using is Linspire 5.0 (build 5.0.59). Thank you very much for your response =) |
tralala · Joined: 8 Apr 06 · Posts: 376 · Credit: 581,806 · RAC: 0 |
Not a failure but a "suspicious" WU: https://boinc.bakerlab.org/rosetta/result.php?resultid=19037537 This one generated 1123 decoys in 8 hours. Each model started in Full Atom Relax mode in a somewhat "unfolded" state (only part of the amino acid chain was visible) and always had a high RMSD (about 50). After a few steps it quit and started a new model. |
Rebirther · Joined: 17 Sep 05 · Posts: 116 · Credit: 41,315 · RAC: 0 |
I have suspended the following WU: FA_CASP6_t198__470_5745_0. After 2:13 h it is at only 1.04%, and the steps are increasing very slowly. Last entries in stdout.txt:

    CYCLES::number is 1 x total_residue: 69
    initializing full atom coordinates
    BOINC :: [2006-05-04 11:46:11] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 7 :: num_decoys: 7 :: farlx_stage: 10
    dump_fullatom_pdb: farlxcheck
    starting score 357.328156 rms 4.70180273
    starting full atom minimization
    [T/F OPT]Default FALSE value for [-infinite_loop]

Should I keep it running or abort it? I don't know how long it will take; one WU normally takes 3 h. It is using 200 MB of RAM now. |
tralala · Joined: 8 Apr 06 · Posts: 376 · Credit: 581,806 · RAC: 0 |
[Quote: "I have suspended the following WU: FA_CASP6_t198__470_5745_0"] t198 is one of the bigger proteins (235 amino acids). I'd let it run at least 4 hours before aborting. It's better to abort only if it reaches 24 hours and the 300-credit claim barrier for failed WUs. |
rbpeake · Joined: 25 Sep 05 · Posts: 168 · Credit: 247,828 · RAC: 0 |
[Quote: "I have suspended the following WU: FA_CASP6_t198__470_5745_0"] I had one run over my 8-hour preference time, but then it completed. Just a huge protein, it seems! Now I have set my preference time to 12 hours. And I have learned to be patient! :) Regards, Bob P. |
Feet1st · Joined: 30 Dec 05 · Posts: 1755 · Credit: 4,690,520 · RAC: 0 |
Keep in mind that the 1.04% figure is REALLY just telling you that it is still on model 1. Once it completes model 1, it will recompute the % completed and may determine that you're 60% done, or even 100% done and end the WU. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
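To make the arithmetic behind this point concrete, here is a hypothetical Python model of such a recomputation. It is NOT Rosetta's actual code: it assumes the WU compares elapsed CPU time against the user's CPU run-time preference and stops once another model of average length would overrun it. Under that assumption, a WU that has finished 3 models in 2.4 h of a 4 h preference reports 60% and continues, while a single model that already used 2.4 h of the same budget sends it straight to 100%.

```python
from typing import Tuple


def progress_after_model(elapsed_s: float, models_done: int,
                         pref_s: float) -> Tuple[float, bool]:
    """Hypothetical progress rule (assumption, not Rosetta's real logic).

    Returns (fraction_done, keep_running) once at least one model is finished.
    """
    if models_done < 1:
        raise ValueError("progress is only re-estimated after a model completes")
    avg_model_s = elapsed_s / models_done
    # Assumed rule: start another model only if one more average model fits.
    keep_running = elapsed_s + avg_model_s <= pref_s
    fraction = min(elapsed_s / pref_s, 1.0) if keep_running else 1.0
    return fraction, keep_running
```

The point of the sketch is only that, before model 1 finishes, there is no average model time to reason with, so any percentage shown during model 1 says little about the time remaining.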
kb7rzf · Joined: 7 Oct 05 · Posts: 16 · Credit: 35,427 · RAC: 0 |
Got this error on this WU; here's the info:
Result ID: 18984230
Name: HBLR_1.0_2tif_ROT_TRIALS_TRIE_CHECKPOINTS_482_214_0
Workunit: 15712387
Created: 3 May 2006 0:08:00 UTC
Sent: 3 May 2006 4:07:40 UTC
Received: 4 May 2006 16:13:06 UTC
Server state: Over
Outcome: Client error
Client state: Computing
Exit status: 1 (0x1)
Computer ID: 12719
Report deadline: 17 May 2006 4:07:40 UTC
CPU time: 8127.546875
stderr out:

    <core_client_version>5.4.2</core_client_version>
    <message>Incorrect function. (0x1) - exit code 1 (0x1)
    </message>
    <stderr_txt>
    # random seed: 1065667
    # cpu_run_time_pref: 14400
    # cpu_run_time_pref: 14400
    # cpu_run_time_pref: 14400
    ERROR:: Exit at: .hbonds.cc line:293
    </stderr_txt>

Validate state: Invalid
Claimed credit: 15.0702212672248
Granted credit: 0
Application version: 5.07
Thanks. Jeremy |
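When collecting reports like this one, the lines that matter can be pulled out of the stderr block mechanically. A minimal Python sketch, assuming only the line shapes visible in this post ("# cpu_run_time_pref: N" in seconds, and lines starting with "ERROR::"); summarize_stderr is a hypothetical helper, not part of BOINC or Rosetta. Applied to this result, it would report a 4-hour run-time preference (14400 s) and the hbonds.cc exit line; for comparison, the reported CPU time of 8127.5 s is roughly 2.26 h.

```python
import re
from typing import Dict, List, Optional


def summarize_stderr(stderr: str) -> Dict[str, object]:
    """Extract the CPU run-time preference and ERROR:: lines from a stderr dump."""
    # The preference may be echoed several times; the last value is kept.
    prefs: List[int] = [int(v) for v in
                        re.findall(r"#\s*cpu_run_time_pref:\s*(\d+)", stderr)]
    errors: List[str] = [line.strip() for line in stderr.splitlines()
                         if line.strip().startswith("ERROR::")]
    pref_hours: Optional[float] = prefs[-1] / 3600 if prefs else None
    return {"cpu_run_time_pref_hours": pref_hours, "errors": errors}
```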
©2024 University of Washington
https://www.bakerlab.org