Report Problems with Rosetta Version 5.22

Message boards : Number crunching : Report Problems with Rosetta Version 5.22

RWIoffice

Joined: 7 Jun 06
Posts: 4
Credit: 37,344
RAC: 0
Message 18465 - Posted: 11 Jun 2006, 14:56:18 UTC

Screensaver or possibly some other flavor of "completion" problem with a t299__CASP7 work unit.

I noticed lack of completion progress from home this morning. When I got to the office, the screensaver was sitting on "Model 9, Step 0."

Once I got past the screensaver, BOINC Manager was reporting the work unit as "Running" and "100%" for Progress. BOINC Manager would not display the graphics.

Shutdown of BOINC Manager seemed to take a long time, but it finally happened. I rebooted the system. Once BOINC Manager launched it reported the status on this work unit as completed, "Ready to Report" and started the next work unit.

I forced an update so the result would be available prior to my posting this report.

Keeping in mind that I'm a newbie and could easily be misinterpreting, the result seems to be referring to only 8 models, so the screensaver graphic's Model 9 reference doesn't make sense to me.

ID: 18465 · Rating: 1
tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 18473 - Posted: 11 Jun 2006, 18:46:15 UTC - in response to Message 18465.  

Screensaver or possibly some other flavor of "completion" problem with a t299__CASP7 work unit.

I noticed lack of completion progress from home this morning. When I got to the office, the screensaver was sitting on "Model 9, Step 0."

Once I got past the screensaver, BOINC Manager was reporting the work unit as "Running" and "100%" for Progress. BOINC Manager would not display the graphics.

Shutdown of BOINC Manager seemed to take a long time, but it finally happened. I rebooted the system. Once BOINC Manager launched it reported the status on this work unit as completed, "Ready to Report" and started the next work unit.

I forced an update so the result would be available prior to my posting this report.

Keeping in mind that I'm a newbie and could easily be misinterpreting, the result seems to be referring to only 8 models, so the screensaver graphic's Model 9 reference doesn't make sense to me.



This is related to a bug report on Ralph. The behaviour is exactly the same. It was supposed to be fixed in 5.22; obviously it is not.
ID: 18473 · Rating: 0
Craig Miller

Joined: 5 Jun 06
Posts: 1
Credit: 241,534
RAC: 0
Message 18482 - Posted: 11 Jun 2006, 23:29:23 UTC

I am having a problem running Rosetta. I attach to Rosetta using BOINC Manager and receive the notice of a successful attachment. When I look at BOINC Manager, it shows Rosetta running while Einstein and SETI are suspended. But when I come back several hours later, Rosetta is not present in either Projects or Tasks. The messages seem to show Rosetta being loaded and started, but then the log ends with "Detaching from project", as shown below.

-------------
11-Jun-06 12:41:20|rosetta@home|Starting task t314__CASP7_ABRELAX_SAVE_ALL_OUT_hom002__666_13970_1 using rosetta version 522
11-Jun-06 12:49:01||Contacting account manager at http://bam.boincstats.com/
11-Jun-06 12:49:03||Account manager: BAM Host-ID: 2098
11-Jun-06 12:49:03||Account manager contact succeeded
11-Jun-06 12:49:03|rosetta@home|Resetting project
11-Jun-06 12:49:04||Rescheduling CPU: exit_tasks
11-Jun-06 12:49:04|rosetta@home|Detaching from project
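
For anyone who wants to check a longer message log for the same pattern, here is a minimal sketch in Python. It assumes the log lines are pipe-delimited as in the excerpt above (timestamp, project, message); the function names and the `window` parameter are made up for illustration and are not part of BOINC.

```python
from typing import List, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str  # kept as text, e.g. "11-Jun-06 12:49:03"
    project: str    # empty string for client-level messages
    message: str


def parse_boinc_log(text: str) -> List[LogEntry]:
    """Split 'timestamp|project|message' lines; skip anything else."""
    entries = []
    for line in text.splitlines():
        parts = line.strip().split("|", 2)
        if len(parts) == 3:
            entries.append(LogEntry(*parts))
    return entries


def events_before_detach(entries: List[LogEntry], window: int = 5) -> List[LogEntry]:
    """Return up to `window` entries logged just before the first detach."""
    for i, entry in enumerate(entries):
        if entry.message.startswith("Detaching from project"):
            return entries[max(0, i - window):i]
    return []


SAMPLE = """\
-------------
11-Jun-06 12:41:20|rosetta@home|Starting task t314__CASP7_ABRELAX_SAVE_ALL_OUT_hom002__666_13970_1 using rosetta version 522
11-Jun-06 12:49:01||Contacting account manager at http://bam.boincstats.com/
11-Jun-06 12:49:03||Account manager: BAM Host-ID: 2098
11-Jun-06 12:49:03||Account manager contact succeeded
11-Jun-06 12:49:03|rosetta@home|Resetting project
11-Jun-06 12:49:04||Rescheduling CPU: exit_tasks
11-Jun-06 12:49:04|rosetta@home|Detaching from project
"""

if __name__ == "__main__":
    for entry in events_before_detach(parse_boinc_log(SAMPLE)):
        print(entry.timestamp, entry.project or "(client)", entry.message)
```

Run against the excerpt, this lists the account manager contact and the project reset that immediately precede the detach, which is the sequence worth looking at first.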


When I check with BAM, my resources are shown as Einstein, SETI, and Rosetta.

What could be causing this problem?

Craig Miller

ID: 18482 · Rating: 0
Dogbytes

Joined: 4 Dec 05
Posts: 37
Credit: 207,563
RAC: 0
Message 18483 - Posted: 12 Jun 2006, 0:17:55 UTC
Last modified: 12 Jun 2006, 0:31:24 UTC

The WU linked below crunched for over 2 hours and yet was stuck at 0.00%. I aborted the unit because it appeared to be completely hung and had stopped crunching. I would be interested to know what the problem was...

Aborted 5.22 WU link
ID: 18483 · Rating: 0
Ian

Joined: 14 Apr 06
Posts: 29
Credit: 361,378
RAC: 763
Message 18484 - Posted: 12 Jun 2006, 0:50:05 UTC

Here's one from Saturday:

https://boinc.bakerlab.org/rosetta/result.php?resultid=23575087

And one from Friday:

https://boinc.bakerlab.org/rosetta/result.php?resultid=23484615

The only two errors for quite a while.
Ian Cundell, St Albans, UK
ID: 18484 · Rating: 0
Dogbytes

Joined: 4 Dec 05
Posts: 37
Credit: 207,563
RAC: 0
Message 18485 - Posted: 12 Jun 2006, 1:12:54 UTC

The WU linked below has a severe memory leak: it was using more than 275 MB of memory, bringing the host's commit charge to nearly 600 MB. The WU was aborted by the user.

Aborted WU
ID: 18485 · Rating: 0
Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 18487 - Posted: 12 Jun 2006, 1:57:59 UTC - in response to Message 18482.  
Last modified: 12 Jun 2006, 1:59:43 UTC

When I check with BAM, my resources are shown as Einstein, SETI, and Rosetta.

What could be causing this problem?


Rosetta's servers were just upgraded to support BAM last week. But it looks like BAM did something, not Rosetta.

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 18487 · Rating: 0
scsimodo

Joined: 17 Sep 05
Posts: 93
Credit: 946,359
RAC: 0
Message 18525 - Posted: 12 Jun 2006, 17:52:24 UTC

Had a few WUs crash when I hit the "show graphics" button. The window popped up, closed immediately and trashed the WU. The WUs are:

WU1
WU2
WU3
WU4

Host list is unhidden; the host is a Mac Mini Core Duo, 1.66 GHz, 2 GB RAM. Please drop a short notice when I can hide my hosts again...

What's strange is that hitting the "show graphics" button a few minutes earlier worked perfectly, so it seems to be a random problem...



ID: 18525 · Rating: 0
Billy

Joined: 29 May 06
Posts: 13
Credit: 1,536,368
RAC: 0
Message 18569 - Posted: 13 Jun 2006, 14:12:12 UTC - in response to Message 18196.  
Last modified: 13 Jun 2006, 14:12:54 UTC

I had a work unit at about 80% complete that seemed to be processing normally. I suspended the project (as well as Einstein and SETI) and quit BOINC. I shut down the computer and restarted. When BOINC started again, it reported this work unit as complete and uploaded it. Either it was stuck before the restart or it isn't actually complete. I had a similar thing happen a couple of days ago: BOINC also reported work units as complete even though the completion times were unusually short.

https://boinc.bakerlab.org/rosetta/result.php?resultid=23946621

iMac Core Duo, Rosetta version 5.22
ID: 18569 · Rating: 0
Stwato

Joined: 11 Jan 06
Posts: 150
Credit: 655,634
RAC: 0
Message 18570 - Posted: 13 Jun 2006, 14:44:02 UTC
Last modified: 13 Jun 2006, 14:44:25 UTC

This work unit looks very strange in the graphics. Parts of the protein do not appear to be connected to the rest of it. It's like some of it's missing. The protein appears very small. I first noticed the problem when I saw that the work unit is only using 20 MB memory and ~70 MB virtual but has nearly 9 million page faults. Assuming it was a small protein, I took a look at the graphics and noticed the disjointed parts. Any ideas? I'll let it continue for now (5 hours into an 8 hour work unit).

If you would like a screenshot, please let me know the best dimensions for posting into the forum, as I have nowhere to upload it to.

Cheers
Stwato
ID: 18570 · Rating: 0
Keith Akins

Joined: 22 Oct 05
Posts: 176
Credit: 71,779
RAC: 0
Message 18572 - Posted: 13 Jun 2006, 16:04:07 UTC

After patting myself on the back for so many successful WUs, I get the following error on this unit:

6/12/2006 11:05:25 PM|rosetta@home|Unrecoverable error for result t306__CASP7_JUMPABINITIO_SAVE_ALL_OUT_BARCODE_hom001__680_902_0 ( - exit code -1073741811 (0xc000000d))

This could be a v5.22 problem, a BOINC 5.4.9 problem, or a conflict when checking mail with Mozilla Thunderbird.

Win XP Home Service Pack 2

Mozilla Firefox/Thunderbird combo.

Computers are visible and BOINC 5.4.9 should be debug reporting.

Ignore the Linux Computer as mine is a dual booter.
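
As a side note on reading these exit codes: BOINC prints the Windows status both as a signed 32-bit number and as hex, and the two are the same value. A small sketch of the conversion in Python follows; the function names are made up for illustration, and the lookup table only covers the two codes that appear in this thread (their names are the standard Windows NTSTATUS names).

```python
# Names for the two NTSTATUS values reported in this thread.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
}


def to_ntstatus(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code: int) -> str:
    """Format an exit code as hex plus a name, if the name is known."""
    status = to_ntstatus(exit_code)
    name = KNOWN_NTSTATUS.get(status, "unknown")
    return f"0x{status:08x} ({name})"


if __name__ == "__main__":
    for code in (-1073741811, -1073741819):
        print(code, "->", describe(code))
```

So -1073741811 is 0xc000000d (an invalid parameter was passed somewhere), while the -1073741819 seen elsewhere in this thread is 0xc0000005 (an access violation). The code alone does not say which component was at fault.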
ID: 18572 · Rating: 0
Snake Doctor

Joined: 17 Sep 05
Posts: 182
Credit: 6,401,938
RAC: 0
Message 18580 - Posted: 13 Jun 2006, 17:57:34 UTC - in response to Message 18570.  

This work unit looks very strange in the graphics. Parts of the protein do not appear to be connected to the rest of it. It's like some of it's missing. The protein appears very small. I first noticed the problem when I saw that the work unit is only using 20 MB memory and ~70 MB virtual but has nearly 9 million page faults. Assuming it was a small protein, I took a look at the graphics and noticed the disjointed parts. Any ideas? I'll let it continue for now (5 hours into an 8 hour work unit).

If you would like me to screen shot it please let me know the best dimensions for posting into the forum as I have nowhere to upload it to.

Cheers
Stwato

This is a known problem with a few of the processing techniques being used. Not all the work units use the same processing approach. In some cases they only look at parts of the protein structure, and that somehow affects the display.

We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
ID: 18580 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1480
Credit: 4,334,829
RAC: 0
Message 18581 - Posted: 13 Jun 2006, 18:12:31 UTC

ID: 18581 · Rating: 0
TCU Computer Science

Joined: 7 Dec 05
Posts: 28
Credit: 12,861,977
RAC: 0
Message 18612 - Posted: 14 Jun 2006, 3:21:02 UTC

rosetta 5.22
WU Name: t314__CASP7_ABRELAX_SAVE_ALL_OUT_hom004__666_16529_0
running on Mac OS 10.4.6

BOINC Manager Tasks tab shows CPU Time stuck at 01:30:40 and 15%
top command shows TIME = 28:53:41 and climbing

stopped and restarted BOINC
CPU Time reverted to 01:13:00 and 15% but no longer stuck

Symptoms are identical to my post for ralph 5.18
ID: 18612 · Rating: 0
Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 18614 - Posted: 14 Jun 2006, 4:10:16 UTC

Tony, do you have JP's EMail address? The French guy who always needs the new Rosetta .exe EMailed? Could you ask him to see if he can help with this post by a French person on the Q&A boards? The main parts that translated properly were that he's poor, alone and in a wheelchair. Perhaps JP can read between the lines better than the translation website.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 18614 · Rating: 0
Astro

Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 18625 - Posted: 14 Jun 2006, 10:27:59 UTC - in response to Message 18614.  

Tony, do you have JP's EMail address? The French guy who always needs the new Rosetta .exe EMailed? Could you ask him to see if he can help with this post by a French person on the Q&A boards? The main parts that translated properly were that he's poor, alone and in a wheelchair. Perhaps JP can read between the lines better than the translation website.

Mail sent
ID: 18625 · Rating: 0
Stwato

Joined: 11 Jan 06
Posts: 150
Credit: 655,634
RAC: 0
Message 18626 - Posted: 14 Jun 2006, 10:48:49 UTC

Another question quickly, efficiently and comprehensively answered!
Thanks a lot guys.

Stwato
ID: 18626 · Rating: 0
Ian

Joined: 14 Apr 06
Posts: 29
Credit: 361,378
RAC: 763
Message 18676 - Posted: 15 Jun 2006, 1:09:09 UTC

Another for you. Result for WU 20318986

Hope these are useful.
Ian Cundell, St Albans, UK
ID: 18676 · Rating: 0
Robert Everly

Joined: 8 Oct 05
Posts: 27
Credit: 665,094
RAC: 0
Message 18758 - Posted: 16 Jun 2006, 3:02:58 UTC
Last modified: 16 Jun 2006, 3:04:18 UTC

Here's one that went crazy. resultid 23976562

On this host hostid 214416

This host does nothing but crunch.

Another host did complete the WU successfully.

<core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
X_NB neighbors
increase MAX_NB in param.cc
allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors

*snipped a lot of lines*

allatom: 1567 res: 103 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1567 res: 103 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1550 res: 103 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1550 res: 103 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1550 res: 103 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1550 res: 103 atom: 5 has more than MAX_NB neighbors


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0061B538 read attempt to address 0x790C3DE3

Engaging BOINC Windows Runtime Debugger...
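
Since the stderr output above repeats the same warning many times, a short script can condense it before posting. This is only a sketch in Python: it assumes the warning lines look exactly like the ones pasted above, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors
WARNING = re.compile(
    r"allatom:\s*(\d+)\s+res:\s*(\d+)\s+atom:\s*(\d+)\s+"
    r"has more than MAX_NB neighbors"
)


def summarize_max_nb(stderr_text: str) -> Counter:
    """Count MAX_NB warnings per (res, atom, allatom) triple."""
    counts = Counter()
    for line in stderr_text.splitlines():
        match = WARNING.search(line)
        if match:
            allatom, res, atom = (int(g) for g in match.groups())
            counts[(res, atom, allatom)] += 1
    return counts


SAMPLE = """\
increase MAX_NB in param.cc
allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
"""

if __name__ == "__main__":
    for (res, atom, allatom), n in sorted(summarize_max_nb(SAMPLE).items()):
        print(f"res {res} atom {atom} (allatom {allatom}): {n} warning(s)")
```

A per-residue summary like this makes it easier to see that the warnings cluster on a few residues (95 and 103 in the excerpt above) before the access violation.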


ID: 18758 · Rating: 0
Martin P.

Joined: 26 May 06
Posts: 38
Credit: 168,333
RAC: 0
Message 18775 - Posted: 16 Jun 2006, 8:11:37 UTC
Last modified: 16 Jun 2006, 8:12:29 UTC

Problems with download of WUs: either no work or heavily overcommitted.

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=1703

Hi,

I run SETI@Home, Einstein and Rosetta. Rosetta is set to 20%. The problem is that once Rosetta has finished all its WUs, it never downloads any new ones. Even when the long_term_debt is highly positive (e.g. 30,000 or more), it does not download any WUs. The only way to force a download is to pause the other projects, but in that case it downloads so many WUs that the computer is overcommitted for many days.

I currently run at "Contact server every 3 days". Even when setting this to 0.3 days before suspending the other projects and resetting it after the download it still downloads too many WUs.

This is what I tried:
1. Set "Contact server every 3 days" to 0.3 days.
2. Set SETI@Home and Einstein to "No new work"
3. Suspend SETI@Home and Einstein
4. Rosetta downloads some WUs
5. Set SETI@Home and Einstein to "Allow new work"
6. Restart SETI@Home and Einstein
7. Set "Contact server every xx days" back to 3 days.
8. Now Rosetta downloads even more WUs, which should not happen since SETI and Einstein are both active -> computer is overcommitted.

Is there a solution to this problem? Resetting long_term_debt to 0.0 on all projects does not help either.


The client errors are there because I have other projects running and therefore manually aborted these work units so that the other projects get their share as well. Otherwise Rosetta would have taken over my computers exclusively for several days.

I followed your advice and let it run for several days without any interference. I did not get any new work for 5 days, but tonight it downloaded 30 work units and is overcommitted again!

Obviously the scheduling of Rosetta does not work at all.
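
To put a rough number on "overcommitted", here is a back-of-the-envelope sketch in Python. It is not BOINC's work-fetch logic, only arithmetic: the 30 work units and the 20% share come from the post above, while the 3 hours per work unit is a made-up placeholder to be replaced with the real run time on your host.

```python
def days_to_clear(n_wus: int, hours_per_wu: float, share: float, cpus: int = 1) -> float:
    """Wall-clock days needed to finish a backlog at a given resource share.

    share is the project's fraction of CPU time (0.2 for 20%).
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    cpu_hours_needed = n_wus * hours_per_wu
    cpu_hours_per_day = 24 * cpus * share
    return cpu_hours_needed / cpu_hours_per_day


if __name__ == "__main__":
    # 30 WUs and a 20% share are from the post; 3 h per WU is hypothetical.
    print(f"{days_to_clear(30, 3.0, 0.2):.2f} days")
```

Under those assumptions the backlog needs roughly 18.75 days at a 20% share on one CPU, far beyond a 3-day connect interval, which is consistent with the client treating the host as overcommitted.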


ID: 18775 · Rating: 0



©2025 University of Washington
https://www.bakerlab.org