Message boards : Number crunching : Report Problems with Rosetta Version 5.13
Author | Message |
---|---|
senatoralex85 Send message Joined: 27 Sep 05 Posts: 66 Credit: 169,644 RAC: 0 |
I see. Thank you to all of those who answered my question. I did not think to look in the FAQ because I had never seen this problem, hence it was not "frequently" happening. I also didn't know about adjusting the time preferences either! Good info on the FAQ. ***Edit**** I am still a little confused. Does the time to completion decrease after each checkpoint or does it decrease after each completion of a model? When I first started crunching for Rosetta, the project crunched three models. The first two in low resolution and the third in high resolution. In that case, the progress was often stuck at 70%. Is the same logic followed here with the CASP workunits? Is the time to completion not adjusted until the first model is completed? |
David Emigh Send message Joined: 13 Mar 06 Posts: 158 Credit: 417,178 RAC: 0 |
I am running another BOINC Application. Let's see what happens there. Jose, I can truly empathize with the frustration you are feeling. If you scroll back through the old posts in this thread, you can see that, at one time, I had a string of at least 41 consecutive errors! But that is in the past now. I have nearly twenty consecutive SUCCESSFUL workunits with the exact same machine and setup. I have a last-ditch proposal for you: Don't run Rosetta at all when you need to use the computer, but do let it run overnight. Let the errors fall where they may. Perhaps when you get through a certain group of WUs, the error situation will resolve itself. Before you tuck the computer in for an evening of number crunching, disable the screen saver. Power the machine down completely. You might go so far (as I did) as to unplug the power supply so that the 5v standby power is interrupted long enough for the capacitors to discharge. Do a cold restart, get the BOINC client running and attached to Rosetta, turn off the monitor and go to bed. You might wake up to a page full of errors, but amidst those errors there may be some successes. My experience was that once the successes resumed, they were continuous. If all else fails, you might consider attaching to Ralph@home. I personally place a tremendous value in the science of the Rosetta project. I hope that you can continue to be a part of it in some way. Rosie, Rosie, she's our gal, If she can't do it, no one shall! |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
I see. Thank you to all of those who answered my question. I did not think to look in the FAQ because I had never seen this problem hence it was not "frequently" happening. I also didn't know about adjusting the time preferences either! Good info on the FAQ The time actually rises during processing. When the percent complete changes, the time will jump down to a lower value. From there it will rise until the percent complete changes again. It is like it is running backwards for a while then resets itself. In terms of the way they run there is no difference between a CASP work unit and any other work unit. The only difference is that for CASP we do not know the structure in advance. The goal of CASP is to see if we can figure out the structure. The normal work units are used to develop the methods for figuring out the structures. So in those cases we know what the computer should be looking for, and the idea is to see if the software is good enough to find it. You might want to take a look at This thread. There is a lot of information that explains all of this in far more detail. Moderator9 ROSETTA@home FAQ Moderator Contact |
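The rise-then-drop behaviour described above can be reproduced with a very simple estimator. This is only a sketch under an assumption the thread does not confirm: that the remaining time is the elapsed CPU time scaled by the reported fraction done. The function name `remaining_estimate` is made up for illustration and is not part of BOINC.

```python
def remaining_estimate(elapsed_s: float, fraction_done: float) -> float:
    """Estimate remaining seconds from elapsed CPU time and reported progress.

    Assumption (not from the thread): remaining = elapsed * (1 - f) / f.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


# While the fraction is stuck (no model finished yet), the estimate rises
# as elapsed time grows.
stuck = [remaining_estimate(t, 0.10) for t in (600, 900, 1200)]

# When a model finishes and the fraction is recalculated, the estimate drops.
after_model = remaining_estimate(1200, 0.20)
```

With a fixed fraction, each entry in `stuck` is larger than the one before it, and `after_model` is lower than the last stuck value, which matches the "running backwards, then resetting" pattern.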
NJMHoffmann Send message Joined: 17 Dec 05 Posts: 45 Credit: 45,891 RAC: 0 |
I am still a little confused. Does the time to completion decrease after each checkpoint or does it decrease after each completion of a model? The time to completion decreases only when the percent complete is recalculated. For now this is done only when a model is finished. I would like a recalculation at every checkpoint (that should be possible). Norbert |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
I am still a little confused. Does the time to completion decrease after each checkpoint or does it decrease after each completion of a model? I have submitted a request for just that feature. I would like the whole number of the percentage to represent the percent complete, the 1/10th to be the checkpoint, and the 1/1000 to be the location in the model as it is now. Such that 15.459 would represent 15% complete, checkpoint 4, position 59. That way we can see the checkpointing. If a problem occurs the project can see the position, and we have a rough idea what percent of the work unit is complete. We will have to see what they come up with. I would also like to see a checkpoint message in the messages tab of BOINC monitor. All of this would allow people to manage the shutdown of their system to best advantage. Moderator9 ROSETTA@home FAQ Moderator Contact |
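To make the proposed numbering concrete, here is a minimal sketch of how such a value could be read back. It only illustrates the scheme requested above (whole number = percent, first decimal = checkpoint, next two decimals = position); nothing here is implemented in Rosetta or BOINC, and `decode_progress` is a hypothetical name. The value is parsed as text to avoid floating-point rounding.

```python
from typing import Tuple


def decode_progress(text: str) -> Tuple[int, int, int]:
    """Split a value like '15.459' into (percent, checkpoint, position)."""
    whole, _, frac = text.partition(".")
    # Pad or trim the fractional part to exactly three digits.
    frac = frac.ljust(3, "0")[:3]
    percent = int(whole)
    checkpoint = int(frac[0])   # the 1/10th digit
    position = int(frac[1:])    # the 1/100th and 1/1000th digits
    return percent, checkpoint, position


print(decode_progress("15.459"))  # (15, 4, 59)
```

One limit of the scheme, as sketched, is that a single digit can only count ten checkpoints (0 to 9) per percent step.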
Seth Aaronson Send message Joined: 5 Mar 06 Posts: 18 Credit: 3,976 RAC: 0 |
I have a totally new error now: 5/16/2006 5:26:38 PM|rosetta@home|Unrecoverable error for result T0283_FACONTACTS_hom006_508_12872_0 (Incorrect function. (0x1) - exit code 1 (0x1)) I still have to ctrl-alt-del, end the process, and decline to debug the program. What does this new error mean? -Seth |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
I have totally new error now: It found something in the work unit it did not like. Rhiju or Bin will have to take a look to answer your question. Moderator9 ROSETTA@home FAQ Moderator Contact |
Seth Aaronson Send message Joined: 5 Mar 06 Posts: 18 Credit: 3,976 RAC: 0 |
I have totally new error now: Thanks. I wonder what they'll find. It also now looks like BOINC manager has downloaded version 5.16: 5/16/2006 7:27:21 PM|rosetta@home|Finished download of file rosetta_5.16_windows_intelx86.exe It still seems to freeze my machine. This is now my latest error message: 5/16/2006 9:22:21 PM|rosetta@home|Unrecoverable error for result TEST_HOMOLOG_ABRELAX_hom003_1fna__503_50195_0 (Incorrect function. (0x1) - exit code 1 (0x1)) -Seth |
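For anyone collecting these messages to report them, the following is a rough sketch that pulls the result name and exit code out of an "Unrecoverable error" line. The pattern is inferred only from the lines pasted in this thread (timestamp, project, message separated by `|`); it is an assumption, not a documented BOINC log format, and `parse_error_line` is a hypothetical helper.

```python
import re

# Pattern inferred from the log lines quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<when>[^|]+)\|(?P<project>[^|]+)\|"
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\(.*exit code (?P<code>-?\d+) \((?P<hex>0x[0-9a-fA-F]+)\)\)$"
)


def parse_error_line(line: str):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])
    return fields


sample = (
    "5/16/2006 9:22:21 PM|rosetta@home|Unrecoverable error for result "
    "TEST_HOMOLOG_ABRELAX_hom003_1fna__503_50195_0 "
    "(Incorrect function. (0x1) - exit code 1 (0x1))"
)
info = parse_error_line(sample)
print(info["result"], info["code"])
```

Lines that do not follow this shape simply return `None`, so the helper can be run over a whole message log without extra filtering.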
[B^S] sTrey Send message Joined: 25 Sep 05 Posts: 16 Credit: 15,524 RAC: 0 |
My box hung during the 5 minutes the Rosetta screensaver was allowed to run (at 82.93% done, model 8 step 339873, 3 hr 24 min 28 sec cpu). The display was frozen on the Rosetta screensaver image and I had to power-cycle the box. The BOINC Log shows: 5/16/2006 21:52:03 rosetta not responding to screensaver, exiting 5/16/2006 21:52:09 Unrecoverable error for result HOMOLOG_ABRELAX_hom003_t283__505_33632_0 ( - exit code -1 (0xffffffff)) It then went on to crunch CPDN, supposedly. The above was preceded by a Windows application event log error 1000, timestamp 21:51:57: Faulting application rosetta_5.13_windows_intelx86.exe, version 0.0.0.0, faulting module rosetta_5.13_windows_intelx86.exe, version 0.0.0.0, fault address 0x0056b66e. Result link I do have an ATI graphics card (Radeon 9000, circa 2003), but I've been running BOINC with this hw for over a year without this happening, and drivers are up to date. Not willing to crunch 5.16 with only 1 GB memory, did it with Ralph and it's way too greedy. Will wait until more memory arrives but concerned about this error because it effectively crashed my system. |
Simon Walker Send message Joined: 17 Oct 05 Posts: 3 Credit: 459,592 RAC: 0 |
Well woke up this AM to new Rosetta errors, I was wondering is it at all possible for a Rosetta process being worked on by one CPU (in a dual Core) to have the other CPU come along and try to run it as well, resulting in a crash? Just a query. Anyway Boinc says this about the crash : 17/05/2006 07:43:36|rosetta@home|Unrecoverable error for result T0283_FACONTACTS_hom006_508_10773_0 ( - exit code -1073741811 (0xc000000d)) On the FX-53 showing these problems the messages are : 16/05/2006 07:50:40|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_30163_0 ( - exit code -1073741811 (0xc000000d)) 16/05/2006 16:40:43|rosetta@home|Unrecoverable error for result T0283_FACONTACTS_hom003_508_8903_0 ( - exit code -1073741811 (0xc000000d)) 17/05/2006 07:55:34|rosetta@home|Unrecoverable error for result HOMOLOG_ABRELAX_hom003_t283__505_32419_0 ( - exit code -1073741811 (0xc000000d)) Both of these machines were running unattended, and when the monitors were powered up the error messages were present. The Dual core processor result listing : https://boinc.bakerlab.org/rosetta/results.php?hostid=145422 The FX-53 result listing : https://boinc.bakerlab.org/rosetta/results.php?hostid=193752 If you look at the FX-53 result listing you will be able to see that of the 12 completed units, 6 failed (only 1 of which was credited). What's going wrong? Active PCs https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=145422 https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=193752 Results: https://boinc.bakerlab.org/rosetta/results.php?userid=5150 |
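The negative exit code and the hex value in these messages are the same 32-bit number shown two ways: the decimal is the signed reading, the hex is the unsigned one. A small sketch of the conversion follows; `exit_code_to_hex` is an illustrative name, not a BOINC function. As a side note, 0xc000000d is the value Windows defines as STATUS_INVALID_PARAMETER, though nothing in this thread says what triggered it.

```python
def exit_code_to_hex(code: int) -> str:
    """Show a signed 32-bit exit code as unsigned hex, padded to 8 digits."""
    # Masking with 0xFFFFFFFF reinterprets a negative value as unsigned 32-bit.
    return f"0x{code & 0xFFFFFFFF:08x}"


print(exit_code_to_hex(-1073741811))  # 0xc000000d
print(exit_code_to_hex(-1))           # 0xffffffff
```

The output is zero-padded, so exit code 1 prints as `0x00000001` rather than the `0x1` seen in the BOINC messages.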
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
Well woke up this AM to new Rosetta errors, I was wondering is it at all possible for a Rosetta process being worked on by one CPU (in a dual Core) to have the other CPU come along and try to run it as well, resulting in a crash? Please see This FAQ concerning the credits. As for the dual core issue, anything is possible, but it is very unlikely that both cores would run the same Work unit. Moderator9 ROSETTA@home FAQ Moderator Contact |
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
I was wondering is it at all possible for a Rosetta process being worked on by one CPU (in a dual Core) to have the other CPU come along and try to run it as well, resulting in a crash? I run two dual core systems. BOINC assigns the work, and will not assign the same WU to two different processes. You may see two different WUs running at the same time. This is the power of the dual core. But BOINC is designed to assure no such conflicts occur. As for credit, when the "client state" still shows "computing", and yet you've reported the WU, and have credit claimed, it just means that the daily process to grant credit for client errors hasn't been run yet. You should see credit tomorrow. Your machine seems to be successfully crunching several (indeed one had 99) models before the errors. So, even though the last model had a problem, you are still producing valuable results and will be granted credit... in 24hrs. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
senatoralex85 Send message Joined: 27 Sep 05 Posts: 66 Credit: 169,644 RAC: 0 |
I am still a little confused. Does the time to completion decrease after each checkpoint or does it decrease after each completion of a model? Thank you! The checkpoint button is a great idea! |
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
I would also like to see a checkpoint message in the messages tab of BOINC monitor. That might get to be a lot of messages. I always prefer fewer messages, but if you could tell by the % complete when a checkpoint occurs, then you wouldn't even have to bring up the graphic to tell. Perhaps a field in the graphic for the CPU time of the last checkpoint? That might help people more readily see the reality of what checkpoints are, and what work they're throwing away. And it still would not add any more math to the display, such as a count-up of time since the last checkpoint. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
I would also like to see a checkpoint message in the messages tab of BOINC monitor. What they are doing will work for the widest possible audience, and will satisfy the broadest possible range of tastes. A single message line in the message tab every 20-30 min is not going to tax anyone's machine. Moderator9 ROSETTA@home FAQ Moderator Contact |
Jose Send message Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
I did too, but I tried what you proposed: one more error, a phantom WU and a WU that is so slow a slug can outrace it. This after I had gotten the computer to use more CPU for Rosetta (around 90% then, now it is barely 19%). And what I have been reading is that 5.16 has not been the panacea promised. Oh, I am running 5.16. So no more tasks after this one. G-d knows I tried. PS the only thing I have not tried is a human sacrifice and I doubt Moderator nine would volunteer. (Lame attempt at joke.) "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
|
Ian Send message Joined: 14 Apr 06 Posts: 29 Credit: 326,863 RAC: 577 |
Failure from Tuesday - only just noticed it. https://boinc.bakerlab.org/rosetta/result.php?resultid=20513809 One from Monday https://boinc.bakerlab.org/rosetta/result.php?resultid=20390954 And looking at my list there's one for 5.16 that seems to have happened before my 5.13 WUs finished (posted on the 5.16 thread in a mo). My results list seems to be out of chronological order. Ian Cundell, St Albans, UK |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
|
David Emigh Send message Joined: 13 Mar 06 Posts: 158 Credit: 417,178 RAC: 0 |
I wish you the best success at whatever BOINC projects you are able to run =) Please consider Ralph@home as one possibility. If the particular configuration of your computer produces a lot of errors in the Rosetta app, I can only imagine that the full error codes available in the Ralph app would be of tremendous value. I also run SETI@home and SZTAKI. Perhaps I will see you on the message boards at those projects... Okay, probably NOT at the SZTAKI site, unless we both learn Hungarian :p Rosie, Rosie, she's our gal, If she can't do it, no one shall! |