Message boards : Number crunching : can it really be so slow ?
Author | Message |
---|---|
Padanian Joined: 27 Sep 05 Posts: 14 Credit: 15,190 RAC: 0 |
I had a WU stuck at 83.66% for 5 hours (2.8GHz P4). It suddenly jumped to 91.66% and then back to 83.66%, 5 or 6 times. I had to kill it. I'll try this one out. Thanks |
Doug Worrall Joined: 19 Sep 05 Posts: 60 Credit: 58,445 RAC: 0 |
For sure, that should not be stuck for that long. I myself have put a W/U through the wringer, and it had no errors. First, I went into root (oops, that's OK), then rebooted during a Rosetta W/U due to swap issues. Then I tried "Aborting", and BOINC Manager started doing the "chicken": flashing and frozen. I tried Ctrl+Alt+Backspace, no go, doh. I had to manually reset the PC. I signed back into my user account, went to BOINC, and the W/U was still there. Thinking the W/U was corrupted, I paused it and ran Predictor when I found it was back. 36 hours later I took Rosetta off pause and let the W/U finish. Result: Rosetta@home Result ID 84726; Name 1pvaA_abrelax_no_cst_15903_0; Workunit 72423; Created 26 Sep 2005 21:18:38; Sent 30 Sep 2005 22:17:27; Received 1 Oct 2005 23:12:02; Server state Over; Outcome Success; Client state Done; Exit status 0 (0x0); Computer ID 3584; Report deadline 28 Oct 2005 22:17:27; CPU time 8660.681376; stderr out 4.43; Validate state Valid; Claimed credit 10.5439759610782; Granted credit 10.5439759610782; application version 4.77. Wow, this is great news: either the OS is very stable or that W/U was a super-duper W/U. If so, and Rosetta does not need to be handled with "kid gloves", then I will begin round robin again. I thought a Rosetta unit could not be stopped and you could not quit BOINC? Apologies for getting off topic. Doug Worrall Boinc Synergy |
Rebirther Joined: 17 Sep 05 Posts: 116 Credit: 41,315 RAC: 0 |
|
[AF>france>pas-de-calais]symaski62 Joined: 19 Sep 05 Posts: 47 Credit: 33,871 RAC: 0 |
|
Pconfig Joined: 26 Sep 05 Posts: 6 Credit: 56,254 RAC: 0 |
Note: I haven't had a hanging WU since the new type of WUs started being handed out... Proud member of the Dutch Power Cows |
Padanian Joined: 27 Sep 05 Posts: 14 Credit: 15,190 RAC: 0 |
Padanian wrote: "I had a WU stuck at 83.66% for 5 hours (2.8GHz P4). It suddenly jumped to 91.66% and then back to 83.66%, 5 or 6 times. I had to kill it." It worked out: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=112718 |
Juerschi Joined: 17 Sep 05 Posts: 8 Credit: 14,145 RAC: 0 |
Today I had my first WU stuck at 1% for over 1 hour. A normal WU is crunched in 2 hours on my host. I closed BOINC, started it again, and the WU finished in normal time without any error. |
STE\/E Joined: 17 Sep 05 Posts: 125 Credit: 4,100,301 RAC: 93 |
I get 4-6 WUs a day stuck at the 1% mark; all my PCs are P4 HT type. If I didn't monitor them fairly closely, I can easily envision all of them after a few days doing nothing but sitting there with WUs at the 1% completion mark. This is most definitely a problem that needs to be addressed by the devs, as it is a lot of wasted CPU time. Some of the stuck 1% WUs I don't catch until after 3 or 4 hours of running time. The WUs should have been done by then, but I either have to shut down BOINC and restart it, or abort the WU and try to get another WU to pass the 1% mark. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Can anyone who sees this error (stuck at 1%) send me the stdout.txt file while the WU is still stuck and running? It is located in the BOINC client installation (on Windows, likely at c:/Program Files/BOINC/slots/0/). Please, only the first person who can do this: reply to this post, I will confirm, and then you can email me the file, so that I get just one email rather than emails from all of you. Thanks! |
STE\/E Joined: 17 Sep 05 Posts: 125 Credit: 4,100,301 RAC: 93 |
I have 1 WU stuck @ 1% right now, showing 3:03:30 running time with 303:12:15 to completion. I can send the stdout.txt file to you if you still need it. I'm going to just suspend the WU for now and start another one ... PS: File sent, David ... |
Shaktai Joined: 21 Sep 05 Posts: 56 Credit: 575,419 RAC: 0 |
I have seen them stuck at 1% on both Windows and Mac. It seems to occur more often on the dual-core boxes, but that could just be because they crunch more units. I had one stuck today for about 12 hours on my AMD 64 X2 4200. A simple quit and restart of BOINC has fixed the problem every time. Maybe 4-5 a week for me. Team MacNN - The best Macintosh team ever. |
Padanian Joined: 27 Sep 05 Posts: 14 Credit: 15,190 RAC: 0 |
It happened to me with this last WU. 2 hours spent on 1%. I had to abort the WU and resume; after that, a computation error showed up. And no, I don't release Rosetta from memory. Maybe I should try assigning Rosetta to one logical CPU only, to avoid what Shaktai is suggesting above. I think the devs must look seriously into this, because it is affecting overall BOINC efficiency very much. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Padanian wrote: "It happened to me with this last WU. 2 hours spent on 1%." I'm looking into it. Can everyone on this thread (no one else, as I don't want too many emails) who is having the problem send me the stdout.txt files in slot0 and slot1 (if you are running dual CPU) in the BOINC installation? dekim at u.washington.edu. Thanks, David K |
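For anyone gathering the files David asks for, here is a minimal Python sketch that copies every stdout.txt found under the slots directory into one folder. The function name `collect_stdout_files` and the output naming scheme are hypothetical, and the layout (one numbered subdirectory per slot under `slots`) is assumed from the Windows path mentioned earlier in the thread; adjust the BOINC directory to wherever your client is installed.

```python
from pathlib import Path
import shutil


def collect_stdout_files(boinc_dir, out_dir):
    """Copy slots/<n>/stdout.txt into out_dir as slot<n>_stdout.txt.

    Returns the list of copied paths. The directory layout is an
    assumption based on the path quoted in the thread.
    """
    slots = Path(boinc_dir) / "slots"
    copied = []
    if not slots.is_dir():
        return copied  # nothing to collect; wrong directory or no slots yet

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for slot in sorted(slots.iterdir()):
        src = slot / "stdout.txt"
        if slot.is_dir() and src.is_file():
            dst = out_dir / f"slot{slot.name}_stdout.txt"
            shutil.copy2(src, dst)  # copy2 also keeps the file's timestamps
            copied.append(dst)
    return copied
```

Calling `collect_stdout_files("c:/Program Files/BOINC", "stuck_wu_logs")` while the WU is still stuck should leave one file per active slot in `stuck_wu_logs`, ready to attach to an email.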
Shaktai Joined: 21 Sep 05 Posts: 56 Credit: 575,419 RAC: 0 |
David E K wrote: "I'm looking into it. Can everyone on this thread (no one else, as I don't want too many emails) who is having the problem send me the stdout.txt files in slot0 and slot1 (if you are running dual CPU) in the BOINC installation? dekim at u.washington.edu." You've got mail. 2 computers: one a Pentium D 840 dual core and the other an AMD 64 X2 4200 dual core. Both just happened to have one work unit each frozen at 1%, one for 6 hours and one for 18 hours. Hope that helps. Team MacNN - The best Macintosh team ever. |
STE\/E Joined: 17 Sep 05 Posts: 125 Credit: 4,100,301 RAC: 93 |
David, I'm sending you 2 stdout.txt files because both WUs were stuck @ 91.67% this morning when I got up. Both WUs were between 5 and 6 hours of CPU running time, when the normal completion time for this P4 3.4 GHz HT PC is around 3 1/2 hours. I suspended the project, shut down BOINC and restarted it, and the WUs showed about 3 hours running time @ 91.67%. After they ran for a little while, the % for both WUs actually dropped back to 83.33%, with the completion time climbing to over 50 hours... PS: Both of the WUs have finished now without any further complications ... Hopefully the files I sent can still be of some help in seeing why the WUs were stuck in the first place ... |
Shaktai Joined: 21 Sep 05 Posts: 56 Credit: 575,419 RAC: 0 |
I've been watching my Windows PCs more closely, and the only machines that seem to get stuck at 1% are the dual-core boxes, both the AMD and the Intel. My P4 3.4 GHz with HT hasn't had the problem yet, as far as I've caught, and none of the single-CPU boxes have had it recently. The dual-core boxes are experiencing it almost daily now, where one of the two units will get stuck at 1%. Each time, a simple restart of BOINC fixes it. Team MacNN - The best Macintosh team ever. |
Padanian Joined: 27 Sep 05 Posts: 14 Credit: 15,190 RAC: 0 |
Shaktai wrote: "I've been watching my Windows PCs more closely, and the only machines that seem to get stuck at 1% are the dual-core boxes, both the AMD and the Intel. My P4 3.4 GHz with HT hasn't had the problem yet, as far as I've caught, and none of the single-CPU boxes have had it recently. The dual-core boxes are experiencing it almost daily now, where one of the two units will get stuck at 1%. Each time, a simple restart of BOINC fixes it." No, I've just got one HT machine, and it got stuck this morning at 83.33% |
Shaktai Joined: 21 Sep 05 Posts: 56 Credit: 575,419 RAC: 0 |
Shaktai wrote: "I've been watching my Windows PCs more closely, and the only machines that seem to get stuck at 1% are the dual-core boxes, both the AMD and the Intel. My P4 3.4 GHz with HT hasn't had the problem yet, as far as I've caught, and none of the single-CPU boxes have had it recently. The dual-core boxes are experiencing it almost daily now, where one of the two units will get stuck at 1%. Each time, a simple restart of BOINC fixes it." There are two different problems: one where it gets stuck at 1%, and one where it gets stuck at 83.33%. For many folks, the solution to the 83.33% issue was to "leave application in memory" and, if running more than one project, to extend the time between "switches" from the default of 60 minutes to 90-180 minutes (depending on the speed of the machine). At 83.33%, the calculations become much more complex and take longer. If you are switching between projects every 60 minutes, the unit may not reach the next step (91.66%) before the switch, and it will then restart at 83.33% when it switches back. David Kim is looking at both of these issues for fixes. Team MacNN - The best Macintosh team ever. |
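The 83.33% explanation above can be illustrated with a small Python sketch. It is only a toy model, not how Rosetta or BOINC is actually implemented: it assumes progress is saved only at step boundaries (so a unit removed from memory at a project switch restarts the step from zero), and the function name `minutes_to_finish_step` and the 75-minute step length are made up for illustration.

```python
def minutes_to_finish_step(step_minutes, switch_minutes, leave_in_memory,
                           max_slices=100):
    """Return CPU minutes spent before the step completes, or None if it
    never completes within max_slices time slices.

    Toy assumption: progress is saved only at step boundaries, so a unit
    that is removed from memory at a switch restarts the step from zero.
    """
    done = 0   # minutes of progress on the current step
    spent = 0  # total CPU minutes spent on this unit
    for _ in range(max_slices):
        remaining = step_minutes - done
        if remaining <= switch_minutes:
            return spent + remaining  # step finishes inside this slice
        spent += switch_minutes
        # Kept in memory: progress survives the switch. Otherwise it is lost.
        done = done + switch_minutes if leave_in_memory else 0
    return None


# Hypothetical 75-minute step between 83.33% and 91.66%:
print(minutes_to_finish_step(75, 60, leave_in_memory=False))   # None (stuck)
print(minutes_to_finish_step(75, 60, leave_in_memory=True))    # 75
print(minutes_to_finish_step(75, 120, leave_in_memory=False))  # 75
```

Under these assumptions, either setting alone is enough to get past the step: keeping the application in memory, or a switch interval longer than the step takes on that machine. That it did not help every poster suggests the real cause is more involved than this model.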
Padanian Joined: 27 Sep 05 Posts: 14 Credit: 15,190 RAC: 0 |
Shaktai wrote: "I've been watching my Windows PCs more closely, and the only machines that seem to get stuck at 1% are the dual-core boxes, both the AMD and the Intel. My P4 3.4 GHz with HT hasn't had the problem yet, as far as I've caught, and none of the single-CPU boxes have had it recently. The dual-core boxes are experiencing it almost daily now, where one of the two units will get stuck at 1%. Each time, a simple restart of BOINC fixes it." I'm well aware of the workaround, though neither leaving the app in memory nor increasing the switching time helped me. |
Fuzzy Hollynoodles Joined: 7 Oct 05 Posts: 234 Credit: 15,020 RAC: 0 |
Thanks, I changed my setting to 120 minutes. I have one of these 83.33% units at the moment and I'd hate to abort it! But if 120 min isn't enough, I'll change it to 180 min. If it's still stuck after this, I'll abort it! But as I am the second one who's getting this WU, one could suspect that it's bad! "I'm trying to maintain a shred of dignity in this world." - Me |
©2024 University of Washington
https://www.bakerlab.org