Message boards : Number crunching : WU not reporting CPU time
Author | Message |
---|---|
Charles Dennett Joined: 27 Sep 05 Posts: 102 Credit: 2,078,372 RAC: 303 |
I have a Win98SE machine with a PII running 100% Rosetta. A short while ago it hit one of those short-running WUs, which aborted after 180 seconds. It then started up the next WU in the queue. I happened to notice that the BOINC Manager said it was running, but the CPU time for it was empty (actually, it had -- in it). I brought up the graphics display and it *was* running. The CPU time on the graphics said 0 hrs 0 min 0 sec. I let it keep going. It soon finished the first pass and started on the second. The CPU time jumped to 0 hr 17 min 29 sec. As the second pass is processing now, the clock is not advancing. I'm going to guess that when the second pass finishes, the clock will jump again. I've never seen this before, and I think someone else reported this earlier today (and ended up getting into a bit of a tussle with Bill Michael). I'm going to let this keep going to see how it ends up. The WU in question can be seen at: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=3893355 I've been running RAH on this computer (and my home Linux server) for quite a while now and neither has given me any problems before. Both run RAH 100%. See my profile for why if you're curious. -Charlie |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,359 RAC: 13 |
The basic problem is that Win9x doesn't "know" about CPU time... there's just not a system call that accurately tracks it. For whatever reason, it appears that Rosetta 4.81 "trips" the problem a lot more often than 4.80 did. On other projects, if you send in a "0 seconds, 0 credit" result, you may never even notice it, because the quorum approach causes you to get the middle credit anyway. Here, it's obvious. The "tussle" had nothing to do with the problem. :-/ On other projects, the recommendation has been to set "leave application in memory" to _no_ if you're on Win9x, which is just the opposite of the recommendation for Rosetta. I don't know if this will help or hurt overall - more errors, but fewer 'wasted' 0's? Flops-counting will solve this, as the time becomes unimportant. I don't know when we'll see that though. |
Charles Dennett Joined: 27 Sep 05 Posts: 102 Credit: 2,078,372 RAC: 303 |
Well, I've been running RAH since September and this is the first time I've seen this particular problem. It's always recorded the cpu time continuously before. When I was running more than one project at a time I did notice the cpu time increasing for a paused WU. Boy, did that inflate the requested credit! Also, Win98 (and I assume the SE variant) cannot report cpu time. What it is actually reporting is wall time, which would explain the increasing time for a paused WU. Oh, and it did take a jump in time when it finished the second trajectory as I suspected it would. Charlie -Charlie |
Charles Dennett Joined: 27 Sep 05 Posts: 102 Credit: 2,078,372 RAC: 303 |
Oh, I realize that. Without going back and rereading the posts, I thought the person was reporting the same (or similar) problem - the cpu clock had apparently stopped. That's all I was trying to say. Sorry if I was not clear on that. I'm in agreement with you on the whole thing. Hope you have a great holiday. Charlie -Charlie |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,359 RAC: 13 |
You might find this thread at SETI interesting. It seems that after an error, the next result processed is more likely to have "CPU timing" issues. So it may be the "short WUs" here, rather than the app version change... |
Charles Dennett Joined: 27 Sep 05 Posts: 102 Credit: 2,078,372 RAC: 303 |
Well, I won't do that again! I let the WU continue until it finished. Even though it said it used some amount of CPU time (probably about 18 hours - hey give me a break! It's an old slow 300 MHz PII!), when all was said and done and it was reported, it showed up as 0 cpu seconds for 0 credit. Next time I restart the WU and if that doesn't work, I'll reboot. WU is https://boinc.bakerlab.org/rosetta/workunit.php?wuid=3893355 Charlie -Charlie |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
"... Next time I restart the WU and if that doesn't work, I'll reboot." I think if you see the clock issue, reboot right away. Win98 is clearly confused, and even if the WU runs OK you don't know what other problems the operating system has got. Once an OS starts to crumble, the longer you leave it the worse the symptoms! River~~ |
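
The difference between CPU time and wall time that comes up in this thread can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not how BOINC or Rosetta actually measure time: the function name `measure` and its parameters are hypothetical, and `time.sleep` merely stands in for a paused work unit. It uses the standard library's `time.process_time()` (CPU time of the current process) and `time.perf_counter()` (elapsed wall-clock time).

```python
import time


def measure(busy_iterations, pause_seconds):
    """Return (cpu_seconds, wall_seconds) for some work followed by a pause.

    Hypothetical helper for illustration only; the sleep stands in for
    a work unit that is paused while another project runs.
    """
    wall_start = time.perf_counter()   # wall-clock reference point
    cpu_start = time.process_time()    # CPU time used by this process so far

    # Real work: this consumes CPU time and wall time.
    total = 0
    for i in range(busy_iterations):
        total += i * i

    # "Paused": wall time keeps advancing, CPU time (almost) does not.
    time.sleep(pause_seconds)

    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used, wall_used


if __name__ == "__main__":
    cpu, wall = measure(200_000, 0.5)
    print(f"CPU time:  {cpu:.3f} s")
    print(f"Wall time: {wall:.3f} s")
```

A client that can only report wall time would count the pause as if it were computation, which matches the inflated times described above for a paused WU. A client that gets no usable reading at all would report 0 seconds.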
©2024 University of Washington
https://www.bakerlab.org