Message boards : Number crunching : Client errors
Author | Message |
---|---|
Rick A. Sponholz Joined: 6 Sep 10 Posts: 14 Credit: 7,823,937 RAC: 0 |
[quote]So... To continue my testing... I uninstalled everything nVidia, restarted, got some Rosetta tasks, and let them process. The scheduler request (which reported the completed tasks) did not have any blocks for <coprocs>, for <coproc_cuda>, or for <coproc_opencl>. And guess what. It worked, and the Task details show "Outcome: Success" and "application version: 3.45". So... Rosetta Project admins... The post right above has the scheduler request that results in Client error. This post right here has the scheduler request that results in Success. It seems that the bug may be with your code's processing/parsing of a scheduler request xml block that has 1 or more of the following tags: <coprocs>, <coproc_cuda>, <coproc_opencl> ... possibly also dependent on the details within those tags. Please find a way to fix this! I've done everything I possibly can to help you. It's on you now to actually fix it! Until you do, you are WASTING TONS OF PEOPLE'S TIME (since all their work gets invalidated)[/quote] Dear Rosetta Admins, Please respond to JacobKlein's post. Do you acknowledge the problem? Do you agree with the cause? When are you going to fix this? I'm waiting for your reply, and I bet many other volunteers are waiting too. Thanks In Advance, Rick |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,171,954 RAC: 3,083 |
In the past, when they did respond, the Admins or their reps said things were progressing normally and the errors were within acceptable limits. Dr. David Anderson, the main BOINC programmer, DID come here, and a newer version of BOINC DID seem to help, but the still-newer versions seem to be having the same old problems again. IMO there are too many other BOINC projects that can use our help; if someone can't get Rosetta to work properly, they should just move on and put Rosetta on the back burner until the project gets its act together. This fishing spot seems to be taking all the bait but not letting people actually catch the fish. It is time to move to another spot; the sea is full of other spots! |
JAMES DORISIO Joined: 25 Dec 05 Posts: 15 Credit: 201,211,526 RAC: 36,125 |
This problem appears to be fixed for me. I have 5 Linux computers running under my name, and as of 3-20-13 they all started returning successful tasks; before this they were all client errors. I have made no changes to them, not even a reboot. I don't see any posts from Rosetta admins saying that they changed anything, but something has changed. I would suggest that anybody with this problem enable new work to see if this problem is really fixed. Thanks, Jim |
Jacob Klein Joined: 3 Jul 07 Posts: 15 Credit: 7,098,747 RAC: 0 |
The problem appears to have been fixed. I don't know the details of the fix, but from what I understand, the Rosetta Admins and David Anderson were able to work together, using the scheduler requests that I posted, to identify and correct the problem. Rosetta's server software was recompiled, but I'm not sure if code was changed/updated; http://srv4.bakerlab.org/rosetta_cgi/cgi still shows scheduler version 605. My tasks are currently resulting in: Server state: Over; Outcome: Success; Client state: Done; Exit status: 0 (0x0); Validate state: Valid; application version: 3.45 ... even though I have nVidia GPUs capable of OpenCL work, using the latest nVidia drivers! So, as far as I can tell, IT'S FIXED! Thank you Rosetta for finally resolving this issue. |
Rayburner Joined: 4 Oct 05 Posts: 32 Credit: 16,518,823 RAC: 0 |
(corrected typo) The problem is fixed for me too. My host returned its first valid results ever. Thank you, Rosetta and David Anderson, for finally fixing it. Rayburner [quote]The problem appears to have been fixed.[/quote] |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
[quote]The problem appears to have been fixed.[/quote] I thought the latest drivers and latest boinc versions fixed the issue... |
Jacob Klein Joined: 3 Jul 07 Posts: 15 Credit: 7,098,747 RAC: 0 |
[quote]I thought the latest drivers and latest boinc versions fixed the issue...[/quote] You thought wrong, as evidenced by my results in a post within this thread: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6177&nowrap=true#75240 But I do believe the issue is truly fixed now. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,171,954 RAC: 3,083 |
[quote]I thought the latest drivers and latest boinc versions fixed the issue...[/quote] I am AMAZED and VERY HAPPY that they were able to keep working on it until it was HOPEFULLY fixed permanently!! WELL DONE ROSETTA!!! Now they just need to put it on the home page to let EVERYONE know it is fixed!! |
Student Joined: 24 Oct 06 Posts: 3 Credit: 57,404 RAC: 0 |
Upgrading Boinc Manager from 7.0.28 to 7.0.52 didn't help. Upgrade nVidia driver to 314.07 helped :-D. So far 8 successful tasks. Great!!!! |
Jacob Klein Joined: 3 Jul 07 Posts: 15 Credit: 7,098,747 RAC: 0 |
[quote]Upgrading Boinc Manager from 7.0.28 to 7.0.52 didn't help. Upgrade nVidia driver to 314.07 helped :-D. So far 8 successful tasks. Great!!!![/quote] I just want it to be clear that the real fix was that the server software was recompiled on 3/20/2013. It had nothing to do with BOINC Manager version, and nothing to do with nVidia driver version. |
Polian Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
[quote]Upgrading Boinc Manager from 7.0.28 to 7.0.52 didn't help. Upgrade nVidia driver to 314.07 helped :-D. So far 8 successful tasks. Great!!!![/quote] I'm confused, there were users who reported success before 3/20 by upgrading drivers. |
Jacob Klein Joined: 3 Jul 07 Posts: 15 Credit: 7,098,747 RAC: 0 |
[quote]I'm confused, there were users who reported success before 3/20 by upgrading drivers.[/quote] From the best I can tell, the task result status really depended on whether your scheduler request had certain xml data in it or not when reporting a result. It's probable that clients without nVidia GPUs never saw the problem. It's possible that certain configurations of GPUs never had problems. And it's also possible that certain driver versions had certain data in the xml scheduler request that triggered the problem, while other driver versions didn't have that data. The problem itself was that the server was not properly handling certain xml within the scheduler requests. I don't know any more details than that. I had notified the Rosetta project admins, as well as David Anderson. I do know they recompiled the server software on 3/20/2013, and I was asked to re-test it. And when I re-tested using the exact same GPU configuration along with the exact same BOINC Manager version and nVidia driver versions, the results were now successful instead of Client Error. It's the recompile of the server software that fixed this nasty error. |
Kenneth DePrizio Joined: 15 Jul 07 Posts: 15 Credit: 3,123,915 RAC: 0 |
Yeah, I can confirm that the server software is what fixed the problem for me. Upgrading drivers did nothing before. |
JugNut Joined: 30 Apr 12 Posts: 11 Credit: 2,437,453 RAC: 0 |
Wooo Hoo my first successful WU's since installing my new cruncher (i7 3930k) some 6 months ago. FINALLY!!! |
googloo Joined: 15 Sep 06 Posts: 133 Credit: 22,722,686 RAC: 3,784 |
Just to clarify for me: is it now safe to upgrade both my BOINC Manager and my nVidia driver to the latest versions? I'm at 6.12.34 (x64) and 306.97, respectively. That's how I've been avoiding client error for several months. Processor: 8 GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz [Family 6 Model 58 Stepping 9]; Processor: 256.00 KB cache; Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx smx tm2 popcnt aes pbe; OS: Microsoft Windows 7: Professional x64 Edition, Service Pack 1, (06.01.7601.00); Memory: 15.94 GB physical, 31.88 GB virtual; Disk: 197.98 GB total, 124.20 GB free; NVIDIA GPU 0: GeForce GT 620 (driver version 30697, CUDA version 5000, compute capability 2.1, 2048MB, 182 GFLOPS peak) |
Jacob Klein Joined: 3 Jul 07 Posts: 15 Credit: 7,098,747 RAC: 0 |
[quote]Just to clarify for me: is it now safe to upgrade both my BOINC Manager and my nVidia driver to the latest versions? I'm at 6.12.34 (x64) and 306.97, respectively. That's how I've been avoiding client error for several months.[/quote] I'd say, Yes. I'm running the latest beta BOINC, BOINC v7.0.58 x64 Beta, along with the latest beta nVidia drivers, v314.21 x64 Beta, without any "Client error" problems. If you only want to run "release" software, then I believe BOINC v7.0.28 and nVidia v314.07 WHQL should both work just fine. Good luck! |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Thank you JacobKlein for all of your efforts and initiative to help get this resolved. To try and word what Jacob has been saying a little differently: the results that the clients were producing, regardless of BOINC or GPU versions, were good. This is probably part of what threw off the Rosetta admins. But some of the server code that runs was flagging things as invalid. So the tasks were treated as invalid (i.e. reissued to another host), but credit was granted by the daily script that awards credit for such outcomes (reflecting the value to the project of learning what's working and what's not). Since the root cause was in the server validation code, and it's now been revised to handle the various XML tags that various combinations of GPU driver version and BOINC version might throw at it, you should not see any problems with making changes on your client host. Rosetta Moderator: Mod.Sense |
Jacob Klein Joined: 3 Jul 07 Posts: 15 Credit: 7,098,747 RAC: 0 |
Awesome! So, not only is it a win for the clients (results are now successful, no more confusion or error reporting), but it's a win for Rosetta too (work is done quicker since results will only have to be calculated once, instead of having to re-send the work unit on every Client Error). I just spent some money to upgrade my computer's RAM from 6GB to 12GB, so I can do more Rosetta tasks at once. Keep on crunching! |
Stephen Miller Joined: 18 Sep 05 Posts: 13 Credit: 16,294,215 RAC: 0 |
I nominate JacobKlein for the Rosetta@home's HERO AWARD or some method on the FRONT PAGE to acknowledge his persistent effort to get credit where credit is due (double entendre intended). Crunching since 18 Sep 2005 |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,171,954 RAC: 3,083 |
[quote]I nominate JacobKlein for the Rosetta@home's HERO AWARD or some method on the FRONT PAGE to acknowledge his persistent effort to get credit where credit is due (double entendre intended).[/quote] I second that!! |
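
The diagnosis in this thread was that results came back as Client error only when the scheduler request carried `<coprocs>`, `<coproc_cuda>`, or `<coproc_opencl>` blocks. A saved request could be checked for those blocks with something like the Python sketch below. Only the three tag names come from the thread. The function name `coproc_tags_present`, the `<scheduler_request>` root element, and the trimmed sample requests are assumptions made for illustration, and this is not the code the Rosetta server runs.

```python
import xml.etree.ElementTree as ET

# Tag names taken from the thread; everything else here is illustrative.
COPROC_TAGS = ("coprocs", "coproc_cuda", "coproc_opencl")


def coproc_tags_present(request_xml: str) -> list[str]:
    """Return which of the coprocessor tags named in the thread appear in the request."""
    root = ET.fromstring(request_xml)
    # Walk every element (including the root) and collect matching tag names.
    found = {elem.tag for elem in root.iter() if elem.tag in COPROC_TAGS}
    # Report them in a fixed order so results are easy to compare.
    return [tag for tag in COPROC_TAGS if tag in found]


# Hypothetical, heavily trimmed requests; real scheduler requests carry many more fields.
WITH_GPU = """<scheduler_request>
  <coprocs>
    <coproc_cuda><name>GeForce GT 620</name></coproc_cuda>
  </coprocs>
</scheduler_request>"""

WITHOUT_GPU = "<scheduler_request></scheduler_request>"

if __name__ == "__main__":
    print(coproc_tags_present(WITH_GPU))
    print(coproc_tags_present(WITHOUT_GPU))
```

Going by the posts above, a request for which this returns a non-empty list is the kind that could end in Client error before the 3/20/2013 server recompile, though the thread also notes the outcome may have depended on the details inside those tags.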
©2024 University of Washington
https://www.bakerlab.org