Client error for ALL tasks since a month. (Linux 64 bits boinc 7.0.27)

Message boards : Number crunching : Client error for ALL tasks since a month. (Linux 64 bits boinc 7.0.27)

Daedalus
Joined: 1 Aug 08
Posts: 39
Credit: 10,106,899
RAC: 377
Message 73657 - Posted: 19 Aug 2012, 19:59:46 UTC

Hello all,

I just checked my stats and noticed that all the Rosetta tasks on my quad 95550 have been failing for about a month. The computer ID is 1549082. I will not paste the complete error messages here because there are so many tasks; I think you can browse them from the forum.

Here is ONE of the results I got.

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<stderr_txt>
[2012- 8-12 17:57:47:] :: BOINC:: Initializing ... ok.
[2012- 8-12 17:57:47:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/C6_hexamer_ferredoxin_2klo_0001_INPUT__16_0001_A_fragments_fold_data.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001
Starting work on structure: _00002
Starting work on structure: _00003
Starting work on structure: _00004
Starting work on structure: _00005
Starting work on structure: _00006
Starting work on structure: _00007
Starting work on structure: _00008
Starting work on structure: _00009
Starting work on structure: _00010
Starting work on structure: _00011
Starting work on structure: _00012
Starting work on structure: _00013
Starting work on structure: _00014
Starting work on structure: _00015
Starting work on structure: _00016
Starting work on structure: _00017
Starting work on structure: _00018
Starting work on structure: _00019
Starting work on structure: _00020
Starting work on structure: _00021
Starting work on structure: _00022
Starting work on structure: _00023
Starting work on structure: _00024
Starting work on structure: _00025
Starting work on structure: _00026
Starting work on structure: _00027
Starting work on structure: _00028
Starting work on structure: _00029
Starting work on structure: _00030
Starting work on structure: _00031
======================================================
DONE :: 1 starting structures 10478.3 cpu seconds
This process generated 31 decoys from 31 attempts
======================================================
BOINC :: WS_max 3.75376e+255

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>


I do not know how to debug this output and would appreciate any help. I have set the project to refuse new work. By the way, it continues to hog my memory in spite of my preferences.
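For anyone who wants to check a batch of these logs without reading each one by hand, here is a minimal Python sketch that pulls the run summary out of a stderr_txt like the one above. The function name `summarize_stderr` and the regular expressions are assumptions based only on the lines visible in this one log; this is not an official BOINC or Rosetta tool, and other work units may print different lines.

```python
import re


def summarize_stderr(stderr_text):
    """Extract the run summary from a stderr log (hypothetical helper).

    The patterns below are guesses derived from the single log posted in
    this thread, not from any documented log format.
    """
    started = re.findall(r"Starting work on structure: (\S+)", stderr_text)
    decoys = re.search(
        r"This process generated (\d+) decoys from (\d+) attempts", stderr_text
    )
    cpu = re.search(r"([\d.]+) cpu seconds", stderr_text)
    return {
        "structures_started": len(started),
        "decoys": int(decoys.group(1)) if decoys else None,
        "attempts": int(decoys.group(2)) if decoys else None,
        "cpu_seconds": float(cpu.group(1)) if cpu else None,
        "called_boinc_finish": "called boinc_finish" in stderr_text,
    }


# Abbreviated copy of the log above: only the first two "Starting work"
# lines are kept, so structures_started will be 2 here rather than 31.
sample = """\
Starting work on structure: _00001
Starting work on structure: _00002
======================================================
DONE :: 1 starting structures 10478.3 cpu seconds
This process generated 31 decoys from 31 attempts
======================================================
BOINC :: Watchdog shutting down...
called boinc_finish
"""

summary = summarize_stderr(sample)
print(summary)
```

Run on the full log above, this would report 31 decoys from 31 attempts and a call to `boinc_finish`, which suggests that, as far as the application is concerned, the run finished normally.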
ID: 73657 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 73658 - Posted: 19 Aug 2012, 22:01:57 UTC

There is an existing thread on problems that some are having when running under the newer v7 versions of BOINC: Link to thread
Rosetta Moderator: Mod.Sense
ID: 73658 · Rating: 0
Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 25,992,337
RAC: 12,090
Message 73660 - Posted: 20 Aug 2012, 1:52:12 UTC

This bug is not bound to BOINC 7.x!
One of my team members ran into the same problem: link to host
Among other things (after a few reboots of the computer and a project reset), we tried installing several different versions of BOINC 7.x (7.0.25, 7.0.28) as well as 6.x (6.12.34), including a "clean install" with full deletion of all BOINC-related files.
The problem persisted. In the end we even reinstalled the operating system (Windows 7), but that did not help either.
So it seems the bug is on the server side (validator?). It is a rare bug, though.

The only thing we found out: when this error hits, the application version is missing in the task logs:
application version ---
Everything else looks OK.
ID: 73660 · Rating: 0
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 73661 - Posted: 20 Aug 2012, 7:17:04 UTC - in response to Message 73660.  

> This bug is not bound to BOINC 7.x!
> One of my team members ran into the same problem: link to host

This computer is using 7.0.28. Most of the WUs were completed successfully by the wingmen, except for those who had BOINC v7 as well.

.
ID: 73661 · Rating: 0
Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 25,992,337
RAC: 12,090
Message 73662 - Posted: 20 Aug 2012, 13:05:06 UTC
Last modified: 20 Aug 2012, 13:10:31 UTC

What you see now is only the latest iteration, after the Windows reinstall.
The 6.x version was installed over a month ago, and the tasks run on it have already been removed from the server. But it was exactly the same error on 6.12.34: no errors in the log (except for the missing app version) and 100% of tasks marked as invalid.
ID: 73662 · Rating: 0
Daedalus
Joined: 1 Aug 08
Posts: 39
Credit: 10,106,899
RAC: 377
Message 73663 - Posted: 20 Aug 2012, 17:34:41 UTC

OK, I will just suspend the project then. What's funny is that my current version of BOINC is tied to my OS, which is a long-term support release, so I will not change my BOINC version anytime soon.
ID: 73663 · Rating: 0
Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 25,992,337
RAC: 12,090
Message 73802 - Posted: 10 Sep 2012, 10:12:55 UTC - in response to Message 73661.  

> > This bug is not bound to BOINC 7.x!
> > One of my team members ran into the same problem: link to host
>
> This computer is using 7.0.28. Most of the WUs were completed successfully by the wingmen, except for those who had BOINC v7 as well.

Another computer has been hit by this bug.
https://boinc.bakerlab.org/rosetta/results.php?hostid=1555324
Resetting the project, removing BOINC version 7.0.28 and installing version 6.12.34 has not changed anything. This is the third computer in our team with this bug, which cannot be solved by any means from the user side.
In all cases it does not depend on the version of BOINC, so I'm sure that the problem is not with the 7.x BOINC versions or the OS version, but is on the side of the project.

P.S.
Now, when people in our team catch the bug, we will immediately recommend switching to a different DC project rather than wasting time trying to resolve it.
ID: 73802 · Rating: 0
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 73806 - Posted: 10 Sep 2012, 13:09:50 UTC - in response to Message 73802.  

> Another computer has been hit by this bug.
> https://boinc.bakerlab.org/rosetta/results.php?hostid=1555324
> Resetting the project, removing BOINC version 7.0.28 and installing version 6.12.34 has not changed anything. This is the third computer in our team with this bug, which cannot be solved by any means from the user side.

So it's the third computer out of how many? Those computers must have something in common if the same error occurs on all of them. The BOINC version might indeed not be responsible this time: in the results of this host there are wingmen who completed the WUs successfully with v7.



> Now, when people in our team catch the bug, we will immediately recommend switching to a different DC project rather than wasting time trying to resolve it.

Well, that's for sure better than wasting days or weeks of CPU time on errors.
.
ID: 73806 · Rating: 0
Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 25,992,337
RAC: 12,090
Message 73823 - Posted: 13 Sep 2012, 1:13:38 UTC

BOINC 6.10.18 does not help either: https://boinc.bakerlab.org/rosetta/result.php?resultid=531273684
Even registering a new user account does not help: https://boinc.bakerlab.org/rosetta/results.php?hostid=1564473

> Those computers must have something in common
Something really must be in common, but we cannot figure out what it might be.
It is clear, though, that it is NOT the BOINC version (we tried a lot of them) and NOT the type/version of the operating system (this bug has been seen on Windows 7, Windows XP and Linux).
ID: 73823 · Rating: 0
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 73827 - Posted: 13 Sep 2012, 10:20:19 UTC - in response to Message 73823.  

> Something really must be in common, but we cannot figure out what it might be.
> It is clear, though, that it is NOT the BOINC version (we tried a lot of them) and NOT the type/version of the operating system (this bug has been seen on Windows 7, Windows XP and Linux).

Well, there might be something to try... not a real solution, but it would be good to know.
.
ID: 73827 · Rating: 0
Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 25,992,337
RAC: 12,090
Message 73832 - Posted: 13 Sep 2012, 21:45:00 UTC
Last modified: 13 Sep 2012, 21:48:07 UTC

Yeah, suspending the calculations of all other projects will be one of the next steps before we detach the computers from R@H. We will also try installing the 32-bit version of BOINC (so far the only thing we have found in common across all the computers with the bug is a 64-bit OS and an Intel processor).

While I was writing this message the list of computers with the bug has grown: there are now 4 of them (counting only our team): https://boinc.bakerlab.org/rosetta/results.php?hostid=1564583

And that was a newly bought computer attached to BOINC for the first time...
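The search for a common factor can be done mechanically once the attributes of each affected host are written down. Below is a minimal Python sketch that keeps only the attribute values shared by every host. The function name `common_attributes`, the attribute keys, and the three host records are all hypothetical and made up for illustration; they are not the real hosts from our team, only combinations of values mentioned in this thread.

```python
def common_attributes(hosts):
    """Return the attribute/value pairs shared by every host record."""
    if not hosts:
        return {}
    shared = set(hosts[0].items())
    for host in hosts[1:]:
        # Keep only the (key, value) pairs this host also has.
        shared &= set(host.items())
    return dict(shared)


# Hypothetical records for illustration only (not real host data).
example_hosts = [
    {"os": "Windows 7", "os_bits": "64", "cpu_vendor": "Intel", "boinc": "7.0.28"},
    {"os": "Windows XP", "os_bits": "64", "cpu_vendor": "Intel", "boinc": "6.12.34"},
    {"os": "Linux", "os_bits": "64", "cpu_vendor": "Intel", "boinc": "7.0.27"},
]

print(common_attributes(example_hosts))
```

With records like these, only the 64-bit OS and the Intel CPU survive the intersection, which matches what we have found so far. Adding more attributes per host (RAM, antivirus, virtualization settings and so on) would narrow or widen the list of suspects.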
ID: 73832 · Rating: 0
Mattmon
Joined: 22 Mar 06
Posts: 1
Credit: 8,365,527
RAC: 0
Message 73836 - Posted: 15 Sep 2012, 11:30:44 UTC

I too am having this problem with getting client errors for all my tasks.

https://boinc.bakerlab.org/results.php?hostid=1551216

What is most troubling about this problem, is that on the CLIENT, it shows that it completed successfully with "Ready to report". It doesn't even show that it resulted in an error at all! It is only after checking my Tasks that I see that it was Client error. For all I know, this problem could have been happening for MONTHS, and I only found out about it now because I just happened to check my tasks! I now have this project set for no new tasks.
ID: 73836 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,169,305
RAC: 3,078
Message 73837 - Posted: 15 Sep 2012, 11:43:18 UTC - in response to Message 73836.  

> I too am having this problem with getting client errors for all my tasks.
>
> https://boinc.bakerlab.org/results.php?hostid=1551216
>
> What is most troubling about this problem, is that on the CLIENT, it shows that it completed successfully with "Ready to report". It doesn't even show that it resulted in an error at all! It is only after checking my Tasks that I see that it was Client error. For all I know, this problem could have been happening for MONTHS, and I only found out about it now because I just happened to check my tasks! I now have this project set for no new tasks.


THAT is the symptom, and it is what makes this so hard for us users to solve: the units finish up just fine but then crash and burn when they are sent to Rosetta! The Ralph units, the beta side of Rosetta, do not have this problem. Rosetta says they have no idea and that they are getting enough good units that the bad ones make no difference to them. They ARE looking into it, but have been for months now and haven't found a thing! Lots of us think it is the server causing the problems, but Rosetta keeps saying they have no idea!

On the personal side I have removed all of my pc's that have gpu's in them for crunching, and have downgraded to the Boinc version 6 series for those that are still here and I am fine again. At some point my pc's WILL be using their gpu's again and/or I will go back to projects that require version 7 of Boinc, and Rosetta will be cut off for me again. It is disappointing to me that Rosetta would do this but it IS their project not mine!
ID: 73837 · Rating: 0
Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 25,992,337
RAC: 12,090
Message 73882 - Posted: 24 Sep 2012, 11:54:34 UTC
Last modified: 24 Sep 2012, 11:54:57 UTC

One more computer with the bug: https://boinc.bakerlab.org/rosetta/results.php?hostid=1565305
Now 5 computers in our team cannot crunch R@H at all because of this bug.

To the devs: you are already losing the power of 32 CPU cores to this bug in our team alone!
I am not sure whether that is representative of all crunchers, but if it is, you are losing the power of roughly 800 CPU cores in total (an extrapolation from our team's share of the project's total computing power, about 4%).
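The extrapolation is just a proportion, sketched here in Python. The function name `extrapolate_lost_cores` is hypothetical, and the big assumption (as already said) is that our team is representative of the whole project, with a share of about 4% of its computing power.

```python
def extrapolate_lost_cores(team_lost_cores, team_share):
    """Scale the cores lost in one team up to the whole project.

    team_share is the team's fraction of total project computing power
    (0.04 for ~4%). Assumes the team is representative of all crunchers,
    which is an unverified assumption.
    """
    if not 0 < team_share <= 1:
        raise ValueError("team_share must be a fraction in (0, 1]")
    return team_lost_cores / team_share


# 32 cores lost in a team that provides ~4% of the project's power.
estimate = extrapolate_lost_cores(32, 0.04)
print(f"Estimated cores lost project-wide: about {estimate:.0f}")
```

If the real share of the team is a little higher or lower than 4%, the estimate moves accordingly (for example, 5% would give about 640 cores), so treat the number as an order of magnitude only.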

P.S.
Turning off other BOINC projects does not help either.
ID: 73882 · Rating: 0
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 73983 - Posted: 9 Oct 2012, 1:22:54 UTC
Last modified: 9 Oct 2012, 1:25:18 UTC

Same here: https://boinc.bakerlab.org/rosetta/results.php?hostid=1569699

Brand new PC btw.
ID: 73983 · Rating: 0
Bjarke
Joined: 14 Feb 06
Posts: 5
Credit: 1,634,479
RAC: 0
Message 74017 - Posted: 14 Oct 2012, 10:26:59 UTC

Two of my PC's seem to suddenly have gotten this error.
Host 1569102
Host 1569084

Both PC's have worked fine with Rosetta before. I will just switch to another project where the programmers know what they are doing and where they are actually putting an effort in solving such problems...
ID: 74017 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 74022 - Posted: 15 Oct 2012, 19:36:17 UTC

So sorry for the long hiatus. Been busy. I'll try to figure out what is going on. Just downloaded the client on a new computer and will see what happens. Thank you Mod.Sense for alerting us.
ID: 74022 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 74023 - Posted: 15 Oct 2012, 19:47:19 UTC

My new client (7.0.31 on Mac) finished a task successfully. I'll try different platforms. Are people getting these errors on the most current boinc client version?
ID: 74023 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,169,305
RAC: 3,078
Message 74029 - Posted: 16 Oct 2012, 11:40:20 UTC - in response to Message 74023.  

> My new client (7.0.31 on Mac) finished a task successfully. I'll try different platforms. Are people getting these errors on the most current boinc client version?


I am a Windows guy, not a Linux guy, but in Windows yes even the 7.0.27 version of Boinc had problems. It works fine under the 6.?.? versions of Boinc, but even then ONLY if you are NOT using a gpu for another project. There are too many gpu projects out there now to ignore them and solely crunch for Rosetta, especially for those of us with multiple pc's. We were told that Rosetta is not interested in supporting the version 7.?.? series of Boinc as they were happy with the number of crunchers they had. These errors started a LONG time ago and even the people at the Beta project worked on it but couldn't find any problems. In fact the Beta had NO problems, but here at Rosie there are nothing but problems. I also crunch for the Project Albert, it REQUIRES Boinc version 7.0.27 or above, that makes crunching for Rosie impossible! I HOPE you can fix this problem, LOTS of people are dropping Rosie from their list of Boinc projects and if you can't get new people coming in eventually Rosie WILL CLOSE DOWN.
ID: 74029 · Rating: 0
Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 25,992,337
RAC: 12,090
Message 74033 - Posted: 16 Oct 2012, 17:25:09 UTC - in response to Message 74023.  

> My new client (7.0.31 on Mac) finished a task successfully. I'll try different platforms. Are people getting these errors on the most current boinc client version?

Read this entire thread first (if you have not done so already).
Summary: the bug has been seen with different versions of BOINC (6.12.34, 7.0.25, 7.0.27, 7.0.28) on different platforms (Windows XP, Windows 7, Windows 8, Linux). Resetting the project or reinstalling BOINC does not help.
On 2 computers with the bug, even reinstalling the OS and turning off other BOINC projects did not help either.
We could not figure out what all the computers affected by this error have in common. A very strange bug...

P.S.
The links to task examples no longer work because the computers were switched to other projects long ago.
ID: 74033 · Rating: 0


©2024 University of Washington
https://www.bakerlab.org