Message boards : Number crunching : Miscellaneous Work Unit Errors
Author | Message |
---|---|
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
XS Vietnam Soldier wrote: "I've just had another failure on my Dothan machine using version 4.82." XS Vietnam Soldier, I am going to delete your original post to remove your e-mail address in the form in which you posted it, so that "sniffers" cannot pick it up and send you a lot of junk mail. I have reproduced the relevant part, modified, in this post to prevent that problem. I can contact you offline about this if you wish, but the short of it is that there are a number of science differences between 4.81 and 4.82. Some of the problems you have seen are Work Unit related and are already being fixed. A few of the others should be taken care of in 4.83. Because of this, downgrading at this point would actually cause you more trouble, because the work units are also changing to fit the new software. In any case, unless specific steps are taken, any downgrade would be replaced by the server at the first opportunity; this is how new versions are distributed. The only way to downgrade successfully is to stop the automatic software upgrades, and that will result in even more errors as the new work units are issued. Moderator9 ROSETTA@home FAQ Moderator Contact |
1fast6 Joined: 20 Feb 06 Posts: 8 Credit: 6,982,405 RAC: 0 |
New to the project (about 24 hours)... unfortunately I have more client errors than successes to report:
https://boinc.bakerlab.org/rosetta/result.php?resultid=11903545
https://boinc.bakerlab.org/rosetta/result.php?resultid=11869249
https://boinc.bakerlab.org/rosetta/result.php?resultid=11859808
https://boinc.bakerlab.org/rosetta/result.php?resultid=11859451
https://boinc.bakerlab.org/rosetta/result.php?resultid=11859033
https://boinc.bakerlab.org/rosetta/result.php?resultid=11857947
https://boinc.bakerlab.org/rosetta/result.php?resultid=11855989
https://boinc.bakerlab.org/rosetta/result.php?resultid=11834370
https://boinc.bakerlab.org/rosetta/result.php?resultid=11833252
https://boinc.bakerlab.org/rosetta/result.php?resultid=11831111
https://boinc.bakerlab.org/rosetta/result.php?resultid=11830550
https://boinc.bakerlab.org/rosetta/result.php?resultid=11830128
It's good to be king... |
Lee Carre Joined: 6 Oct 05 Posts: 96 Credit: 79,331 RAC: 0 |
|
Morten Starkeby Joined: 18 Feb 06 Posts: 10 Credit: 472,142 RAC: 0 |
I got the following error today:
22/02/2006 07:40:42|rosetta@home|Unrecoverable error for result PRODUCTION_ABINITIO_INCREASECYCLES50_1opd__317_273_0 ( - exit code -1073741811 (0xc000000d))
https://boinc.bakerlab.org/rosetta/result.php?resultid=11852924 on computer https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=167418
Leave in memory: YES
Application switch time: 120 min
Client version: 5.3.19 (BBC CCE version)
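The two numbers BOINC prints for the exit status, -1073741811 and 0xc000000d, are the same 32-bit value read once as a signed integer and once as unsigned hex. A minimal Python sketch of that conversion follows. The name table holds only the two Windows NTSTATUS codes that appear in this thread (0xC000000D is STATUS_INVALID_PARAMETER, 0xC0000005 is STATUS_ACCESS_VIOLATION), and `decode_exit_code` is a hypothetical helper, not part of BOINC. The name only says how the process ended, not which part of Rosetta caused it.

```python
# Windows NTSTATUS values seen in this thread; any other code is reported as "unknown".
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
}


def decode_exit_code(code: int) -> str:
    """Hypothetical helper: show a signed exit code as 32-bit hex plus its name."""
    unsigned = code & 0xFFFFFFFF  # reinterpret the signed value as unsigned 32-bit
    name = KNOWN_STATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08x} ({name})"


if __name__ == "__main__":
    for code in (-1073741811, -1073741819, 1):
        print(code, "->", decode_exit_code(code))
```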
Kevin Joined: 15 Jan 06 Posts: 21 Credit: 109,496 RAC: 0 |
|
Angus Joined: 17 Sep 05 Posts: 412 Credit: 321,053 RAC: 0 |
I just had an entire daily quota of those error out, as fast as they could download. Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :) "You can't fix stupid" (Ron White) |
Christian Hagen Joined: 26 Sep 05 Posts: 5 Credit: 46,795 RAC: 0 |
Same here: 2006-02-22 08:37:42 [rosetta@home] Unrecoverable error for result 1btn_fullatom_dec04_3_318_15_2 (process exited with code 1 (0x1)) |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
I just cancelled this batch. There is definitely something wrong with it, and I have alerted the person in our lab who submitted it. |
ecafkid Joined: 5 Oct 05 Posts: 40 Credit: 15,177,319 RAC: 0 |
Here is a list of errors I have gotten today:
2/22/2006 3:15:02 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:15:05 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:15:05 AM|rosetta@home|No schedulers responded
2/22/2006 3:15:10 AM|rosetta@home|Deferring communication with project for 55 seconds
2/22/2006 3:16:05 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 3:16:05 AM|rosetta@home|Reason: To report results
2/22/2006 3:16:05 AM|rosetta@home|Reporting 5 results
2/22/2006 3:16:28 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:16:30 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:16:30 AM|rosetta@home|No schedulers responded
2/22/2006 3:16:35 AM|rosetta@home|Deferring communication with project for 54 seconds
2/22/2006 3:17:30 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 3:17:30 AM|rosetta@home|Reason: To report results
2/22/2006 3:17:30 AM|rosetta@home|Reporting 5 results
2/22/2006 3:17:52 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:17:55 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:17:55 AM|rosetta@home|No schedulers responded
2/22/2006 3:18:00 AM|rosetta@home|Deferring communication with project for 55 seconds
2/22/2006 3:18:56 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 3:18:56 AM|rosetta@home|Reason: To report results
2/22/2006 3:18:56 AM|rosetta@home|Reporting 5 results
2/22/2006 3:19:18 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:19:21 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:19:21 AM|rosetta@home|No schedulers responded
2/22/2006 3:19:26 AM|rosetta@home|Deferring communication with project for 2 minutes and 22 seconds
2/22/2006 3:21:51 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 3:21:51 AM|rosetta@home|Reason: To report results
2/22/2006 3:21:51 AM|rosetta@home|Reporting 5 results
2/22/2006 3:22:14 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:22:16 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:22:16 AM|rosetta@home|No schedulers responded
2/22/2006 3:22:21 AM|rosetta@home|Deferring communication with project for 1 minutes and 52 seconds
2/22/2006 3:24:16 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 3:24:16 AM|rosetta@home|Reason: To report results
2/22/2006 3:24:16 AM|rosetta@home|Reporting 5 results
2/22/2006 3:24:38 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:24:41 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:24:41 AM|rosetta@home|No schedulers responded
2/22/2006 3:24:46 AM|rosetta@home|Deferring communication with project for 12 minutes and 25 seconds
2/22/2006 3:37:12 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 3:37:12 AM|rosetta@home|Reason: To report results
2/22/2006 3:37:12 AM|rosetta@home|Reporting 5 results
2/22/2006 3:37:34 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:37:37 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:37:37 AM|rosetta@home|No schedulers responded
2/22/2006 3:37:42 AM|rosetta@home|Deferring communication with project for 27 minutes and 2 seconds
2/22/2006 4:04:48 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 4:04:48 AM|rosetta@home|Reason: To report results
2/22/2006 4:04:48 AM|rosetta@home|Reporting 5 results
2/22/2006 4:05:10 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 4:05:13 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 4:05:13 AM|rosetta@home|No schedulers responded
2/22/2006 4:05:18 AM|rosetta@home|Deferring communication with project for 2 hours, 14 minutes, and 36 seconds
2/22/2006 5:05:21 AM|rosetta@home|Deferring communication with project for 1 hours, 14 minutes, and 34 seconds
2/22/2006 6:05:24 AM|rosetta@home|Deferring communication with project for 14 minutes and 31 seconds
2/22/2006 6:20:00 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 6:20:00 AM|rosetta@home|Reason: To report results
2/22/2006 6:20:00 AM|rosetta@home|Reporting 5 results
2/22/2006 6:20:05 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
2/22/2006 7:22:44 AM|rosetta@home|Unrecoverable error for result FAST_ABINITIO_CENTROID_PACKING_2ci2I_304_817_1 ( - exit code -1073741811 (0xc000000d))
2/22/2006 7:22:44 AM||request_reschedule_cpus: process exited
2/22/2006 7:22:44 AM|rosetta@home|Computation for result FAST_ABINITIO_CENTROID_PACKING_2ci2I_304_817_1 finished
2/22/2006 7:22:44 AM|rosetta@home|Starting result FAST_ABINITIO_CENTROID_PACKING_1kpeA_305_837_1 using rosetta version 482
2/22/2006 7:22:46 AM|rosetta@home|Unrecoverable error for result FAST_ABINITIO_CENTROID_PACKING_1ptq__304_97_2 ( - exit code -1073741811 (0xc000000d))
2/22/2006 7:22:46 AM||request_reschedule_cpus: process exited
2/22/2006 7:22:46 AM|rosetta@home|Computation for result FAST_ABINITIO_CENTROID_PACKING_1ptq__304_97_2 finished
2/22/2006 7:22:46 AM|rosetta@home|Starting result PRODUCTION_ABINITIO_QUADRUPLELONGRANGEANTIPARALLEL_2chf__311_1155_2 using rosetta version 482
2/22/2006 7:22:48 AM|rosetta@home|Deferring communication with project for 57 seconds
2/22/2006 3:02:41 PM|rosetta@home|Unrecoverable error for result FAST_ABINITIO_CENTROID_PACKING_1kpeA_305_837_1 ( - exit code -1073741811 (0xc000000d)) |
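Each log line above follows a `timestamp|project|message` layout. For anyone who wants to check how long a client kept backing off, here is a small Python sketch that counts the -106 scheduler failures and converts each "Deferring communication" message to seconds. It assumes the exact BOINC 5.x wording shown in this log (other client versions may word things differently), and `deferral_seconds` and `summarize` are illustrative names, not BOINC functions.

```python
import re
from typing import List, Optional, Tuple

# Assumes the BOINC 5.x message wording seen in the log above.
DEFER_PREFIX = "Deferring communication with project for "
UNIT_SECONDS = {"hour": 3600, "minute": 60, "second": 1}
DURATION_PART = re.compile(r"(\d+)\s+(hour|minute|second)s?")


def deferral_seconds(line: str) -> Optional[int]:
    """Return the deferral length in seconds, or None if the line is not a deferral."""
    _, found, tail = line.partition(DEFER_PREFIX)
    if not found:
        return None
    return sum(int(n) * UNIT_SECONDS[unit] for n, unit in DURATION_PART.findall(tail))


def summarize(log: str) -> Tuple[int, List[int]]:
    """Count -106 scheduler failures and collect every deferral length in seconds."""
    failures = 0
    deferrals: List[int] = []
    for line in log.splitlines():
        if "failed with a return value of -106" in line:
            failures += 1
        seconds = deferral_seconds(line)
        if seconds is not None:
            deferrals.append(seconds)
    return failures, deferrals


SAMPLE = """\
2/22/2006 3:24:41 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:24:46 AM|rosetta@home|Deferring communication with project for 12 minutes and 25 seconds
2/22/2006 4:05:13 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 4:05:18 AM|rosetta@home|Deferring communication with project for 2 hours, 14 minutes, and 36 seconds
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Pasting the whole message log into `summarize` instead of the short sample would give the full count for the night.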
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
From the looks of your message log, it appears the servers were down for a while, though I'm not sure that really occurred. Here's what the Wiki says about error -106: ERR_IO (-106), system I/O: A failure to read/write from the disk drive or, in the case of network transmissions, a failure to send and receive data. A failure to send and receive data to and from a Project Server generally means that, during the transmission, the Project Server or a router along the way has reset the TCP connection between the BOINC Client Software and the Project Server. This can happen for a number of different reasons, such as the Project Server being overloaded or packets being lost in transit. Note: This is probably the most common error after an outage of the Project. |
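Retries after an outage are why a client spreads out its reconnection attempts: the deferrals in ecafkid's log grow, somewhat irregularly, from under a minute to more than two hours. A minimal sketch of the general technique, randomized exponential backoff, in Python. The base delay, the cap and the jitter range are assumptions chosen for illustration; this is not BOINC's implementation or its actual parameters.

```python
import random
from typing import List, Optional

# Illustrative parameters only; these are NOT BOINC's actual backoff settings.
BASE_SECONDS = 60.0
CAP_SECONDS = 4 * 3600.0


def backoff_delays(attempts: int, base: float = BASE_SECONDS, cap: float = CAP_SECONDS,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Randomized exponential backoff: the ceiling doubles per failed attempt, up to `cap`."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        # Pick a point between half the ceiling and the full ceiling so that many
        # clients retrying after the same outage do not all reconnect at once.
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(8, rng=random.Random(42))):
        print(f"retry {attempt + 1}: wait {delay / 60:.1f} min")
```

The random component matters as much as the doubling: without it, every client that failed at the same moment would retry at the same moment and overload the server again.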
SallyH Joined: 4 Nov 05 Posts: 6 Credit: 4,799,395 RAC: 0 |
I have been having more errors than successes since 4.82. Please look at my errors for machine #166709. Thanks... |
Morten Starkeby Joined: 18 Feb 06 Posts: 10 Credit: 472,142 RAC: 0 |
Received another client error:
Result ID: 11875565
<core_client_version>5.3.19</core_client_version>
<message>
 - exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# random seed: 1054947
# cpu_run_time_pref: 28800
</stderr_txt> |
Interboy Joined: 28 Sep 05 Posts: 3 Credit: 730,102 RAC: 0 |
Received this error:
Result ID: https://boinc.bakerlab.org/rosetta/result.php?resultid=11838015
<core_client_version>5.2.12</core_client_version>
<message>
 - exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800
# random seed: 1072606
# DONE :: 1 starting structures built 15 (nstruct) times
# This process generated 15 decoys from 15 attempts
***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x7C83AEC6 write attempt to address 0x00000004
</stderr_txt> |
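The stderr blocks quoted in this thread share a common layout, so the fields most useful in an error report can be pulled out mechanically. Below is a Python sketch using Interboy's output as sample input; `parse_stderr` is a hypothetical helper, and the regular expressions assume exactly the layout shown in these posts. Judging from SallyH's result further down (a 14400 preference and about 14284 s of CPU time), `cpu_run_time_pref` appears to be the target run time in seconds, so 28800 would be 8 hours; that reading is an inference, not something the output states.

```python
import re
from typing import Dict

STDERR = """\
<core_client_version>5.2.12</core_client_version>
<message>
 - exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800
# random seed: 1072606
# DONE :: 1 starting structures built 15 (nstruct) times
# This process generated 15 decoys from 15 attempts
***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x7C83AEC6 write attempt to address 0x00000004
</stderr_txt>
"""


def parse_stderr(text: str) -> Dict[str, object]:
    """Extract client version, exit code, seed, run-time preference and decoy counts."""
    info: Dict[str, object] = {}
    m = re.search(r"<core_client_version>([^<]+)</core_client_version>", text)
    if m:
        info["client_version"] = m.group(1)
    m = re.search(r"exit code (-?\d+) \((0x[0-9a-fA-F]+)\)", text)
    if m:
        info["exit_code"] = int(m.group(1))
        info["exit_hex"] = m.group(2).lower()
    for key in ("random seed", "cpu_run_time_pref"):
        m = re.search(rf"# {re.escape(key)}: (\d+)", text)
        if m:
            info[key.replace(" ", "_")] = int(m.group(1))
    m = re.search(r"generated (\d+) decoys from (\d+) attempts", text)
    if m:
        info["decoys"], info["attempts"] = int(m.group(1)), int(m.group(2))
    return info


if __name__ == "__main__":
    print(parse_stderr(STDERR))
```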
Hermes Joined: 17 Sep 05 Posts: 2 Credit: 113,946 RAC: 0 |
|
Beezlebub Joined: 18 Oct 05 Posts: 40 Credit: 260,375 RAC: 0 |
I have no WU errors since starting this computer, but the completion times are getting out of hand:
(columns: result ID, workunit ID, sent, time reported, server state, outcome, client state, CPU time (s), claimed credit, granted credit)
11869025 9562409 21 Feb 2006 7:24:03 UTC 22 Feb 2006 14:43:46 UTC Over Success Done 86,361.25 200.03 200.03
11839823 9534950 21 Feb 2006 10:03:07 UTC 23 Feb 2006 12:38:51 UTC Over Success Done 86,134.84 199.50 199.50
11789045 9520604 19 Feb 2006 15:42:12 UTC 20 Feb 2006 2:48:30 UTC Over Success Done 28,705.73 57.98 57.98
11738428 9498702 19 Feb 2006 7:07:55 UTC 19 Feb 2006 15:42:12 UTC Over Success Done 27,955.66 57.77 57.77
11738306 9498580 19 Feb 2006 7:07:55 UTC 19 Feb 2006 18:06:22 UTC Over Success Done 29,220.14 60.38 60.38
11716364 9491951 19 Feb 2006 1:55:07 UTC 19 Feb 2006 7:07:55 UTC Over Success Done 11,418.19 23.60 23.60
11716363 9491950 19 Feb 2006 1:55:07 UTC 19 Feb 2006 7:07:55 UTC Over Success Done 10,428.11 21.55 21.55
11708415 9484498 18 Feb 2006 23:32:31 UTC 19 Feb 2006 4:19:22 UTC Over Success Done 8,232.70 17.01 17.01
When I started:
11317355 9181424 14 Feb 2006 16:17:00 UTC 15 Feb 2006 0:53:42 UTC Over Success Done 1,835.91 3.79 3.79
11317354 9181423 14 Feb 2006 16:17:00 UTC 15 Feb 2006 0:53:42 UTC Over Success Done 4,243.00 8.77 8.77
11270626 9137943 14 Feb 2006 11:27:14 UTC 14 Feb 2006 15:03:21 UTC Over Success Done 4,213.20 8.93 8.93
11270602 9137919 14 Feb 2006 11:27:14 UTC 14 Feb 2006 15:03:21 UTC Over Success Done 3,303.31 7.00 7.00
11257355 9125508 14 Feb 2006 10:17:26 UTC 14 Feb 2006 15:03:21 UTC Over Success Done 4,391.36 9.30 9.30
11251768 9120391 14 Feb 2006 9:31:17 UTC 14 Feb 2006 11:27:14 UTC Over Success Done 2,127.61 4.51 4.51
11195037 9075044 14 Feb 2006 3:46:26 UTC 14 Feb 2006 6:34:39 UTC Over Success Done 1,288.11 2.73 2.73
11195025 5605330 14 Feb 2006 3:46:26 UTC 14 Feb 2006 15:03:21 UTC Over Success Done 8,683.08 18.40 18.40
11194984 9074992 14 Feb 2006 3:46:26 UTC 14 Feb 2006 6:34:39 UTC Over Success Done 4,997.23 10.59 10.59
e6600 quad @ 2.5ghz 2418 floating point 5227 integer e6750 dual @ 3.71ghz 3598 floating point 7918 integer |
Hoelder1in Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0 |
Quote: "I have no WU errors since starting this computer, but the completion times are getting out of hand." Your WU completion times are very close to one day (86400 sec ;-) because you set your 'target CPU run time' in the Rosetta@home preferences to 1 day. Set it to, e.g., two hours if you prefer 'short' WUs. |
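Hoelder1in's arithmetic can be checked directly: divide each result's CPU time by the target expressed in seconds (24 h × 3600 s/h = 86400 s). A tiny Python sketch using the CPU times of the two longest results in Beezlebub's list; `fraction_of_target` is an illustrative name, not a Rosetta or BOINC function.

```python
def fraction_of_target(cpu_seconds: float, target_hours: float) -> float:
    """Share of the target CPU run time that a result actually used."""
    return cpu_seconds / (target_hours * 3600.0)


if __name__ == "__main__":
    # CPU times (seconds) of Beezlebub's two longest results, against a 1-day target.
    for cpu in (86361.25, 86134.84):
        hours = cpu / 3600.0
        print(f"{cpu:,.2f} s = {hours:.2f} h = {fraction_of_target(cpu, 24):.1%} of a 24 h target")
```

Both results land just under 24 hours, which is what a 1-day target run time would produce.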
Beezlebub Joined: 18 Oct 05 Posts: 40 Credit: 260,375 RAC: 0 |
Dummy me :) I wasn't paying attention to the new work times... forgot all about it. Sorry! e6600 quad @ 2.5ghz 2418 floating point 5227 integer e6750 dual @ 3.71ghz 3598 floating point 7918 integer |
doc :) Joined: 4 Oct 05 Posts: 47 Credit: 1,106,102 RAC: 0 |
|
SallyH Joined: 4 Nov 05 Posts: 6 Credit: 4,799,395 RAC: 0 |
Here are the errors that I am getting on two AMD Opteron systems. I have an X2 4600 that is not getting these errors. I have rebuilt these systems twice in the past few days and am still getting a failure rate above 90 percent. My RAC has dropped from 7700 to below 6600 since 4.82.
Result ID: 11966026
Name: PRODUCTION_ABINITIO_RANDOMFRAG_1pgx__309_163_2
Workunit: 9313855
Created: 23 Feb 2006 12:50:18 UTC
Sent: 23 Feb 2006 22:17:57 UTC
Received: 24 Feb 2006 9:48:18 UTC
Server state: Over
Outcome: Client error
Client state: Done
Exit status: -1073741811 (0xc000000d)
Computer ID: 168708
Report deadline: 2 Mar 2006 22:17:57 UTC
CPU time: 14284.203125
stderr out: <core_client_version>5.2.13</core_client_version>
<message>
 - exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# random seed: 1522481
# cpu_run_time_pref: 14400
# DONE :: 1 starting structures built 99 (nstruct) times
# This process generated 99 decoys from 103 attempts
</stderr_txt>
Validate state: Invalid
Claimed credit: 115.888749997702
Granted credit: 0
Application version: 4.82 |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
For people having many Work Unit errors! I have received an e-mail from Dr. Baker with information for any of you who are having a lot of Work Unit errors: "Could you help us to recommend to people having problems with lots of WU to set the target run time to a smaller value like 2 hours. We think there aren't any new bugs, just with longer run times it is more likely for a WU to have problems." So if you are having a lot of errors, please set your target CPU run time to 2 hours and see if that helps. Moderator9 ROSETTA@home FAQ Moderator Contact |
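Dr. Baker's reasoning can be illustrated with a simple model. Suppose, purely as an assumption, that each hour of crunching carries the same small, independent chance p that something goes wrong; then a WU that runs for T hours hits at least one problem with probability 1 - (1 - p)^T, which grows with T. A Python sketch follows; the per-hour figure is invented for illustration and is not measured from Rosetta. As SallyH's result shows (115.89 credit claimed, 0 granted), an errored WU earns nothing, so a shorter target also limits how much work a single failure throws away.

```python
def failure_probability(hours: float, p_per_hour: float) -> float:
    """Chance of at least one failure in `hours`, assuming an independent, constant per-hour risk."""
    if not 0.0 <= p_per_hour <= 1.0:
        raise ValueError("p_per_hour must be a probability")
    return 1.0 - (1.0 - p_per_hour) ** hours


if __name__ == "__main__":
    p = 0.01  # hypothetical per-hour risk, picked only for illustration
    for hours in (2, 8, 24):
        print(f"{hours:>2} h target: {failure_probability(hours, p):.1%} chance of an error")
```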
©2024 University of Washington
https://www.bakerlab.org