Message boards : Number crunching : Miscellaneous Work Unit Errors
Author | Message |
---|---|
Laurenu2 Joined: 6 Nov 05 Posts: 57 Credit: 3,818,778 RAC: 0 |
David, what is up with all the BAD W/Us? I must have close to 500 "4/8/2006 3:22:39 PM|rosetta@home|Unrecoverable error for result HBLR_1.0_1hz6_426_5085_0 ( - exit code -1073741819 (0xc0000005))" messages on all of my nodes running Rosetta. You asked for my/our help here. Well, you must realize that if you continue to give out W/Us that stall our PCs, or that cannot complete, the DC'ers here WILL lose faith in this project and in the quality of the data we produce. As I see it, you must do more in-house testing before a new version or WU batch. And as stated below, you should NOT do any releases at a time when you cannot have a full staff available to make fixes. This project costs me/us a lot of money to run, not counting my time, and I do expect a lot more than 1 week of good W/Us to keep running. Please remove the bad W/Us or the version that is causing this, so we/I can stop spinning our cooling fans. If You Want The Best You Must forget The Rest ---------------And Join Free-DC---------------- |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
I just got this message from David Kim, who is currently addressing this problem: "I just reverted back to the previous app. You should notice a version 4.98 now, which is really version 4.83 for Windows and Mac, and 4.82 for Linux." You all should see some relief very soon. If you force an update, it should load the new version once the server is set up. Moderator9 ROSETTA@home FAQ Moderator Contact |
rbpeake Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0 |
"David, what is up with all the BAD W/U? I must have close to 500" This is what hurts the project when no one is around to respond, like on a weekend. As I mentioned below, new releases should be made when the project team is immediately available and on high alert to respond quickly to any problems (they probably should be on extra high alert for the first day or so of any new release, even one that has been beta tested). With so many people making so many contributions in energy, time, and effort, a project needs to be extra prudent with any changes. Just my 2 cents! :) Regards, Bob P. |
Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
OK, the "new" v4.98 has downloaded for Win and is running already. Meanwhile, for the last 4 hours I had set WU runtime = 1hr, and all WUs completed OK without errors on Win too. Btw, I just re-connected to RALPH (Rosetta's ALPHA test project), which I had recently set to "No new work", as my PCs had never had any problems in the past 2 months anyway LOL Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
Dorphas Joined: 14 Feb 06 Posts: 2 Credit: 60,275 RAC: 0 |
Since last night, ver 4.97 has been causing TONS of errors on all my computers. This is why people avoid this project. I have had about enough of these bugs myself. I am about to cross the line and crunch for something else more stable. |
Nuadormrac Joined: 27 Sep 05 Posts: 37 Credit: 202,469 RAC: 0 |
"I am new here and using version 4.97. I too have almost all my WUs failing with similar codes. ***unrecoverable error for result HBLR_1.0_2reb_426_1061_0 (-exit code -1073741819 (0xc0000005))***" Yeah, sorry we didn't catch something sooner, but on the WU types we were testing earlier, everything was going up and validating successfully. That was until we got the HBLR units in the morning, and only then did failures start coming out... A couple of HBLR units did validate on my machine here (over on RALPH), but the vast majority were a no-go, with many of them getting 3 failures, and some 2 failures with 1 success. Others didn't have a report yet, so I'm not sure what became of them... Until we got the newer WU types, we obviously weren't able to report on any problems with them, and could only report on the older types... Sorry we testers weren't able to catch this problem before it started rolling out... |
Klaws Joined: 23 Nov 05 Posts: 1 Credit: 0 RAC: 0 |
Yesterday, my machine (W2K SP4) displayed a message box telling me that there was no CD in the CD drive (drive F:). The message box was caused by Boinc (which runs nothing but Rosetta). I tried the "Abort" button; the message box popped up again, and I hit it again. The Boinc manager showed me the following line:
08.04.2006 04:23:46|rosetta@home|Unrecoverable error for result HBLR_1.0_1hz6_425_4925_0 ( - exit code -1073741819 (0xc0000005))
The next time the message box appeared, I tried "Continue". Same effect. I then placed a CD in the drive. No more message boxes, but the Boinc manager insists on:
08.04.2006 21:39:33|rosetta@home|Unrecoverable error for result FARELAX_NOFILTERS_1ptq__427_57_0 ( - exit code -1073741819 (0xc0000005))
It seems every work unit fails. HOWEVER, the question remains: why does Boinc/Rosetta attempt to access my CD drive? Boinc is installed on drive E: (C: is the "Windows drive", D: is the swap drive, E: is my "app drive", F: is the CD burner (no media inside when the first errors occurred), G: is the DVD burner (no media inside at any "error time"), H: is my "scratch drive", I: is a (historic) FAT drive for MS-DOS dual boot...yup, J: is a FAT USB flash disk (not present when the errors occurred), K: is a removable 120GB HD, L: is a removable FAT HD (for backup purposes), M: is a remote 120GB "Firewire" HD (powered down and not accessible at this time), P: is a removable 200GB HD, U: is a virtual DVD-ROM, drives V:-Z: are network drives, some available, some not...and the missing drive letters correspond to currently removed HDs). The reason for my elaborate listing of all drive letters is that Boinc/Rosetta appears to be interested only in drive F: (the CD drive, now with a CD inside), and doesn't proceed to drive G:, even though drive F: now works. That somewhat relieves my suspicion that the software is trying to spy on my machine! Still, I consider the behavior suspicious.
At least it produces errors, so something should get fixed ASAP, IMHO! Best regards, KLaus - Klaws |
KWSN Sir Clark Joined: 18 Sep 05 Posts: 46 Credit: 387,432 RAC: 0 |
WU 13577480 produced:
<core_client_version>5.2.13</core_client_version>
<message>
 - exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# random seed: 1359430
# cpu_run_time_pref: 7200
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 7200
***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x007022EA read attempt to address 0x0EB4FC6C
Dump of the Worker(offending) thread:
1: 04/09/06 15:06:25
Dump of the Timer thread:
2: 04/09/06 15:06:25
Dump of the Graphics thread:
3: 04/09/06 15:06:25
Exiting...
</stderr_txt>
Only BOINC running, apps set to be pre-empted. Errored out three times now, so the WU has been cancelled. |
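The exit code BOINC reports in these messages, -1073741819, is simply the signed 32-bit reading of the Windows status value 0xC0000005, which the stderr output names as an Access Violation (STATUS_ACCESS_VIOLATION in Windows terms). A minimal Python sketch of the conversion, assuming the code is a 32-bit two's-complement value; the helper names are illustrative only:

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as its unsigned bit pattern."""
    return code & 0xFFFFFFFF


def to_signed32(status: int) -> int:
    """Reinterpret an unsigned 32-bit status value as a signed exit code."""
    status &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return status - (1 << 32) if status & 0x80000000 else status


# Exit code as printed in the BOINC messages in this thread.
exit_code = -1073741819
print(hex(to_unsigned32(exit_code)))  # 0xc0000005
print(to_signed32(0xC0000005))        # -1073741819
```

So every "exit code -1073741819" line in this thread is the same access-violation crash, just printed as a signed number.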
Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
"since last night ver .97 is causing TONS of errors on all my computers. this is why people avoid this project. i have had about enough of these bugs myself. about to cross the line and crunch for something else more stable." Before rushing to condemn, let's summarise yesterday's event: it was a problem with v4.97 under Windows (as far as I could tell from my machines, Linux v4.97 had no problems) which lasted about 16 hours, until D. Kim rolled back the executable to the previous stable version. IMHO it was much less of a *real* issue than the "stuck at 1%" issue (which, btw, I encountered just ONCE in 3 months of crunching on 3x P4 PCs, though I realise it occurs much more often for other people), because yesterday WUs simply errored out for several hours, but NO manual intervention whatsoever was/is needed by the operator. End result: probably about half a day of crunching lost for Win PCs. So, my Win PCs spent about half a day crunching for nothing. But I look at it this way: currently all other BOINC projects use a "quorum" of 3 or 4 and an initial replication of 3-5, i.e. they send the very same WU to 3-5 (!!!) PCs, effectively using just 1/3rd to 1/5th (and sometimes even less) of the raw CPU time donated by us, the BOINC donors. PS: Of course I'm unhappy that the project did an upgrade this way, without someone monitoring it closely for the next 6-12 hours. But it's Murphy's law, and I try to keep things in perspective. Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
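Dimitris's comparison can be made concrete with a little arithmetic. With an initial replication of r, only 1/r of the donated CPU time yields unique results, whereas a one-off outage costs only the hours actually lost. The Python sketch below is illustrative, not project data: it assumes a hypothetical 30-day window, treats the outage as half a day of fully wasted crunching, and assumes Rosetta sends each WU to a single host, as the post implies.

```python
from fractions import Fraction


def useful_fraction(replication: int, days: Fraction, days_lost: Fraction) -> Fraction:
    """Share of donated CPU time that produces unique, usable results.

    Assumes every replica costs the same CPU time and only one replica's
    result adds new science, so the useful share is
    (days - days_lost) / days / replication.
    """
    return (days - days_lost) / days / replication


window = Fraction(30)    # hypothetical observation window in days
outage = Fraction(1, 2)  # half a day lost to the v4.97 errors

single_copy = useful_fraction(1, window, outage)
for r in (3, 5):
    print(f"replication {r}: {float(useful_fraction(r, window, Fraction(0))):.3f}")
print(f"replication 1 with outage: {float(single_copy):.3f}")
```

Under these assumptions the single-copy project still delivers 59/60 of its CPU time despite the outage, compared with 1/3 or 1/5 for replicated projects that never fail.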
Carlos_Pfitzner Joined: 22 Dec 05 Posts: 71 Credit: 138,867 RAC: 0 |
"Until we got the newer WU types, wasn't able to report on any problems with them obviously, and could only report on the older types... Sorry us testers weren't able to catch this problem before it started rolling out..." I disagree with the quote above. I started the thread at ralph@home announcing the new version 4.97 for testing on 7 April at 09:37 UTC. By 12:52 UTC on 7 April, I had already reported this error on Windows 4.97. At 00:24 UTC on 8 April, Son Goku posted that 4.97 was working fine. After that, on 8 April, 4.97 went to rosetta@home... I wonder why 4.95, which was working very well and fixed several problems, is not what was placed into rosetta@home instead -:( http://www.fadbeens.co.uk/phpBB2/viewtopic.php?t=53&start=165 Now that it has been rolled back to 4.83, I know why I crunched all day on two of my PCs without completing a single WU of 4.98 -> see my signature... my RAC is falling on rosetta -:( So, I will STOP crunching for rosetta again until a new version comes in that checkpoints often enough to allow switching apps with the app removed from RAM. 4.95 was that version!!! 4.96 too. Click signature for global team stats |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
"Until we got the newer WU types, wasn't able to report on any problems with them obviously, and could only report on the older types... Sorry us testers weren't able to catch this problem before it started rolling out..." The information that I have is that Rosetta version 4.97 WAS in fact RALPH version 4.95. The version number was all that changed when it was implemented for Rosetta. What was not known at the time was how the newer WUs would react in the production environment. What is interesting here is that the RALPH testers are usually running BOINC version 5.2.32 and most Rosetta users are running BOINC 5.2.13. This may be part of the issue. In any case, you are wrong about what was implemented in Rosetta. While the version number for the Rosetta application is different, it is the same application that was working well in RALPH. RALPH version 4.97 is not what was deployed in Rosetta. The workunit testing in RALPH did not show any problems with the newer workunits; however, RALPH is a VERY limited subset of the types of systems and configurations running in Rosetta. Because of this, it is not possible to test every possible issue before new work unit types are deployed. As has been pointed out on many occasions, a number of your systems are below the minimum memory requirements for the project. The ones that are not are reporting a significant portion of their memory as not available for Rosetta to use. This single fact has been, and will continue to be, the largest problem you face in running Rosetta or RALPH. There are almost no problems reported for systems running with more memory, unless a batch of bad work units comes along, and that will happen from time to time. Moderator9 ROSETTA@home FAQ Moderator Contact |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
5.2.32? I can't find that build anywhere. I know I never tested it. 5.3.31 is the latest alpha build. Here's a list of all V5 builds from Boinc to date, with release dates/times (note: I trimmed mac, apple, and pdbs from the list to shorten it):
boinc_5.1.1_i686-pc-linux-gnu.sh 30-Aug-2005 21:37 2.8M
boinc_5.1.1_windows_intelx86.exe 30-Aug-2005 21:37 9.9M
boinc_5.1.2_i686-pc-linux-gnu.sh 07-Sep-2005 11:35 2.8M
boinc_5.1.2_windows_intelx86.exe 07-Sep-2005 11:26 10M
boinc_5.1.3_i686-pc-linux-gnu.sh 09-Sep-2005 11:16 2.8M
boinc_5.1.3_windows_intelx86.exe 09-Sep-2005 11:17 10M
boinc_5.1.4_i686-pc-linux-gnu.sh 20-Sep-2005 14:48 2.8M
boinc_5.1.4_windows_intelx86.exe 20-Sep-2005 14:48 10M
boinc_5.1.5_i686-pc-linux-gnu.sh 29-Sep-2005 00:46 2.8M
boinc_5.1.5_windows_intelx86.exe 29-Sep-2005 00:46 10M
boinc_5.1.6_i686-pc-linux-gnu.sh 01-Oct-2005 02:53 2.8M
boinc_5.1.6_windows_intelx86.exe 01-Oct-2005 02:54 10M
boinc_5.1.8_i686-pc-linux-gnu.sh 05-Oct-2005 19:28 2.8M
boinc_5.1.8_windows_intelx86.exe 05-Oct-2005 19:28 10M
boinc_5.1.9_i686-pc-linux-gnu.sh 09-Oct-2005 18:42 2.9M
boinc_5.1.9_windows_intelx86.exe 09-Oct-2005 18:42 10M
boinc_5.1.10_i686-pc-linux-gnu.sh 10-Oct-2005 12:27 2.9M
boinc_5.1.10_windows_intelx86.exe 10-Oct-2005 12:27 10M
boinc_5.2.0_i686-pc-linux-gnu.sh 10-Oct-2005 16:48 2.9M
boinc_5.2.0_windows_intelx86.exe 10-Oct-2005 16:48 10M
boinc_5.2.1_i686-pc-linux-gnu.sh 10-Oct-2005 19:55 2.9M
boinc_5.2.1_windows_intelx86.exe 10-Oct-2005 19:56 10M
boinc_5.2.2_i686-pc-linux-gnu.sh 17-Oct-2005 14:33 2.9M
boinc_5.2.2_windows_intelx86.exe 17-Oct-2005 14:34 10M
boinc_5.2.3_i686-pc-linux-gnu.sh 19-Oct-2005 19:59 3.4M
boinc_5.2.4_i686-pc-linux-gnu.sh 19-Oct-2005 23:22 3.4M
boinc_5.2.5_i686-pc-linux-gnu.sh 27-Oct-2005 14:01 3.4M
boinc_5.2.5_windows_intelx86.exe 27-Oct-2005 14:02 10M
boinc_5.2.6_i686-pc-linux-gnu.sh 31-Oct-2005 17:05 3.4M
boinc_5.2.6_windows_intelx86.exe 31-Oct-2005 17:06 10M
boinc_5.2.7_i686-pc-linux-gnu.sh 08-Nov-2005 00:52 3.4M
boinc_5.2.7_windows_intelx86.exe 08-Nov-2005 00:53 10M
boinc_5.2.8_i686-pc-linux-gnu.sh 22-Nov-2005 17:47 3.4M
boinc_5.2.8_windows_intelx86.exe 22-Nov-2005 17:48 10M
boinc_5.2.9_windows_intelx86.exe 25-Nov-2005 20:06 10M
boinc_5.2.10_windows_intelx86.exe 26-Nov-2005 01:08 10M
boinc_5.2.11_windows_intelx86.exe 26-Nov-2005 03:48 10M
boinc_5.2.12_windows_intelx86.exe 26-Nov-2005 18:22 10M
boinc_5.2.13_i686-pc-linux-gnu.sh 29-Nov-2005 02:46 3.5M
boinc_5.2.13_windows_intelx86.exe 29-Nov-2005 02:47 10M
boinc_5.2.14_windows_intelx86.exe 04-Dec-2005 03:51 10M
boinc_5.2.15_i686-pc-linux-gnu.sh 28-Dec-2005 06:14 3.5M
boinc_5.2.15_windows_intelx86.exe 28-Dec-2005 06:14 10M
boinc_5.3.2_windows_intelx86.exe 06-Dec-2005 03:32 10M
boinc_5.3.3_windows_intelx86.exe 19-Dec-2005 05:59 10M
boinc_5.3.6_windows_intelx86.exe 28-Dec-2005 05:50 10M
boinc_5.3.15_i686-pc-linux-gnu.sh 30-Jan-2006 15:27 3.5M
boinc_5.3.15_windows_intelx86.exe 27-Jan-2006 13:22 10M
boinc_5.3.16_i686-pc-linux-gnu.sh 30-Jan-2006 19:19 3.5M
boinc_5.3.16_windows_intelx86.exe 30-Jan-2006 19:09 10M
boinc_5.3.17_windows_intelx86.exe 02-Feb-2006 13:19 10M
boinc_5.3.20_i686-pc-linux-gnu.sh 23-Feb-2006 00:57 3.5M
boinc_5.3.20_windows_intelx86.exe 23-Feb-2006 00:49 10M
boinc_5.3.21_i686-pc-linux-gnu.sh 24-Feb-2006 00:21 3.5M
boinc_5.3.21_windows_intelx86.exe 24-Feb-2006 00:24 10M
boinc_5.3.22_i686-pc-linux-gnu.sh 24-Feb-2006 17:35 3.5M
boinc_5.3.22_windows_intelx86.exe 24-Feb-2006 18:00 10M
boinc_5.3.23_i686-pc-linux-gnu.sh 01-Mar-2006 03:08 3.5M
boinc_5.3.23_windows_intelx86.exe 01-Mar-2006 03:05 10M
boinc_5.3.24_i686-pc-linux-gnu.sh 06-Mar-2006 13:14 3.5M
boinc_5.3.24_windows_intelx86.exe 06-Mar-2006 12:39 10M
boinc_5.3.26_i686-pc-linux-gnu.sh 14-Mar-2006 01:21 3.5M
boinc_5.3.26_windows_intelx86.exe 14-Mar-2006 01:11 10M
boinc_5.3.27_i686-pc-linux-gnu.sh 17-Mar-2006 02:18 3.6M
boinc_5.3.27_windows_intelx86.exe 17-Mar-2006 02:14 10M
boinc_5.3.28_i686-pc-linux-gnu.sh 21-Mar-2006 15:20 3.6M
boinc_5.3.28_windows_intelx86.exe 21-Mar-2006 15:03 10M
boinc_5.3.29_i686-pc-linux-gnu.sh 28-Mar-2006 00:09 3.6M
boinc_5.3.29_windows_intelx86.exe 28-Mar-2006 00:32 10M
boinc_5.3.30_i686-pc-linux-gnu.sh 28-Mar-2006 23:26 3.6M
boinc_5.3.30_windows_intelx86.exe 29-Mar-2006 00:02 10M
boinc_5.3.31_i686-pc-linux-gnu.sh 30-Mar-2006 18:37 3.6M
boinc_5.3.31_windows_intelx86.exe 30-Mar-2006 19:54 8.6M
Perhaps you have some new source I'm not aware of, and I'd like to try it. I've seen you mention it a couple of times. |
Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
It seems like BOINC v5.3.31 is the latest. To see the BETA versions, one can use the URL http://boinc.berkeley.edu/download.php?dev=1, whereas the "official" stable versions are at http://boinc.berkeley.edu/download.php Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
"It seems like BOINC v5.3.31 is latest, to see the BETA versions one can use the URL" I guess I need to turn on the lights when I type. I meant to say BOINC 5.2.28. Rom had asked all the RALPH testers to upgrade to this version for improved error reporting. I think almost all of them performed the upgrade. In any case, it would be interesting to see if this had any impact on what has happened over the last day or so. Moderator9 ROSETTA@home FAQ Moderator Contact |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
I hate to post this, but do you mean 5.3.28? The highest recommended release (recommended releases have an even middle number, i.e. 2) is 5.2.15. |
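Astro's rule of thumb (an even middle number marks a recommended BOINC release, an odd one a development build) can be applied mechanically to the build list posted earlier. A small Python sketch using a handful of version strings from that list; the helper names are hypothetical, and the even/odd rule is taken from the post rather than from official BOINC documentation. Versions are compared as integer tuples rather than as strings, so that 5.2.15 correctly ranks above 5.2.9.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version like '5.2.15' into integers for comparison."""
    return tuple(int(part) for part in version.split("."))


def is_recommended(version: str) -> bool:
    """Treat an even middle (minor) number as a recommended release."""
    return parse_version(version)[1] % 2 == 0


# A few builds taken from the listing posted earlier in this thread.
builds = ["5.1.10", "5.2.9", "5.2.13", "5.2.15", "5.3.28", "5.3.31"]

newest = max(builds, key=parse_version)
newest_recommended = max(filter(is_recommended, builds), key=parse_version)
print(newest)              # 5.3.31
print(newest_recommended)  # 5.2.15
```

Comparing the raw strings instead would pick 5.2.9 as the newest recommended build, because "9" sorts after "1" character by character.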
Carlos_Pfitzner Joined: 22 Dec 05 Posts: 71 Credit: 138,867 RAC: 0 |
"Until we got the newer WU types, wasn't able to report on any problems with them obviously, and could only report on the older types... Sorry us testers weren't able to catch this problem before it started rolling out..." Read here, and scroll down to the end: http://ralph.bakerlab.org/forum_thread.php?id=155 Thanks, Click signature for global team stats |
Fuzzy Hollynoodles Joined: 7 Oct 05 Posts: 234 Credit: 15,020 RAC: 0 |
I guess you mean the 5.3.28 version? :-) Maybe some more light is needed? ;-) That was the one Rom asked us to upgrade to. It is pretty stable and runs fine on my computer. "I'm trying to maintain a shred of dignity in this world." - Me |
STE\/E Joined: 17 Sep 05 Posts: 125 Credit: 4,100,301 RAC: 84 |
"I guess you mean the 5.3.28 version? :-) Maybe some more light is needed? ;-)" v5.3.28 blows as far as I'm concerned. The BOINC Manager will use between 2-3% of the CPU even when it isn't open. The only way to get it to stop using the 2-3% is to close the Manager completely. It will also use 5-50% of the CPU if you open the Manager and the Work (now called Task) window, and it acts really jerky at times when adjusting the windows... I never saw any of this with v5.2.15, the previous version I was using... |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
OK, OK! I get so used to typing the same version number over and over that it is only natural I will mess it up from time to time. ;>) But I get it now: you just want to pick on the sleepy moderator sitting in the dark room. Yes, what I meant to say was BOINC version 5.3.28. So Tony, you and "Fuzzy" leave me alone to sulk. Moderator9 ROSETTA@home FAQ Moderator Contact |
Robinski Joined: 7 Mar 06 Posts: 51 Credit: 85,383 RAC: 0 |
"It seems like BOINC v5.3.31 is latest, to see the BETA versions one can use the URL" I did run some RALPH WUs on a 5.2.13 BOINC client and they finished fine. It was, however, on a machine that I didn't have running this weekend, when the 4.97 problems hit. Member of the Dutch Power Cows Trying to get the world on IPv6, do you have it? check here: IPv6.RHarmsen.nl |
©2024 University of Washington
https://www.bakerlab.org