Message boards : Number crunching : Multiple Computation Errors
Author | Message |
---|---|
Mark (Joined: 1 Dec 12; Posts: 10; Credit: 20,184; RAC: 0) | I noticed that my whole pile of work errored out. Apparently, once it started, each task I had ended in error until all of the units were gone. I found this in the message log: `1/6/2013 8:23:09 PM \| rosetta@home \| [error] Signature verification failed for minirosetta_graphics_3.43_windows_x86_64.exe`. I had performed a backup with Acronis True Image Home to an external drive that may have been running at that time. Could the backup have caused the errors? I've never had this problem with SETI@home. Thanks, Mark |
Polian (Joined: 21 Sep 05; Posts: 152; Credit: 10,141,266; RAC: 0) | *Mark wrote: I noticed that my whole pile of work errored out. Apparently, once it started, each task I had ended in error until all of the units were gone.* Looks to me like the screensaver/graphics file was corrupted. Doing a project reset should download a fresh copy. |
Mod.Sense (Volunteer moderator; Joined: 22 Aug 06; Posts: 4018; Credit: 0; RAC: 0) | Right, a project reset will fetch a fresh copy of what appears to be a corrupted file. However, the root cause of the corruption might be a firewall or antivirus application. You may need to whitelist BOINC and the Rosetta application. Rosetta Moderator: Mod.Sense |
Mark (Joined: 1 Dec 12; Posts: 10; Credit: 20,184; RAC: 0) | I reset the project. Nothing has downloaded yet, since no work is coming in: for some reason my client mostly asks for ATI work and only rarely for CPU work. We'll see eventually whether the reset worked. Thanks for the quick response! |
Mark (Joined: 1 Dec 12; Posts: 10; Credit: 20,184; RAC: 0) | I've received and successfully crunched several units. Looks like the reset worked. Thanks. |
BONNSaR (Joined: 3 Nov 05; Posts: 3; Credit: 8,983,633; RAC: 0) | I have a different cause for all my tasks ending in Computation Error. I'm running a Windows 7 64-bit PC with an overclocked i5-3570K. If I overclock to 4.5 GHz or above, every task ends in Computation Error, even though at that overclock the PC is otherwise stable and runs other applications without apparent errors. If I throttle back to 4.4 GHz, the Rosetta tasks run normally without errors. Any ideas about the Computation Errors at a 4.5 GHz overclock? |
Stephen Miller (Joined: 18 Sep 05; Posts: 13; Credit: 16,294,215; RAC: 0) | *BONNSaR wrote: I have a different cause for all my tasks ending in Computation Error. I'm running a Windows 7 64-bit PC with an overclocked i5-3570K. If I overclock to 4.5 GHz or above, every task ends in Computation Error, even though at that overclock the PC is otherwise stable and runs other applications without apparent errors.* Try running the Prime95 torture test, available from http://www.mersenne.org/freesoft/. I remember reading on their website that an overclocked computer can appear stable while still producing garbage results for scientific work; hence the torture test. If you can't get Prime95 to run flawlessly for hours or days, your system is unstable. Once it passes the torture test for hours or days, you have a stable system. In my testing, running successfully for 4+ hours is a leading indicator of stability. Longer is better, especially if your ambient temperature varies over time: for example, it passes while the room is cool and fails when the room is hot. |
BONNSaR (Joined: 3 Nov 05; Posts: 3; Credit: 8,983,633; RAC: 0) | *BONNSaR wrote: I have a different cause for all my tasks ending in Computation Error. I'm running a Windows 7 64-bit PC with an overclocked i5-3570K. If I overclock to 4.5 GHz or above, every task ends in Computation Error, even though at that overclock the PC is otherwise stable and runs other applications without apparent errors.* Hi Stephen, thanks for the advice. As expected, Prime95 did fail at 4.5 GHz. With a few adjustments I've managed to get the PC rock-stable in Prime95 at 4.5 GHz, but Rosetta still reports Computation Error on every task at that speed. I've gone back to 4.4 GHz and Rosetta runs OK. Not to worry: 4.4 GHz is a significant RAC gain over stock (non-overclocked) speed, so I'm happy with that. |
dcdc (Joined: 3 Nov 05; Posts: 1831; Credit: 119,627,225; RAC: 9,274) | Which Prime95 tests were you running? If you were running the "small" or "large" tests, it might be that the memory (including the L3 cache, I think) is struggling at 4.5 GHz: Rosetta taxes it quite heavily, but Prime95's first two tests don't. Blend is probably more appropriate (although of course the system should be stable on all of them!). HTH, Danny |
mikey (Joined: 5 Jan 06; Posts: 1895; Credit: 9,169,305; RAC: 3,078) | *Mod.Sense wrote: Right, a project reset will fetch a fresh copy of what appears to be a corrupted file. However, the root cause of the corruption might be a firewall or antivirus application. You may need to whitelist BOINC and the Rosetta application.* I too am getting LOTS of computation errors on several machines that previously had no problems at all. I have moved a couple of machines elsewhere; the rest are only getting one or two errors, not whole batches of them. |
Abriata (Joined: 19 Jan 13; Posts: 2; Credit: 953,454; RAC: 0) | *Mod.Sense wrote: Right, a project reset will fetch a fresh copy of what appears to be a corrupted file. However, the root cause of the corruption might be a firewall or antivirus application. You may need to whitelist BOINC and the Rosetta application.* I've been getting tons of errors recently too, on all my computers! What's more, when I check the work unit, I see that the same task also ended with a computation error on the other computer where it was processed. What's going on? I'm losing CPU time AND credits. |
Kenneth DePrizio (Joined: 15 Jul 07; Posts: 15; Credit: 3,123,915; RAC: 0) | There appear to be a bunch of bad workunits in the queue. All these "cryo" units, for example, are erroring out. Just gotta wait for them to cycle through. |
288VKYUjwsXfAaTXn6SFJC4LVPRf (Joined: 16 Dec 05; Posts: 31; Credit: 153,110; RAC: 0) | Same computation errors here. My computer almost freezes because of these WUs. Something is seriously wrong there. |
mikey (Joined: 5 Jan 06; Posts: 1895; Credit: 9,169,305; RAC: 3,078) | *Kenneth DePrizio wrote: There appear to be a bunch of bad workunits in the queue. All these "cryo" units, for example, are erroring out. Just gotta wait for them to cycle through.* Mine that start with 'E6' are the ones erroring out. Are those the 'cryo' ones? |
mikey (Joined: 5 Jan 06; Posts: 1895; Credit: 9,169,305; RAC: 3,078) | *Kenneth DePrizio wrote: There appear to be a bunch of bad workunits in the queue. All these "cryo" units, for example, are erroring out. Just gotta wait for them to cycle through.* Never mind, I found that the units starting with "cryo" are the ones having MAJOR problems for me too. I have gone through and aborted all of them! |
Col323 (Joined: 12 Apr 13; Posts: 2; Credit: 1,213,458; RAC: 0) | I just joined and found my machine producing a lot of errors. At first I was worried about the machine, then I found this thread. When I check the logs of the units that end in a computation error, they are all cryo units, and they always contain the lines `Unhandled Exception Detected... - Unhandled Exception Record - Reason: Out Of Memory`. I watched one run: while most Rosetta units use 300-500 MB, this one hit 1.7 GB before crashing. This is on a box with 7 GB for 4 cores. I'm now trying Rosetta on a box with 28 GB and 4 cores to see whether they complete. Of course, I probably won't get any cryo units sent to this machine. |
Col323 (Joined: 12 Apr 13; Posts: 2; Credit: 1,213,458; RAC: 0) | I did not get to watch, but the 28 GB machine choked on a cryo unit as well, with the same "out of memory" message. I guess I'll just let them keep bombing away and abort them when I see them. |
mikey (Joined: 5 Jan 06; Posts: 1895; Credit: 9,169,305; RAC: 3,078) | *Col323 wrote: I did not get to watch, but the 28 GB machine choked on a cryo unit as well, with the same "out of memory" message.* I am still aborting mine as I see them! |
mikey (Joined: 5 Jan 06; Posts: 1895; Credit: 9,169,305; RAC: 3,078) | *Col323 wrote: I did not get to watch, but the 28 GB machine choked on a cryo unit as well, with the same "out of memory" message.* Does anyone know how to STOP getting the cryo tasks? I abort five and they send me five right back! I do NOT want to stop running Rosetta, but I do NOT want to waste my time either! So far I have no problems running the other types, but EVERY cryo unit fails! |
Mod.Sense (Volunteer moderator; Joined: 22 Aug 06; Posts: 4018; Credit: 0; RAC: 0) | One idea would be to set a large work cache, pull down a pile of tasks, and then abort the ones you don't want. That way you'll at least have a longer stretch where you can run without feeling the need to check task names again. Rosetta Moderator: Mod.Sense |
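
The large-cache approach still leaves you picking out the unwanted tasks by hand. As a rough sketch only (not an official Rosetta@home or BOINC tool), the Python below drives BOINC's command-line client `boinccmd`: it lists tasks with `boinccmd --get_tasks` and builds one `boinccmd --task <project URL> <task name> abort` command per task whose name starts with "cryo". It assumes that each task in the `--get_tasks` output has a `name:` line and that Rosetta's project URL is `http://boinc.bakerlab.org/rosetta/`; check both against your own client's output. The function names are hypothetical, and by default the script only prints the commands.

```python
import subprocess

# Assumption: the Rosetta@home project URL as your BOINC client knows it.
# Check the "project URL:" lines printed by `boinccmd --get_tasks` and adjust if needed.
PROJECT_URL = "http://boinc.bakerlab.org/rosetta/"


def task_names_with_prefix(get_tasks_output: str, prefix: str = "cryo") -> list[str]:
    """Collect task names starting with `prefix` from `boinccmd --get_tasks` text.

    Assumes each task block contains a line of the form "name: <task name>".
    Lines such as "WU name: ..." are skipped because they do not start with "name:".
    """
    names = []
    for line in get_tasks_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("name:"):
            name = stripped[len("name:"):].strip()
            if name.startswith(prefix):
                names.append(name)
    return names


def abort_commands(names: list[str], project_url: str = PROJECT_URL) -> list[list[str]]:
    """Build one `boinccmd --task <url> <name> abort` command per task name."""
    return [["boinccmd", "--task", project_url, name, "abort"] for name in names]


def abort_tasks_with_prefix(prefix: str = "cryo", dry_run: bool = True) -> None:
    """Query the local BOINC client and abort (or just print) matching tasks."""
    result = subprocess.run(
        ["boinccmd", "--get_tasks"], capture_output=True, text=True, check=True
    )
    for cmd in abort_commands(task_names_with_prefix(result.stdout, prefix)):
        if dry_run:
            # Print the command instead of running it, so the list can be reviewed first.
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `abort_tasks_with_prefix()` prints the abort commands for review; `abort_tasks_with_prefix(dry_run=False)` actually runs them. `boinccmd` may need to be run from the BOINC data directory so it can read the client's GUI RPC password. Printing first is deliberate: an aborted task cannot be resumed, so it is worth checking the list before running it.
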
©2024 University of Washington
https://www.bakerlab.org