Message boards : Number crunching : Multiple Computation Errors
Author | Message |
---|---|
Polian Send message Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
|
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,078 |
One idea would be to set a large cache of work, pull down a pile of tasks, and then remove the ones you don't want. Then you'll at least have a longer period of time where you can run without feeling the need to check task names again. Thanks, you too Polian! |
BONNSaR Send message Joined: 3 Nov 05 Posts: 3 Credit: 8,983,633 RAC: 0 |
Which Prime95 tests were you running? If you were running the "small" or "large" tests then it might be that the memory (including L3 cache I think) is struggling at 4.5GHz because Rosetta taxes that quite heavily but P95 doesn't on the first two. Blend is probably more appropriate (although of course it should be stable on all of them!). Hi Danny, I used all 3 tests to see what different results occurred. At 4.5GHz all the Prime95 tasks were stable. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
saw a new chain of the cryo tasks load up on my system. now it is on chain b and w. turned off my OC program to see if this is the problem. if not then it's a bug in their tasks and as usual they don't read the message boards. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,078 |
saw a new chain of the cryo tasks load up on my system. now it is on chain b and w. turned off my OC program to see if this is the problem. if not then it's a bug in their tasks and as usual they don't read the message boards. I abort 5 and get 2 upon the update, I abort them and get another, it is a never-ending process to abort them!!! I am about outa here!!! ALL tasks are important, why is there NOT a way to select which tasks I wish to crunch here?!!!!! |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
saw a new chain of the cryo tasks load up on my system. now it is on chain b and w. turned off my OC program to see if this is the problem. if not then it's a bug in their tasks and as usual they don't read the message boards. and these bombed out as well without any overclocking software being used. what a joke! now it is the save_all_out stuff that is showing up... maybe this will work? |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
checked my memory running with no overclocking with a short run of prime95 and came up clean. will try it with the overclocking engaged on an overnight run and see what happens. I personally think these cryo tasks are junk. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,078 |
checked my memory running with no overclocking with a short run of prime95 and came up clean. will try it with the overclocking engaged on an overnight run and see what happens. I personally think these cryo tasks are junk. They may work on a Mac if you have one, I do not, but Mod.Sense, in another thread, said that it appears they work fine on Mac computers. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
checked my memory running with no overclocking with a short run of prime95 and came up clean. will try it with the overclocking engaged on an overnight run and see what happens. I personally think these cryo tasks are junk. What the heck? Mac-only tasks being sent to Windows machines or someone fudged the coding. Really strange. I ran Prime95 overnight with overclocking on. no errors. So it IS something with the tasks from cryo. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I didn't intend to say they were "Mac-only tasks". I was simply making the observation that the only successful results on the cryo tasks I've seen are from Macs. [edit] I should point out that what "I've seen" is the same display of WUs we all can do via the website. I am *not* a part of the Project Team in BakerLab where I might be querying a database full of all of the results :( I've emailed DK and asked him to look into the cryo tasks if he wasn't already. Rosetta Moderator: Mod.Sense |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,078 |
I didn't intend to say they were "Mac-only tasks". I was simply making the observation that the only successful results on the cryo tasks I've seen are from Macs. I am sorry if I misrepresented what you said, it was NOT my intention nor understanding!! I KNEW you meant they were currently only producing good results on Macs, NOT that they were Mac units per se. I am PURELY GUESSING but it could be they didn't do adequate testing on a non-Mac PC. THANK YOU for emailing DK, hopefully he will get right on it!! |
Polian Send message Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
I thought I had a successful one for my PC, I didn't find one. But I found a successful one for Linux: :) Yeah, they seem to be running okay, more or less, on Linux. One of the old boxes on my desk has an old 36.7GB SCSI drive in it (lol). I heard it banging away the other day, logged into it, ran top, and found that those tasks were using ~2GB each. Disk was being thrashed for virtual memory swaps. I just logged out and let it go, heh. Similar tasks are failing with "Out of memory" unhandled exceptions on Windows 7 and 8. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,078 |
I thought I had a successful one for my PC, I didn't find one. But I found a successful one for Linux: :) I run the units, when I did and when I miss them, on Windows PCs with anywhere from 3 to 16GB of RAM and they just fail! So RAM isn't the issue for me, also I have PLENTY of hard drive space available on each machine, well over 10GB each, and they all fail, so it isn't hard drive space for me either. I will continue to abort any I see, or keep moving PCs elsewhere, if they can't figure out how to fix the problem! |
Alun Send message Joined: 27 Feb 10 Posts: 5 Credit: 69,418 RAC: 0 |
Ho hum - rosetta broken again? Cryo WU's: failing for an out of memory error, on an 8gb system which has >6gb free for BOINC tasks. Latest release version of BOINC (7.0.64) if that's of any relevance. At this point there's been less than a month since Christmas where Rosetta itself hasn't been broken at the project end. Does anyone do any testing on these work units before throwing them at us to waste time and resources (and ultimately money) on? Project suspended for now. |
BarryAZ Send message Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
Take a look at the traffic on the lead number crunching thread regarding problems. There are at least five problems at the moment. The Cryo errors, the errors with a second group of work units, uploading problems, downloading problems and a build up of pendings. Added to this is a relative lack of project acknowledgement and feedback on the problems. The volunteers (they are the ones saddled with the message base handling) are doing the best they can. They have forwarded the reports to the project. We've been told that grad students have been tasked to monitor the traffic as well (though I've seen no comments or acknowledgements from them as of yet). My project response sequence: 1) Look on message boards to find company for problems 2) Respond to known bad apps (Cryo) by killing those (and amplify user reports) 3) Look for response from project 4) Absent response move to 'no new work'. 5) Encounter additional reports of problems (uploads/downloads, etc.) 6) Absent response move to 'suspend' 7) Observe response or non response for a period of days up to a week. 8) Absent response and resolution, start detaching Rosetta and focus on other projects. 9) Check back weekly to see if the project has returned to its previous level of effectiveness (and hope for this and an enhanced level of user responsiveness). Personally, I have started stage 7 here, this set of problems is only a few days old so far. I figure to move to stage 8 next week absent project positive performance, since there are other worthy projects out there. Ho hum - rosetta broken again? |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,078 |
Take a look at the traffic on the lead number crunching thread regarding problems. There are at least five problems at the moment. The Cryo errors, the errors with a second group of work units, uploading problems, downloading problems and a build up of pendings. Added to this is a relative lack of project acknowledgement and feedback on the problems. The volunteers (they are the ones saddled with the message base handling) are doing the best they can. They have forwarded the reports to the project. We've been told that grad students have been tasked to monitor the traffic as well (though I've seen no comments or acknowledgements from them as of yet). Personally I reached Stage 9 YESTERDAY but have modified it slightly, my time frame is NO LESS than every 30 days!! If a project can't run for at least 30 days in a row without having a major breakdown, then it isn't for me! Now IF I could choose to NOT get the cryo units, as an example, then perhaps I would consider coming back sooner, but my TIME is worth more than zip to me, and that is what I get when crunching the cryo units, zip, bupkus, nada, etc, etc, ETC!!! OH I almost forgot I DO get something...a higher electric bill!!! |
Polian Send message Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
I thought I had a successful one for my PC, I didn't find one. But I found a successful one for Linux: :) Mikey, Just to clarify, 32-bit applications running on 64-bit Windows have a default maximum memory allocation of 2GB per process, I believe. Since the cryo units were requesting more than 2GB of memory, that resulted in the out of memory errors indicated in the stderr output of Windows machines. I think this also explains why they were running on OS X and Linux machines. Quite poorly on mine though, since a few of my boxes have 4GB of memory and up to 8 concurrent threads, heh. As usual, I could be wrong. Someone please correct me if I am. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Mikey, ...and there is also a maximum memory usage configured into each work unit. If this is exceeded, the BOINC client detects it and ends the task. I'm not clear on how to distinguish between the various possibilities based on the reported output. Rosetta Moderator: Mod.Sense |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,078 |
I thought I had a successful one for my PC, I didn't find one. But I found a successful one for Linux: :) Well that makes sense why ALL mine failed then, I was only running 64-bit Win7 machines here! |
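
Polian's explanation in the thread comes down to two separate limits: the address space one 32-bit process can use on Windows, and the physical RAM shared by every task running at once. The sketch below only works through that arithmetic with the figures mentioned in the posts (~2GB per task, 4GB of RAM, 8 concurrent threads). The function names and the exact 2 GiB constant are assumptions for illustration; none of this is taken from Rosetta@home or BOINC code.

```python
# Illustrative arithmetic only. Names and constants here are assumptions,
# not values read from Rosetta@home or BOINC.

GIB = 1024 ** 3

# Assumed default user-mode address space for a 32-bit Windows process
# that is not built large-address-aware.
WIN32_DEFAULT_LIMIT = 2 * GIB


def fits_in_address_space(task_bytes, limit_bytes=WIN32_DEFAULT_LIMIT):
    """Return True if one task's memory request stays under the per-process limit."""
    return task_bytes < limit_bytes


def ram_shortfall(task_bytes, concurrent_tasks, physical_ram_bytes):
    """Return how many bytes of total demand exceed physical RAM (0 if none)."""
    demand = task_bytes * concurrent_tasks
    return max(0, demand - physical_ram_bytes)


task = 2 * GIB  # roughly the per-task usage Polian reported from top

# Per-process limit: a task asking for 2 GiB or more cannot fit.
print(fits_in_address_space(task))

# Whole-machine limit: 8 such tasks on a 4 GiB box, shortfall in GiB.
print(ram_shortfall(task, 8, 4 * GIB) / GIB)
```

Under these assumptions the first check fails regardless of how much RAM is installed, which would be consistent with mikey's 3 to 16GB Windows machines all failing, while the second shows why the Linux box with 4GB was swapping heavily instead of erroring out.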
©2024 University of Washington
https://www.bakerlab.org