Multiple Computation Errors

Polian (Joined: 21 Sep 05 · Posts: 152 · Credit: 10,141,266 · RAC: 0)
Message 75387 - Posted: 17 Apr 2013, 18:26:30 UTC
They only run for a couple-few minutes before they fail. No sense in getting wrapped around the axle about it. I'm sure the tasks will either run through soon or they'll pull them out and reconfigure them.

mikey (Joined: 5 Jan 06 · Posts: 1900 · Credit: 12,902,147 · RAC: 0)
Message 75388 - Posted: 17 Apr 2013, 20:04:24 UTC - in response to Message 75385.
Quote: One idea would be to set a large cache of work, pull down a pile of tasks, and then remove the ones you don't want. Then you'll at least have a longer period of time where you can run without feeling the need to check task names again.
Thanks, you too Polian!
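The cache-and-abort idea can be scripted with BOINC's command-line tool, `boinccmd`, which can list tasks (`--get_tasks`) and abort a single task (`--task URL name abort`). This is a minimal sketch under stated assumptions: the parser assumes the `name:` / `project URL:` layout printed by 7.x clients, the `"cryo"` pattern is a hypothetical guess at how those work units are named, and nothing is aborted unless `apply=True`. Depending on your setup, `boinccmd` may need to run from the BOINC data directory or be given `--passwd`.

```python
import subprocess

def parse_tasks(text):
    """Parse `boinccmd --get_tasks` output into (project_url, task_name) pairs.

    Assumes each task block prints a "name:" line before its "project URL:"
    line, as BOINC 7.x clients do; check the output of your own version.
    """
    tasks = []
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("name:"):  # "WU name:" lines do not match this prefix
            name = line[len("name:"):].strip()
        elif line.startswith("project URL:") and name is not None:
            url = line[len("project URL:"):].strip()
            tasks.append((url, name))
            name = None
    return tasks

def select_tasks(tasks, pattern):
    """Keep tasks whose name contains `pattern` (case-insensitive)."""
    pattern = pattern.lower()
    return [(url, name) for url, name in tasks if pattern in name.lower()]

def abort_commands(selected):
    """Build one `boinccmd --task URL name abort` command per selected task."""
    return [["boinccmd", "--task", url, name, "abort"] for url, name in selected]

def abort_matching(pattern="cryo", apply=False):
    """Print (and, only if apply=True, run) abort commands for matching tasks.

    The default pattern "cryo" is a hypothetical guess at the task naming;
    confirm it against your own task list before using apply=True.
    """
    listing = subprocess.run(
        ["boinccmd", "--get_tasks"], capture_output=True, text=True, check=True
    ).stdout
    commands = abort_commands(select_tasks(parse_tasks(listing), pattern))
    for cmd in commands:
        print(" ".join(cmd))
        if apply:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `abort_matching()` first shows what would be aborted; `abort_matching(apply=True)` then acts on it. The project URL is taken from the client's own listing rather than typed by hand, so it matches what the client reports.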

BONNSaR (Joined: 3 Nov 05 · Posts: 3 · Credit: 8,983,633 · RAC: 0)
Message 75389 - Posted: 18 Apr 2013, 6:46:03 UTC - in response to Message 75341.
Quote: Which Prime95 tests were you running? If you were running the "small" or "large" tests then it might be that the memory (including L3 cache, I think) is struggling at 4.5GHz, because Rosetta taxes that quite heavily but P95 doesn't on the first two. Blend is probably more appropriate (although of course it should be stable on all of them!). HTH Danny
Hi Danny, I used all 3 tests to see what different results occurred. At 4.5GHz all the Prime95 tests were stable.

Greg_BE (Joined: 30 May 06 · Posts: 5774 · Credit: 6,139,760 · RAC: 0)
Message 75390 - Posted: 19 Apr 2013, 21:02:14 UTC
Saw a new chain of the cryo tasks load up on my system. Now it is on chain b and w. Turned off my OC program to see if this is the problem. If not, then it's a bug in their tasks and, as usual, they don't read the message boards.

mikey (Joined: 5 Jan 06 · Posts: 1900 · Credit: 12,902,147 · RAC: 0)
Message 75395 - Posted: 20 Apr 2013, 12:39:46 UTC - in response to Message 75390.
I abort 5 and get 2 upon the update; I abort them and get another. It is a never-ending process to abort them!!! I am about outa here!!! ALL tasks are important, so why is there NOT a way to select which tasks I wish to crunch here?!!!!!

Greg_BE (Joined: 30 May 06 · Posts: 5774 · Credit: 6,139,760 · RAC: 0)
Message 75397 - Posted: 20 Apr 2013, 16:45:26 UTC - in response to Message 75390. Last modified: 20 Apr 2013, 16:46:24 UTC
And these bombed out as well, without any overclocking software being used. What a joke! Now it is the save_all_out stuff that is showing up... maybe this will work?

Greg_BE (Joined: 30 May 06 · Posts: 5774 · Credit: 6,139,760 · RAC: 0)
Message 75401 - Posted: 20 Apr 2013, 21:26:14 UTC
Checked my memory, running with no overclocking, with a short run of Prime95 and it came up clean. Will try it with the overclocking engaged on an overnight run and see what happens. I personally think these cryo tasks are junk.

mikey (Joined: 5 Jan 06 · Posts: 1900 · Credit: 12,902,147 · RAC: 0)
Message 75409 - Posted: 21 Apr 2013, 11:05:11 UTC - in response to Message 75401.
They may work on a Mac if you have one (I do not), but Mod.Sense, in another thread, said that it appears they work fine on Mac computers.

Greg_BE (Joined: 30 May 06 · Posts: 5774 · Credit: 6,139,760 · RAC: 0)
Message 75410 - Posted: 21 Apr 2013, 11:08:38 UTC - in response to Message 75409.
What the heck? Mac-only tasks being sent to Windows machines, or someone fudged the coding? Really strange. I ran Prime95 overnight with overclocking on: no errors. So it IS something with the tasks from cryo.

Mod.Sense, Volunteer moderator (Joined: 22 Aug 06 · Posts: 4018 · Credit: 0 · RAC: 0)
Message 75411 - Posted: 21 Apr 2013, 14:48:01 UTC. Last modified: 21 Apr 2013, 14:52:51 UTC
I didn't intend to say they were "Mac-only tasks". I was simply making the observation that the only successful results on the cryo tasks I've seen are from Macs. [edit] I should point out that what "I've seen" is the same display of WUs we can all view on the website. I am not part of the Project Team in BakerLab, where I might be querying a database full of all of the results :( I've emailed DK and asked that he look into the cryo tasks if he wasn't already.
Rosetta Moderator: Mod.Sense

mikey (Joined: 5 Jan 06 · Posts: 1900 · Credit: 12,902,147 · RAC: 0)
Message 75417 - Posted: 22 Apr 2013, 10:55:10 UTC - in response to Message 75411.
I am sorry if I misrepresented what you said; it was NOT my intention nor my understanding!! I KNEW you meant they were currently only producing good results on Macs, NOT that they were Mac units per se. I am PURELY GUESSING, but it could be they didn't do adequate testing on a non-Mac PC. THANK YOU for emailing DK, hopefully he will get right on it!!

Polian (Joined: 21 Sep 05 · Posts: 152 · Credit: 10,141,266 · RAC: 0)
Message 75421 - Posted: 22 Apr 2013, 18:39:59 UTC - in response to Message 75420.
Quote: I thought I had a successful one for my PC, but I didn't find one. But I found a successful one for Linux :) https://boinc.bakerlab.org/rosetta/workunit.php?wuid=523443602
Yeah, they seem to be running okay, more or less, on Linux. One of the old boxes on my desk has an old 36.7GB SCSI drive in it (lol). I heard it banging away the other day, logged into it, ran top, and found that those tasks were using ~2GB each. The disk was being thrashed by virtual-memory swapping. I just logged out and let it go, heh. Similar tasks are failing with "Out of memory" unhandled exceptions on Windows 7 and 8.
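The swapping Polian heard follows from simple arithmetic. Using his figures (~2GB per task here, and, in message 75474, boxes with 4GB of memory running up to 8 concurrent threads), a rough linear model shows how much task memory cannot stay in RAM. This is a sketch that ignores shared pages and real OS usage; `reserved_gb` is a hypothetical allowance, not a measured value.

```python
def overflow_gb(ram_gb, tasks, per_task_gb, reserved_gb=0.0):
    """GB of task memory that cannot stay resident in RAM (0 if it all fits).

    reserved_gb is a hypothetical allowance for the OS and other programs.
    """
    available = max(ram_gb - reserved_gb, 0.0)
    return max(tasks * per_task_gb - available, 0.0)

def max_resident_tasks(ram_gb, per_task_gb, reserved_gb=0.0):
    """Largest number of equally sized tasks that fit in RAM at the same time."""
    available = max(ram_gb - reserved_gb, 0.0)
    return int(available // per_task_gb)

# Polian's figures: ~2GB per task, 4GB of RAM, up to 8 concurrent tasks.
print(overflow_gb(4, 8, 2))      # 12.0 -> GB that would have to live in swap
print(max_resident_tasks(4, 2))  # 2 -> tasks that fit, ignoring the OS itself
```

On those numbers, six of the eight tasks would be competing for swap on a single old SCSI disk, which matches the thrashing described.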

mikey (Joined: 5 Jan 06 · Posts: 1900 · Credit: 12,902,147 · RAC: 0)
Message 75431 - Posted: 23 Apr 2013, 11:17:57 UTC - in response to Message 75421.
I run the units, when I did and when I missed them, on Windows PCs with anywhere from 3 to 16GB of RAM, and they just fail! So RAM isn't the issue for me. Also, I have PLENTY of hard drive space available on each machine, well over 10GB each, and they all fail, so it isn't hard drive space for me either. I will continue to abort any I see, or keep moving PCs elsewhere, if they can't figure out how to fix the problem!

Alun (Joined: 27 Feb 10 · Posts: 5 · Credit: 69,418 · RAC: 0)
Message 75453 - Posted: 24 Apr 2013, 17:21:28 UTC
Ho hum - Rosetta broken again? Cryo WUs are failing with an out-of-memory error on an 8GB system which has >6GB free for BOINC tasks. Latest release version of BOINC (7.0.64), if that's of any relevance. At this point there's been less than a month since Christmas where Rosetta itself hasn't been broken at the project end. Does anyone do any testing on these work units before throwing them at us to waste time and resources (and ultimately money) on? Project suspended for now.

BarryAZ (Joined: 27 Dec 05 · Posts: 153 · Credit: 30,845,917 · RAC: 0)
Message 75455 - Posted: 24 Apr 2013, 17:38:35 UTC - in response to Message 75453.
Take a look at the traffic on the lead number crunching thread regarding problems. There are at least five problems at the moment: the Cryo errors, the errors with a second group of work units, uploading problems, downloading problems and a build-up of pendings. Added to this is a relative lack of project acknowledgement and feedback on the problems. The volunteers (they are the ones saddled with the message base handling) are doing the best they can. They have forwarded the reports to the project. We've been told that grad students have been tasked to monitor the traffic as well (though I've seen no comments or acknowledgements from them as of yet).
My project response sequence:
1) Look on message boards to find company for problems.
2) Respond to known bad apps (Cryo) by killing those (and amplify user reports).
3) Look for a response from the project.
4) Absent a response, move to 'no new work'.
5) Encounter additional reports of problems (uploads/downloads, etc.).
6) Absent a response, move to 'suspend'.
7) Observe the response or non-response for a period of days, up to a week.
8) Absent response and resolution, start detaching Rosetta and focus on other projects.
9) Check back weekly to see if the project has returned to its previous level of effectiveness (and hope for this and an enhanced level of user responsiveness).
Personally, I have started stage 7 here; this set of problems is only a few days old so far. I figure to move to stage 8 next week absent positive project performance. There are other worthwhile projects out there.

mikey (Joined: 5 Jan 06 · Posts: 1900 · Credit: 12,902,147 · RAC: 0)
Message 75472 - Posted: 25 Apr 2013, 11:31:42 UTC - in response to Message 75455.
Personally I reached Stage 9 YESTERDAY, but have modified it slightly: my time frame is NO LESS than every 30 days!! If a project can't run for at least 30 days in a row without having a major breakdown, then it isn't for me!
Now IF I could choose to NOT get the cryo units, as an example, then perhaps I would consider coming back sooner, but my TIME is worth more than zip to me, and that is what I get when crunching the cryo units: zip, bupkus, nada, etc, etc, ETC!!! OH, I almost forgot, I DO get something... a higher electric bill!!!

Polian (Joined: 21 Sep 05 · Posts: 152 · Credit: 10,141,266 · RAC: 0)
Message 75474 - Posted: 25 Apr 2013, 15:22:33 UTC - in response to Message 75431.
Mikey, just to clarify: 32-bit applications running on 64-bit Windows have a maximum memory allocation of 2GB per process, I believe. Since the cryo units were requesting more than 2GB of memory, that resulted in the out-of-memory errors indicated in the stderr_outs of Windows machines. I think this also explains why they were running on OS X and Linux machines. Quite poorly on mine though, since a few of my boxes have 4GB of memory and up to 8 concurrent threads, heh. As usual, I could be wrong. Someone please correct me if I am.
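The 2GB figure matches the default user-mode address space of a 32-bit process on Windows; on 64-bit Windows, a 32-bit binary linked as large-address-aware (`/LARGEADDRESSAWARE`) can use up to 4GB. The thread doesn't say whether the Windows Rosetta build set that flag, so in this sketch it is an input, not a claim. Installed RAM doesn't enter into it, which is consistent with mikey seeing failures on 3GB and 16GB machines alike. The 2.5GB peak below is a hypothetical value; the thread only says "more than 2GB".

```python
GIB = 1024 ** 3

def user_address_space(app_bits, large_address_aware=False):
    """User-mode virtual address space (bytes) of one process on 64-bit Windows 7/8.

    - 32-bit process, default:             2 GiB
    - 32-bit process, large-address-aware: 4 GiB
    - 64-bit process:                      8 TiB (Windows 8.1 and later allow 128 TiB)
    """
    if app_bits == 32:
        return (4 if large_address_aware else 2) * GIB
    if app_bits == 64:
        return 8 * 1024 * GIB
    raise ValueError("app_bits must be 32 or 64")

def allocation_fits(peak_bytes, app_bits, large_address_aware=False):
    """True if a peak footprint stays below the process's address-space ceiling.

    Real allocations tend to fail somewhat earlier, because loaded DLLs,
    thread stacks and heap fragmentation also consume address space.
    """
    return peak_bytes < user_address_space(app_bits, large_address_aware)

peak = int(2.5 * GIB)                   # hypothetical peak footprint
print(allocation_fits(peak, 32))        # False: default 32-bit ceiling is 2 GiB
print(allocation_fits(peak, 32, True))  # True on paper, if the build were LAA
print(allocation_fits(peak, 64))        # True
```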

Mod.Sense, Volunteer moderator (Joined: 22 Aug 06 · Posts: 4018 · Credit: 0 · RAC: 0)
Message 75479 - Posted: 26 Apr 2013, 0:03:59 UTC - in response to Message 75474.
...and there is also a maximum memory usage configured into each work unit. If this is exceeded, BOINC Manager detects it and ends the task. I'm not clear on how to distinguish between the various possibilities based on the reported output.
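One way to picture the "various possibilities" is that several ceilings can end a task, and whichever is lowest on a given machine decides how it fails. In BOINC the per-work-unit bound is the `rsc_memory_bound` field, enforced by the BOINC client (BOINC Manager is its GUI front end). This toy model does not answer the question of telling the cases apart from stderr output; it only shows why the same work unit can fail on one platform and merely swap on another. All numbers are placeholders, not Rosetta's real settings.

```python
def first_limit_hit(peak_mb, limits_mb):
    """Name of the lowest ceiling that peak_mb exceeds, or None if none is exceeded.

    limits_mb maps a human-readable ceiling name to its size in MB.
    """
    exceeded = [(size, name) for name, size in limits_mb.items() if peak_mb > size]
    return min(exceeded)[1] if exceeded else None

# Hypothetical numbers: 2048 MB is the default 32-bit Windows ceiling; the
# work-unit bound and RAM values are placeholders.
windows_32bit = {
    "32-bit address space": 2048,
    "work-unit memory bound (rsc_memory_bound)": 3000,
    "RAM available to BOINC": 6144,
}
print(first_limit_hit(2500, windows_32bit))  # 32-bit address space
print(first_limit_hit(1500, windows_32bit))  # None
```

Dropping the address-space entry (as for a 64-bit build) lets the same 2500 MB task through, and the work-unit bound or RAM becomes the deciding limit instead.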

mikey (Joined: 5 Jan 06 · Posts: 1900 · Credit: 12,902,147 · RAC: 0)
Message 75483 - Posted: 26 Apr 2013, 10:58:43 UTC - in response to Message 75474.
Well, that explains why ALL mine failed then; I was only running 64-bit Win7 machines here!