Message boards : Number crunching : Report Problems With BOINC SERVER UPGRADE
Author | Message |
---|---|
Charles Dennett Send message Joined: 27 Sep 05 Posts: 102 Credit: 2,081,660 RAC: 345 |
I have a number of computers with less than 512Megs of ram, and since the upgrade I have been getting a number of error messages on those computers like this: I'd like to second this post. I have 2 old Windows machines - a 300 MHz PII with 384 meg of memory running Win98SE and a 667 MHz PIII with only 128 meg of memory running Win2K. Both have been crunching just fine (the Win98SE machine also seems to no longer have the problem where the app does not report CPU time. It's been several days since I saw that one. Did that get fixed?) Anyway, the PIII is now reporting this message (with different numbers) a couple of dozen times when it contacts the server. It then backs off for 24 hours. It does not stop communicating. I can manually update the project. I saw this once before last week and a manual update worked. Now it does not. It reports back the same error about not enough memory, and backs off 24 hours. If this continues, it should switch over to my backup project, SIMAP, once the RAH workunits complete. I run 99% RAH and 1% SIMAP on all three of my machines (the third one is an AMD XP2600+ running Linux with 1 gig of memory.) I increased the swap area on the PIII and yes, it does swap a bit, but it is still able to crunch the workunits. I would really like to have it work on RAH if at all possible. I could spring for the PC100 memory it needs, but I didn't want to sink any money into the old machine. Maybe I'll have to. Charlie -Charlie |
Charles Dennett Send message Joined: 27 Sep 05 Posts: 102 Credit: 2,081,660 RAC: 345 |
I just wanted to emphasize that when the system does contact the RAH servers, whether it does it itself or I manually force an update, I get the message about not enough memory repeated many, many times, not just once. The last time was 184 times. The time before that was 177 times. Charlie -Charlie |
Charles Dennett Send message Joined: 27 Sep 05 Posts: 102 Credit: 2,081,660 RAC: 345 |
Just tried another manual update. This time there were no messages about not enough memory and it downloaded a new WU. Strange! Charlie -Charlie |
AnRM Send message Joined: 18 Sep 05 Posts: 123 Credit: 1,355,486 RAC: 0 |
I have a number of computers with less than 512Megs of ram, and since the upgrade I have been getting a number of error messages on those computers like this: >We have received the same message on a machine with 469Meg RAM.....had to manually upgrade to get more work.....seems to be a growing problem with machines that were happily crunching R@H before the upgrade.....Cheers, Rog. |
MolAnO Send message Joined: 30 Jan 06 Posts: 3 Credit: 402,784 RAC: 0 |
I have a number of computers with less than 512Megs of ram, and since the upgrade I have been getting a number of error messages on those computers like this: I would also like to second this post. I have an 8-way server with only 256MB RAM running Linux. It crunched great before the upgrade, but now I also get a lot of errors indicating I do not have enough memory, and then the communication stops for 24 hours. Manually forcing an update helps somewhat, but I do not get enough WUs for all my processors... |
Charles Dennett Send message Joined: 27 Sep 05 Posts: 102 Credit: 2,081,660 RAC: 345 |
Just tried another manual update. This time there were no messages about not enough memory and it downloaded a new WU. Strange! This morning when I checked the PIII it had asked for a new WU overnight and had received an uncounted number of messages about not enough memory and had backed off for 24 hours. I tried a manual update. I got a couple dozen messages about not enough memory and then it started downloading a new WU. Here is what I think is happening. It contacts the server asking for a new WU. The server starts going through a list of available WUs. If my machine does not meet the memory requirements of a WU, it gets sent one of these error messages. This continues until either the list is exhausted, in which case it causes a 24 hour backoff, or it happens to hit a WU in the list that has a small memory requirement. In this latter case the WU is sent as usual. Charlie -Charlie |
AnRM Send message Joined: 18 Sep 05 Posts: 123 Credit: 1,355,486 RAC: 0 |
Here is what I think is happening. It contacts the server asking for a new WU. The server starts going through a list of available WUs. If my machine does not meet the memory requirements of a WU, it gets sent one of these error messages. This continues until either the list is exhausted, in which case it causes a 24 hour backoff, or it happens to hit a WU in the list that has a small memory requirement. In this latter case the WU is sent as usual. >You could very well be right, Charlie. There was some talk of assigning certain WUs to machines with 1G memory before the upgrade and maybe this problem follows from that......I don't know what has happened to the usual excellent communication from the Admins (CASP7?) but it would be nice to know what's up so we can upgrade RAM or whatever....Cheers, Rog. |
m.mitch Send message Joined: 10 Feb 06 Posts: 34 Credit: 1,928,904 RAC: 0 |
When I try to update my "Message board preferences" I get the following error: Couldn't update forum preferences. Unknown column 'minimum_wrap_postcount' in 'field list' Is there a fix listed for this? Click here to join the #1 Aussie Alliance on Rosetta |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
I'll notify Rhiju and Bin about this issue and see what we can do if this is indeed what is happening. |
MolAnO Send message Joined: 30 Jan 06 Posts: 3 Credit: 402,784 RAC: 0 |
I'll notify Rhiju and Bin about this issue and see what we can do if this is indeed what is happening. Thanks for taking care of this. On a side note, Linux-pc's have normally enough swap-space available to handle those large WU's. As said before, I got an old server with 8 CPU's and only 256MB of RAM, which needs to be shared among those 8 CPU's and I have never had 1 error on a WU. Right now, my RAC is dropping like hell! :) Just my 2 cents |
Rhiju Volunteer moderator Send message Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Bin and I are on it. It's exactly the scenario that Charles described below. We'll figure out a solution! I'll notify Rhiju and Bin about this issue and see what we can do if this is indeed what is happening. |
Charles Dennett Send message Joined: 27 Sep 05 Posts: 102 Credit: 2,081,660 RAC: 345 |
Bin and I are on it. It's exactly the scenario that Charles described below. We'll figure out a solution! Great! Thanks for all the hard work! I increased the swap space on my old Windows boxes a while ago so total memory (physical + virtual) on them should be more than enough. I don't care if they swap like heck every once in a while. One is my music server (feeding a NetGear MP101 attached to my entertainment center) and the other is only around to run Microsoft Money for tracking family finances and I can turn off the crunching when I need to use them. Keep up the good work! Charlie -Charlie |
AnRM Send message Joined: 18 Sep 05 Posts: 123 Credit: 1,355,486 RAC: 0 |
Bin and I are on it. It's exactly the scenario that Charles described below. We'll figure out a solution! >Thanks, Guys.....Cheers, Rog. |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
|
Charles Dennett Send message Joined: 27 Sep 05 Posts: 102 Credit: 2,081,660 RAC: 345 |
Bin and I are on it. It's exactly the scenario that Charles described below. We'll figure out a solution! Possibly a new development? It appears now that I only get a single message that I do not have enough memory rather than a large batch of messages. However, it still backs off for 24 hours. Any chance of lowering that to an hour or two, or going into an exponential backoff starting at a small time period and getting progressively larger up to a max of an hour or two? Right now my queue length on this machine is 0.5 days and the WU run time is set for 4 hours. I may have to adjust that upwards so that I have more than a day's worth of work in my queue so as not to run out when it goes into a 24 hour backoff. Again, thanks for all the hard work. Charlie -Charlie |
m.mitch Send message Joined: 10 Feb 06 Posts: 34 Credit: 1,928,904 RAC: 0 |
Worked perfectly. That was very quick, thanks. Click here to join the #1 Aussie Alliance on Rosetta |
charmed Send message Joined: 2 Nov 05 Posts: 11 Credit: 1,780,440 RAC: 0 |
I'm in the same boat, just the one line saying not enough memory then the box inexplicably sits idle for 24 hours unless I manually intervene. |
charmed Send message Joined: 2 Nov 05 Posts: 11 Credit: 1,780,440 RAC: 0 |
I'm in the same boat, just the one line saying not enough memory then the box inexplicably sits idle for 24 hours unless I manually intervene. Is anyone working on this? I found 4 of my boxes sitting idle this morning again due to the same problem. What a waste of time and resources. When this happens I'm getting about half the work I should be. Multiply that by the thousands of computers running Rosetta and this must be a huge problem. |
Charles Dennett Send message Joined: 27 Sep 05 Posts: 102 Credit: 2,081,660 RAC: 345 |
I'm in the same boat, just the one line saying not enough memory then the box inexplicably sits idle for 24 hours unless I manually intervene. You may want to consider a backup project if all you do now is RAH and you'd like your machines to not run out of any kind of work. I figured SIMAP was somewhat related (it deals with protein similarities - something that RAH uses in its algorithms) so I set up my RAH resource share at 9900 and my SIMAP share at 100. That gives me 99.0% RAH and 1.0% SIMAP. When you first do this, your BOINC client may download several SIMAP workunits and very slowly crunch away on them. As they near their deadline the client will most likely go into earliest deadline first mode and finish off any remaining SIMAP workunits. Then, as long as there are RAH workunits available, you'll crunch only those for the next several weeks until the long term debt for SIMAP is lowered enough to start the process over. Of course, if RAH cannot supply work, SIMAP will take over until RAH can once again supply work. If you choose to go this route, just give BOINC a chance to work by itself. Resist the temptation to play with the scheduling and all will work out well. I've used SIMAP here because that is what I use. You may prefer some other project as a backup. Also, there is nothing to prevent you from using more than one project as a backup. Just give them all a low resource share. Other people prefer to join multiple projects and give them all equal resource shares or something other than very low shares. Meanwhile, I have one old machine that has this problem of low memory. I moved it to a different venue (I typically use Home but switched this one over to School) and set the time to contact the server to 2 days for the School venue (Home is set at 0.5 days). At least that way when it does contact the server, either by itself or if I force a manual update, it will have enough work so that it hopefully will not run out until the next time I can check on it. 
Charlie -Charlie |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
I'm in the same boat, just the one line saying not enough memory then the box inexplicably sits idle for 24 hours unless I manually intervene. As noted in a post lower down in this thread, Rhiju and Bin are in fact aware of and working on this issue. However, the problem is not as widespread as you might believe. Most systems that are seeing this error message are also still getting work. Moderator9 ROSETTA@home FAQ Moderator Contact |
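
For readers who want to see the behaviour discussed in this thread in concrete form, here is a minimal Python sketch of the scenario Charles described (the server walks its list of available WUs, emits one "not enough memory" message per WU that does not fit, and either sends the first WU that fits or runs out of candidates), together with the capped exponential backoff he suggested in place of the fixed 24 hour wait. This is not the BOINC scheduler's actual code: `WorkUnit`, `scan_for_work`, `next_backoff_hours`, the memory figures, and the 15 minute starting delay are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    # Hypothetical record: a WU name and the memory it is assumed to need.
    name: str
    mem_required_mb: int


def scan_for_work(host_mem_mb, work_units):
    """Walk the list in order; return (first WU that fits or None, messages).

    One message is collected for every WU skipped because the host has
    too little memory, mirroring the repeated messages seen in the thread.
    """
    messages = []
    for wu in work_units:
        if wu.mem_required_mb <= host_mem_mb:
            return wu, messages
        messages.append(
            f"{wu.name}: needs {wu.mem_required_mb} MB, host has {host_mem_mb} MB"
        )
    # List exhausted: nothing fits, so the caller would apply a backoff.
    return None, messages


def next_backoff_hours(failed_requests, base_hours=0.25, cap_hours=2.0):
    """Capped exponential backoff: double the wait after each failed request."""
    if failed_requests < 1:
        return 0.0
    return min(cap_hours, base_hours * 2 ** (failed_requests - 1))


if __name__ == "__main__":
    queue = [WorkUnit("big_a", 512), WorkUnit("big_b", 512), WorkUnit("small_c", 100)]
    wu, msgs = scan_for_work(128, queue)
    print(wu.name if wu else "no work", len(msgs))
    print([next_backoff_hours(n) for n in range(1, 6)])
```

With a 128 MB host and the sample queue above, the scan skips the two large WUs (two messages) and hands out the small one; if no WU fits, the result is `None` and the wait grows from 15 minutes up to the 2 hour cap rather than jumping straight to 24 hours.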
©2024 University of Washington
https://www.bakerlab.org