Report Problems With BOINC SERVER UPGRADE

Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 345
Message 18546 - Posted: 13 Jun 2006, 0:54:00 UTC - in response to Message 18470.  

I have a number of computers with less than 512Megs of ram, and since the upgrade I have been getting a number of error messages on those computers like this:

6/10/2006 5:30:21 PM|rosetta@home|Message from server: Your computer has only 266780672 bytes of memory; workunit requires 233219328 more bytes

The work units download and complete without any problems, but the computers then stop communicating with the server. So the results are never sent in for credit and the computers stop requesting work.

I realize that the minimum requirement is 512megs, but these computers have been crunching without any problems for 6 months. Are you now forcing computers to have 512megs before allowing them to take part in the project?
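For anyone checking the numbers in the server message quoted above, here is a minimal Python sketch that adds the two figures together. The helper name and the regular expression are illustrative assumptions, not part of BOINC; only the message text itself comes from the log.

```python
import re

# Message text copied from the client log quoted above.
MESSAGE = ("Your computer has only 266780672 bytes of memory; "
           "workunit requires 233219328 more bytes")

# Hypothetical pattern for this one message format (an assumption, not BOINC code).
PATTERN = re.compile(
    r"has only (\d+) bytes of memory; workunit requires (\d+) more bytes"
)

def required_bytes(message: str) -> int:
    """Return the total memory the workunit asks for: available + shortfall."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not match the expected format")
    available, shortfall = (int(group) for group in match.groups())
    return available + shortfall

print(required_bytes(MESSAGE))  # 500000000
```

So the workunit in that message is asking for 500,000,000 bytes, which is roughly 477 MiB.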


I'd like to second this post. I have 2 old Windows machines: a 300 MHz PII with 384 MB of memory running Win98SE and a 667 MHz PIII with only 128 MB of memory running Win2K. Both have been crunching just fine. (The Win98SE machine also no longer seems to have the problem where the app does not report CPU time. It's been several days since I saw that one. Did that get fixed?)

Anyway, the PIII is now reporting this message (with different numbers) a couple of dozen times when it contacts the server. It then backs off for 24 hours. It does not stop communicating. I can manually update the project. I saw this once before last week and a manual update worked. Now it does not. It reports back the same error about not enough memory, and backs off 24 hours. If this continues, it should switch over to my backup project, SIMAP, once the RAH workunits complete. I run 99% RAH and 1% SIMAP on all three of my machines (the third one is an AMD XP2600+ running Linux with 1 gig of memory).

I increased the swap area on the PIII and yes, it does swap a bit, but it is still able to crunch the workunits. I would really like to have it work on RAH if at all possible. I could spring for the PC100 memory it needs, but I didn't want to sink any money into the old machine. Maybe I'll have to.

Charlie


-Charlie
ID: 18546 · Rating: 0
Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 345
Message 18547 - Posted: 13 Jun 2006, 2:13:13 UTC - in response to Message 18546.  



Anyway, the PIII is now reporting this message (with different numbers) a couple of dozen times when it contacts the server.



I just wanted to emphasize that when the system does contact the RAH servers, whether it does it itself or I manually force an update, I get the message about not enough memory repeated many, many times, not just once. The last time was 184 times. The time before that was 177 times.

Charlie


-Charlie
ID: 18547 · Rating: 0
Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 345
Message 18549 - Posted: 13 Jun 2006, 2:49:24 UTC - in response to Message 18547.  

Just tried another manual update. This time there were no messages about not enough memory and it downloaded a new WU. Strange!

Charlie

-Charlie
ID: 18549 · Rating: 0
AnRM
Joined: 18 Sep 05
Posts: 123
Credit: 1,355,486
RAC: 0
Message 18550 - Posted: 13 Jun 2006, 4:15:23 UTC - in response to Message 18470.  

I have a number of computers with less than 512Megs of ram, and since the upgrade I have been getting a number of error messages on those computers like this:

6/10/2006 5:30:21 PM|rosetta@home|Message from server: Your computer has only 266780672 bytes of memory; workunit requires 233219328 more bytes

The work units download and complete without any problems, but the computers then stop communicating with the server. So the results are never sent in for credit and the computers stop requesting work.

I realize that the minimum requirement is 512megs, but these computers have been crunching without any problems for 6 months. Are you now forcing computers to have 512megs before allowing them to take part in the project?

>We have received the same message on a machine with 469Meg RAM.....had to manually update to get more work.....seems to be a growing problem with machines that were happily crunching R@H before the upgrade.....Cheers, Rog.
ID: 18550 · Rating: 0
MolAnO
Joined: 30 Jan 06
Posts: 3
Credit: 402,784
RAC: 0
Message 18552 - Posted: 13 Jun 2006, 10:30:48 UTC - in response to Message 18550.  

I have a number of computers with less than 512Megs of ram, and since the upgrade I have been getting a number of error messages on those computers like this:

6/10/2006 5:30:21 PM|rosetta@home|Message from server: Your computer has only 266780672 bytes of memory; workunit requires 233219328 more bytes

The work units download and complete without any problems, but the computers then stop communicating with the server. So the results are never sent in for credit and the computers stop requesting work.

I realize that the minimum requirement is 512megs, but these computers have been crunching without any problems for 6 months. Are you now forcing computers to have 512megs before allowing them to take part in the project?

>We have received the same message on a machine with 469Meg RAM.....had to manually update to get more work.....seems to be a growing problem with machines that were happily crunching R@H before the upgrade.....Cheers, Rog.


I would also like to second this post. I have an 8-way server with only 256MB RAM running Linux. It crunched great before the upgrade, but now I also get a lot of errors saying I do not have enough memory, and then communication stops for 24 hours.
Manually forcing an update helps somewhat, but I do not get enough WUs for all my processors...

ID: 18552 · Rating: 0
Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 345
Message 18553 - Posted: 13 Jun 2006, 11:40:03 UTC - in response to Message 18549.  

Just tried another manual update. This time there were no messages about not enough memory and it downloaded a new WU. Strange!


This morning when I checked the PIII it had asked for a new WU overnight and had received an uncounted number of messages about not enough memory and had backed off for 24 hours. I tried a manual update. I got a couple dozen messages about not enough memory and then it started downloading a new WU.

Here is what I think is happening. It contacts the server asking for a new WU. The server starts going through a list of available WUs. If my machine does not meet the memory requirements for a WU, it gets sent one of these error messages. This continues until either the list is exhausted, in which case it causes a 24 hour backoff, or it happens to hit a WU in the list that has a small memory requirement. In this latter case the WU is sent as usual.
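The guess above can be sketched in a few lines of Python. To be clear, this is only a model of the behaviour described in this post, not the actual BOINC scheduler code: `WorkUnit`, `memory_bound` and `request_work` are hypothetical names made up for illustration, and the 24 hour figure is simply the backoff reported in this thread.

```python
from dataclasses import dataclass

BACKOFF_HOURS = 24  # the delay reported in this thread, used here as an assumption


@dataclass
class WorkUnit:
    name: str
    memory_bound: int  # bytes the workunit asks for (hypothetical field name)


def request_work(host_memory, candidates):
    """Walk the candidate list as described above.

    Returns (workunit or None, messages, backoff_hours).
    """
    messages = []
    for wu in candidates:
        if wu.memory_bound > host_memory:
            # One "not enough memory" message per workunit that does not fit.
            shortfall = wu.memory_bound - host_memory
            messages.append(
                f"Your computer has only {host_memory} bytes of memory; "
                f"workunit requires {shortfall} more bytes"
            )
            continue
        # First workunit that fits is sent as usual, with no backoff.
        return wu, messages, 0
    # List exhausted without a match: nothing is sent and the client backs off.
    return None, messages, BACKOFF_HOURS
```

Under this model, the number of messages depends on how many oversized workunits sit ahead of the first small one in the list, which would explain why the count differs from one update to the next.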

Charlie
-Charlie
ID: 18553 · Rating: 0
AnRM
Joined: 18 Sep 05
Posts: 123
Credit: 1,355,486
RAC: 0
Message 18573 - Posted: 13 Jun 2006, 16:07:49 UTC - in response to Message 18553.  
Last modified: 13 Jun 2006, 16:13:13 UTC

Here is what I think is happening. It contacts the server asking for a new WU. The server starts going through a list of available WUs. If my machine does not meet the memory requirements for a WU, it gets sent one of these error messages. This continues until either the list is exhausted, in which case it causes a 24 hour backoff, or it happens to hit a WU in the list that has a small memory requirement. In this latter case the WU is sent as usual.

Charlie

>You could very well be right, Charlie. There was some talk of assigning certain WUs to machines with 1G memory before the upgrade and maybe this problem follows from that......I don't know what has happened to the usual excellent communication from the Admins (CASP7?) but it would be nice to know what's up so we can upgrade RAM or whatever....Cheers, Rog.
ID: 18573 · Rating: 0
m.mitch
Joined: 10 Feb 06
Posts: 34
Credit: 1,928,904
RAC: 0
Message 18575 - Posted: 13 Jun 2006, 16:34:07 UTC


When I try to update my "Message board preferences" I get the following error:
Couldn't update forum preferences.
Unknown column 'minimum_wrap_postcount' in 'field list'

Is there a fix listed for this?




ID: 18575 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 18577 - Posted: 13 Jun 2006, 17:19:07 UTC

I'll notify Rhiju and Bin about this issue and see what we can do if this is indeed what is happening.
ID: 18577 · Rating: 0
MolAnO
Joined: 30 Jan 06
Posts: 3
Credit: 402,784
RAC: 0
Message 18585 - Posted: 13 Jun 2006, 18:45:52 UTC - in response to Message 18577.  

I'll notify Rhiju and Bin about this issue and see what we can do if this is indeed what is happening.


Thanks for taking care of this.

On a side note, Linux PCs normally have enough swap space available to handle those large WUs. As I said before, I have an old server with 8 CPUs and only 256MB of RAM, which has to be shared among those 8 CPUs, and I have never had a single error on a WU.

Right now, my RAC is dropping like hell! :)

Just my 2 cents
ID: 18585 · Rating: 0
Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 18586 - Posted: 13 Jun 2006, 18:53:45 UTC - in response to Message 18585.  

Bin and I are on it. It's exactly the scenario that Charles described below. We'll figure out a solution!

I'll notify Rhiju and Bin about this issue and see what we can do if this is indeed what is happening.


Thanks for taking care of this.

On a side note, Linux PCs normally have enough swap space available to handle those large WUs. As I said before, I have an old server with 8 CPUs and only 256MB of RAM, which has to be shared among those 8 CPUs, and I have never had a single error on a WU.

Right now, my RAC is dropping like hell! :)

Just my 2 cents


ID: 18586 · Rating: 0
Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 345
Message 18587 - Posted: 13 Jun 2006, 19:00:11 UTC - in response to Message 18586.  

Bin and I are on it. It's exactly the scenario that Charles described below. We'll figure out a solution!

I'll notify Rhiju and Bin about this issue and see what we can do if this is indeed what is happening.


Thanks for taking care of this.

On a side note, Linux PCs normally have enough swap space available to handle those large WUs. As I said before, I have an old server with 8 CPUs and only 256MB of RAM, which has to be shared among those 8 CPUs, and I have never had a single error on a WU.

Right now, my RAC is dropping like hell! :)

Just my 2 cents




Great! Thanks for all the hard work! I increased the swap space on my old Windows boxes a while ago, so total memory (physical + virtual) on them should be more than enough. I don't care if they swap like heck every once in a while. One is my music server (feeding a NetGear MP101 attached to my entertainment center) and the other is only around to run Microsoft Money for tracking family finances, and I can turn off the crunching when I need to use them.

Keep up the good work!

Charlie


-Charlie
ID: 18587 · Rating: 0
AnRM
Joined: 18 Sep 05
Posts: 123
Credit: 1,355,486
RAC: 0
Message 18601 - Posted: 14 Jun 2006, 0:37:42 UTC - in response to Message 18586.  

Bin and I are on it. It's exactly the scenario that Charles described below. We'll figure out a solution!

>Thanks, Guys.....Cheers, Rog.

ID: 18601 · Rating: 0
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 18607 - Posted: 14 Jun 2006, 2:51:51 UTC - in response to Message 18575.  


When I try to update my "Message board preferences" I get the following error:
Couldn't update forum preferences.
Unknown column 'minimum_wrap_postcount' in 'field list'

Is there a fix listed for this?


David Kim reports that this issue has been fixed.
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 18607 · Rating: 0
Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 345
Message 18629 - Posted: 14 Jun 2006, 11:51:36 UTC - in response to Message 18586.  

Bin and I are on it. It's exactly the scenario that Charles described below. We'll figure out a solution!


Possibly a new development? It appears now that I only get a single message that I do not have enough memory rather than a large batch of messages. However, it still backs off for 24 hours. Any chance of lowering that to an hour or two, or going into an exponential backoff starting at a small time period and getting progressively larger up to a max of an hour or two?
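To make the exponential backoff idea concrete, here is a minimal Python sketch. The starting delay of 5 minutes, the doubling factor and the 120 minute cap are all assumed values picked for illustration; nothing in this thread says what BOINC or the project would actually use.

```python
def backoff_delays(first_minutes=5.0, factor=2.0, cap_minutes=120.0, attempts=8):
    """Return the wait (in minutes) before each retry, capped at cap_minutes.

    All default values are illustrative assumptions, not BOINC settings.
    """
    delays = []
    delay = first_minutes
    for _ in range(attempts):
        delays.append(min(delay, cap_minutes))  # never wait longer than the cap
        delay *= factor                         # grow the wait after each failure
    return delays


print(backoff_delays())
# [5.0, 10.0, 20.0, 40.0, 80.0, 120.0, 120.0, 120.0]
```

With these numbers a machine that keeps getting refused would retry several times in the first few hours and then settle at one request every two hours, instead of sitting idle for a full day after one refusal.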

Right now my queue length on this machine is 0.5 days and the WU run time is set for 4 hours. I may have to adjust that upwards so that I have more than a day's worth of work in my queue so as not to run out when it goes into a 24 hour backoff.

Again, thanks for all the hard work.

Charlie


-Charlie
ID: 18629 · Rating: 0
m.mitch
Joined: 10 Feb 06
Posts: 34
Credit: 1,928,904
RAC: 0
Message 18637 - Posted: 14 Jun 2006, 13:45:12 UTC - in response to Message 18607.  


When I try to update my "Message board preferences" I get the following error:
Couldn't update forum preferences.
Unknown column 'minimum_wrap_postcount' in 'field list'

Is there a fix listed for this?


David Kim reports that this issue has been fixed.


Worked perfectly. That was very quick, thanks.




ID: 18637 · Rating: 0
charmed
Joined: 2 Nov 05
Posts: 11
Credit: 1,780,440
RAC: 0
Message 18655 - Posted: 14 Jun 2006, 19:50:05 UTC - in response to Message 18637.  

I'm in the same boat, just the one line saying not enough memory then the box inexplicably sits idle for 24 hours unless I manually intervene.
ID: 18655 · Rating: 0
charmed
Joined: 2 Nov 05
Posts: 11
Credit: 1,780,440
RAC: 0
Message 18698 - Posted: 15 Jun 2006, 10:41:24 UTC - in response to Message 18655.  

I'm in the same boat, just the one line saying not enough memory then the box inexplicably sits idle for 24 hours unless I manually intervene.



Is anyone working on this? I found 4 of my boxes sitting idle this morning again due to the same problem. What a waste of time and resources. When this happens I'm getting about half the work I should be. Multiply that by the thousands of computers running Rosetta and this must be a huge problem.
ID: 18698 · Rating: 0
Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 345
Message 18699 - Posted: 15 Jun 2006, 11:51:08 UTC - in response to Message 18698.  

I'm in the same boat, just the one line saying not enough memory then the box inexplicably sits idle for 24 hours unless I manually intervene.



Is anyone working on this? I found 4 of my boxes sitting idle this morning again due to the same problem. What a waste of time and resources. When this happens I'm getting about half the work I should be. Multiply that by the thousands of computers running Rosetta and this must be a huge problem.


You may want to consider a backup project if all you do now is RAH and you'd like your machines to not run out of any kind of work. I figured SIMAP was somewhat related (it deals with protein similarities, something that RAH uses in its algorithms), so I set up my RAH resource share at 9900 and my SIMAP share at 100. That gives me 99.0% RAH and 1.0% SIMAP. When you first do this, your BOINC client may download several SIMAP workunits and very slowly crunch away on them. As they near their deadline, the client will most likely go into earliest-deadline-first mode and finish off any remaining SIMAP workunits. Then, as long as there are RAH workunits available, you'll crunch only those for the next several weeks until the long-term debt for SIMAP is lowered enough to start the process over. Of course, if RAH cannot supply work, SIMAP will take over until RAH can once again supply work.
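The percentages follow directly from the share numbers: each project's fraction is its share divided by the sum of all shares. A minimal Python sketch of that arithmetic (the function name is made up for illustration and is not a BOINC API):

```python
def share_fractions(shares):
    """Convert resource shares into fractions of total crunching time."""
    total = sum(shares.values())
    return {project: share / total for project, share in shares.items()}


# Share values taken from the post above.
fractions = share_fractions({"RAH": 9900, "SIMAP": 100})
for project, fraction in fractions.items():
    print(f"{project}: {fraction:.1%}")
# RAH: 99.0%
# SIMAP: 1.0%
```

Adding a second backup project with its own share of 100 would simply make the total 10100, so the same calculation applies with more entries in the dictionary.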

If you choose to go this route, just give BOINC a chance to work by itself. Resist the temptation to play with the scheduling and all will work out well. I've used SIMAP here because that is what I use. You may prefer some other project as a backup. Also, there is nothing to prevent you from using more than one project as a backup. Just give them all a low resource share. Other people prefer to join multiple projects and give them all equal resource shares or something other than very low shares.

Meanwhile, I have one old machine that has this problem of low memory. I moved it to a different venue (I typically use Home but switched this one over to School) and set the time to contact the server to 2 days for the School venue (Home is set at 0.5 days). At least that way when it does contact the server, either by itself or if I force a manual update, it will have enough work so that it hopefully will not run out until the next time I can check on it.

Charlie

-Charlie
ID: 18699 · Rating: 0
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 18702 - Posted: 15 Jun 2006, 13:46:51 UTC - in response to Message 18698.  
Last modified: 15 Jun 2006, 13:49:58 UTC

I'm in the same boat, just the one line saying not enough memory then the box inexplicably sits idle for 24 hours unless I manually intervene.



Is anyone working on this? I found 4 of my boxes sitting idle this morning again due to the same problem. What a waste of time and resources. When this happens I'm getting about half the work I should be. Multiply that by the thousands of computers running Rosetta and this must be a huge problem.

As noted in a post lower down in this thread, Rhiju and Bin are in fact aware of and working on this issue. However, the problem is not as widespread as you might believe. Most systems that are seeing this error message are also still getting work.

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 18702 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org