Memory requirements

Message boards : Number crunching : Memory requirements

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 29678 - Posted: 20 Oct 2006, 5:54:42 UTC

The Rosetta server just refused this machine work with a message saying I don't have enough memory. The machine has 256 MB, which is OK according to the system requirements page.

In fairness this is a 2-CPU box, so maybe the server wanted double the RAM?

But the really puzzling thing is that this box has been downloading work OK since December 2005, and soon after that message, when it next asked for work the scheduler issued it.

So is this a random glitch or do some WU ask for more memory than others? I get worried when computers do things that don't seem to be repeatable.

River~~
ID: 29678
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 29679 - Posted: 20 Oct 2006, 6:30:54 UTC

Hi River.

Yes, they have; I don't know by how much.


David Baker Posted 16 Oct 2006 3:44:27 UTC
we will increase the minimum memory requirement for larger jobs to avoid causing problems with people with smaller memory machines.






ID: 29679
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 29680 - Posted: 20 Oct 2006, 7:52:48 UTC

As above, I guess there are no standard memory jobs left if you cannot get any.
As for your dual core/CPU and memory, they are currently developing improved memory management for this sort of issue.
I asked about it in last week's Q&A with Rom Walton
http://www.romwnet.org/dasblogce/PermaLink,guid,826504c3-b084-40b9-b299-96214cbe5941.aspx


P.S. From what I remember reading, SETI currently uses an adaptive memory technique and shuts down various speedups if the computer is low on memory. For instance, the caching technique is not used if there is not enough memory to support it, and various function speedups are likewise disabled. This allows it to still run on lower-memory computers, just more slowly. Maybe if there are constantly larger memory requirements here, they could do something similar if the program allows it (and still gives useful results).
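
The adaptive idea in the P.S. above can be sketched roughly as follows. This is only an illustration: the feature names, the memory costs, and the `choose_features` helper are hypothetical and are not taken from the SETI@home or Rosetta code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    extra_mb: int  # hypothetical extra memory the speedup needs


# Hypothetical optional speedups, listed in order of preference.
OPTIONAL_FEATURES = [
    Feature("result_cache", 64),
    Feature("lookup_tables", 32),
    Feature("prefetch_buffers", 16),
]


def choose_features(available_mb, base_mb, features=OPTIONAL_FEATURES):
    """Return the names of the speedups that fit in the memory left over.

    Returns None if even the base footprint does not fit, meaning the
    job cannot run on this machine at all.
    """
    if available_mb < base_mb:
        return None
    spare = available_mb - base_mb
    enabled = []
    for feature in features:
        if feature.extra_mb <= spare:
            enabled.append(feature.name)
            spare -= feature.extra_mb
    return enabled
```

Under these assumed numbers, a host with little spare memory still runs the job, only with fewer speedups switched on; a host that cannot hold the base footprint is the only one refused outright.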
Team mauisun.org
ID: 29680
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 29683 - Posted: 20 Oct 2006, 8:10:29 UTC - in response to Message 29680.  


P.S. From what I remember reading, SETI currently uses an adaptive memory technique and shuts down various speedups if the computer is low on memory, [...] Maybe if there are constantly larger memory requirements here, they could do something similar if the program allows it (and still gives useful results).


Yes if the larger mem requirements become constant.

If they apply to some WU and not others then letting the scheduler deal with it seems the best idea - keep the large jobs for machines that can run them without slowing down.

I am not sure tho just how clever the scheduler is.

If it is planning to give me task X then finds it is too big for my box, does the scheduler actually go on and look for one that fits? Or does it just say no, relying on my box to come back and ask again in five minutes? The latter strategy could make sense in that it would save a huge database search by the scheduler.

R~~
ID: 29683
dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,627,225
RAC: 11,586
Message 29685 - Posted: 20 Oct 2006, 8:17:49 UTC

Wouldn't it reduce the memory requirement for multi-core/multi-CPU machines if they were all running different decoys on the same WU?
ID: 29685
dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,627,225
RAC: 11,586
Message 29687 - Posted: 20 Oct 2006, 8:20:35 UTC - in response to Message 29683.  

If it is planning to give me task X then finds it is too big for my box, does the scheduler actually go on and look for one that fits? Or does it just say no, relying on my box to come back and ask again in five minutes? The latter strategy could make sense in that it would save a huge database search by the scheduler.


If so, it would make sense for BOINC to be changed to come back immediately with another request if the reason was 'not enough memory'. That'd be the dumb way to do it, I guess. Of course the smart way would be requesting jobs based on the computer's average available memory.

ID: 29687
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 29692 - Posted: 20 Oct 2006, 8:48:56 UTC - in response to Message 29687.  

If it is planning to give me task X then finds it is too big for my box, does the scheduler actually go on and look for one that fits? Or does it just say no, relying on my box to come back and ask again in five minutes? The latter strategy could make sense in that it would save a huge database search by the scheduler.


If so, it would make sense for BOINC to be changed to come back immediately with another request if the reason was 'not enough memory'. That'd be the dumb way to do it, I guess.


At first sight yes, but on looking further, ooh er no. If there really were no more small jobs, then that would clutter up the bandwidth too much.

Of course the smart way would be requesting jobs based on the computer's average available memory.


To avoid a possible huge load on the database, the server would have to be changed so that it had a hit list of suitable jobs in different memory use bands in the shmem area (a shared memory area used, on at least some projects, to serve the next few WU quickly). If that were done, then you could also avoid giving small jobs to a large computer when there was a large job available.

But what you do not want to do is to have to go back to the database to find a suitable sized job every time. Server to database requests are already the slowest link in the system.
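
The banded hit list described above might look something like the sketch below. It is a toy, in-process stand-in for the shared memory area: the band boundaries, the `BandedJobCache` class, and its method names are all assumptions made for illustration and do not reflect the actual BOINC server code.

```python
from collections import deque

# Hypothetical memory bands in MB (upper bounds), smallest first.
BANDS_MB = (256, 512, 1024)


class BandedJobCache:
    """In-memory cache of ready jobs, grouped by memory band."""

    def __init__(self, bands=BANDS_MB):
        self.bands = tuple(sorted(bands))
        self.queues = {band: deque() for band in self.bands}

    def add(self, job_id, needs_mb):
        """File a job under the smallest band that can hold it."""
        for band in self.bands:
            if needs_mb <= band:
                self.queues[band].append(job_id)
                return band
        raise ValueError(f"job {job_id} needs more than the largest band")

    def take(self, host_mb):
        """Hand out a job from the largest non-empty band the host fits.

        Returns None if no cached job fits, without touching a database.
        """
        for band in reversed(self.bands):
            if band <= host_mb and self.queues[band]:
                return self.queues[band].popleft()
        return None
```

Because `take` starts from the largest band the host can handle, a large machine is given a large job whenever one is cached, and the small jobs are left for the machines that can run nothing else.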

R~~
ID: 29692
jaxom1
Joined: 5 Jun 06
Posts: 180
Credit: 1,586,889
RAC: 0
Message 29697 - Posted: 20 Oct 2006, 14:21:33 UTC
Last modified: 20 Oct 2006, 15:05:30 UTC

I would like it if you could change a setting so my computers would look for LARGE/Memory Hog jobs first. I am running this on several servers that have 2-4GB of RAM, and my laptop has 2GB as well. I think it would be a good idea, so I could save the smaller requirement jobs for people that have less RAM.

Just a thought...
ID: 29697
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 29705 - Posted: 20 Oct 2006, 15:53:06 UTC - in response to Message 29697.  

I would like it if you could change a setting so my computers would look for LARGE/Memory Hog jobs first. I am running this on several servers that have 2-4GB of RAM, and my laptop has 2GB as well. I think it would be a good idea, so I could save the smaller requirement jobs for people that have less RAM.

Just a thought...


Nice to see you still here btw ;-)
Team mauisun.org
ID: 29705
jaxom1
Joined: 5 Jun 06
Posts: 180
Credit: 1,586,889
RAC: 0
Message 29709 - Posted: 20 Oct 2006, 16:17:03 UTC

took a month off. decided the science is more important than the bickering...

ID: 29709
Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 513
Message 29722 - Posted: 20 Oct 2006, 21:45:51 UTC
Last modified: 20 Oct 2006, 21:46:37 UTC

One thing I've noticed about the high memory requirements of some workunits.

I have a couple of old, low memory PCs. They will contact the server to get new work. The server will respond that the workunit requires more memory than the machine has. However, it will download a workunit or two. (Can't tell if it's the one with the high memory requirement or not.) Then, it backs off a full 24 hours. In other words, it will not contact the server again for another 24 hours. If I want it to report any completed workunits, I need to manually update.

If it downloads a workunit, why is there a need to back off?

If it backs off, why does it go for 24 hours? Why not 1 hour?

(Also back after a short hiatus.)

Charlie
-Charlie
ID: 29722
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 29774 - Posted: 21 Oct 2006, 17:32:44 UTC - in response to Message 29722.  
Last modified: 21 Oct 2006, 17:34:19 UTC

...The server will respond that the workunit requires more memory than the machine has. However, it will download a workunit or two. (Can't tell if it's the one with the high memory requirement or not.) Then, it backs off a full 24 hours. ...


Thanks for pointing that out Charles.

This explains something that has been puzzling me - I had not spotted the back off, but had spotted that my boxes have been unaccountably downloading WCG WU. (WCG has a resource share of a zillionth of a percent and is intended as a back up project only).

Looking back at the messages tabs, C is right, it is caused by the 24hr backoff after the memory message.

In the last two weeks WCG has gone from around 6000 in my stats to whatever you see below, and still rising. This is >2000 credits crunching that is lost to this project (tho no doubt welcome over the fence).

River~~

ID: 29774
dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,627,225
RAC: 11,586
Message 29817 - Posted: 22 Oct 2006, 11:53:41 UTC

I think this explains why some of my low-mem remotes are struggling at the moment too. Is there a straightforward resolution available for this?
ID: 29817
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 29826 - Posted: 22 Oct 2006, 17:28:51 UTC

Isn't the solution simply to have WUs available for both memory footprints? Perhaps increasing the queue size would help assure there are WUs available for both large and small memory work.

...or are you saying that the client doesn't realize the WU is for a large memory machine until it does a download, and then once it realizes this, it backs off 24hrs? (boy that would not be a good system... sounds like improvements are needed to BOINC if that is the case). It would REALLY be bad if you were a dial-up user, and finished a 5MB download, only to find it has been built for a large memory machine, and you can't run it.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 29826
Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 513
Message 29831 - Posted: 22 Oct 2006, 19:42:09 UTC - in response to Message 29826.  

Isn't the solution simply to have WUs available for both memory footprints? Perhaps increasing the queue size would help assure there are WUs available for both large and small memory work.

...or are you saying that the client doesn't realize the WU is for a large memory machine until it does a download, and then once it realizes this, it backs off 24hrs? (boy that would not be a good system... sounds like improvements are needed to BOINC if that is the case). It would REALLY be bad if you were a dial-up user, and finished a 5MB download, only to find it has been built for a large memory machine, and you can't run it.


I'm not sure what it is doing. However, here's what I'm guessing it is probably doing...maybe. (Trying to be really wishy-washy here.)

The client contacts the server for new work. The server grabs a WU from its queue and figures it needs more memory than the client has. It sends a message to the client telling it that this WU needs more memory. It does not send the WU to the client but instead puts it back into the queue. It then pulls the next one from the queue. If the memory requirements for that are ok, it sends it. However, since the first WU needed more memory than the client had, either the server tells the client to back off 24 hours or the client decides to do it for itself.

Another scenario is that the server does indeed send the high memory WU to the client and then tells it to back off 24 hours.

In either case, something seems wrong. If it is not sending the high memory WU, then why does the client need to back off 24 hours (or at all)? If no low memory WU were available, then either start an exponential backoff or simply back off for 1 hour.

If it is sending the high memory WU, then that seems wrong.

Of course, I could be totally wrong. As I said, I'm just taking a guess.
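
For comparison with the fixed 24-hour delay, the exponential backoff suggested above could be sketched like this. The one-minute starting delay, the 24-hour cap, and the `backoff_seconds` name are assumptions chosen for illustration, not values taken from BOINC.

```python
def backoff_seconds(failures, base=60, cap=24 * 3600):
    """Delay before the next request after `failures` consecutive refusals.

    Doubles from `base` each time and never exceeds `cap`.
    """
    if failures <= 0:
        return 0
    return min(cap, base * 2 ** (failures - 1))
```

With these assumed values a host would wait 1, 2, 4, ... minutes and would only reach the 24-hour cap on its twelfth consecutive refusal, so a single unlucky request would cost a minute rather than a day.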

Charlie

-Charlie
ID: 29831
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 29863 - Posted: 23 Oct 2006, 10:47:14 UTC - in response to Message 29831.  



...or are you saying that the client doesn't realize the WU is for a large memory machine until it does a download...


I'm not sure what it is doing. However, here's what I'm guessing it is probably doing...maybe. (Trying to be really wishy-washy here.)

The client contacts the server for new work. The server grabs a WU from its queue and figures it needs more memory than the client has. It sends a message to the client telling it that this WU needs more memory. It does not send the WU to the client but instead puts it back into the queue. It then pulls the next one from the queue. If the memory requirements for that are ok, it sends it. However, since the first WU needed more memory than the client had, either the server tells the client to back off 24 hours or the client decides to do it for itself.
...


My guess is close but different. I think it grabs a WU, sends it if it is small enough and goes on to the next WU. As soon as it hits the first big WU I think it is giving up and at that point it tells the client to back off 24.

Evidence for this is that sometimes I get work when I get the memory message and sometimes I don't. On Charles's guess work would always come.

On my guess, the fact that work sometimes comes is explained by the small work being earlier in the queue on the server, and the fact that work sometimes doesn't come by the times when the very next WU is the big one.
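
The two guesses can be put side by side in a small simulation. Both functions are hypothetical models of the server written only to compare the guesses; the queue is a list of (name, memory needed in MB) pairs, and each function returns the WUs sent plus whether the memory message would be shown. Neither is the real scheduler.

```python
def charles_model(queue, host_mb, wanted):
    """Skip WUs that are too big and keep looking for ones that fit."""
    sent, too_big_seen = [], False
    for name, needs_mb in queue:
        if len(sent) == wanted:
            break
        if needs_mb > host_mb:
            too_big_seen = True
            continue
        sent.append(name)
    return sent, too_big_seen


def river_model(queue, host_mb, wanted):
    """Send WUs in order and give up at the first one that is too big."""
    sent = []
    for name, needs_mb in queue:
        if len(sent) == wanted:
            break
        if needs_mb > host_mb:
            return sent, True
        sent.append(name)
    return sent, False
```

With a queue of small, big, small and a 256 MB host asking for two WUs, the first model sends both small WUs while the second sends only the first. With the big WU at the head of the queue the second model sends nothing, which would match the occasions when the memory message arrives without any work.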

R~~
ID: 29863
Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 513
Message 29869 - Posted: 23 Oct 2006, 12:02:06 UTC - in response to Message 29863.  



...or are you saying that the client doesn't realize the WU is for a large memory machine until it does a download...


I'm not sure what it is doing. However, here's what I'm guessing it is probably doing...maybe. (Trying to be really wishy-washy here.)

The client contacts the server for new work. The server grabs a WU from its queue and figures it needs more memory than the client has. It sends a message to the client telling it that this WU needs more memory. It does not send the WU to the client but instead puts it back into the queue. It then pulls the next one from the queue. If the memory requirements for that are ok, it sends it. However, since the first WU needed more memory than the client had, either the server tells the client to back off 24 hours or the client decides to do it for itself.
...


My guess is close but different. I think it grabs a WU, sends it if it is small enough and goes on to the next WU. As soon as it hits the first big WU I think it is giving up and at that point it tells the client to back off 24.

Evidence for this is that sometimes I get work when I get the memory message and sometimes I don't. On Charles's guess work would always come.

On my guess, the fact that work sometimes comes is explained by the small work being earlier in the queue on the server, and the fact that work sometimes doesn't come by the times when the very next WU is the big one.

R~~


Well, your analysis certainly makes sense based on your observations. Good catch.

Charlie

-Charlie
ID: 29869
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 29870 - Posted: 23 Oct 2006, 13:55:34 UTC

River, I went to see what BOINC version you were running, but you have a large number of slow computers :-D

Could you put 5.6.5/5.6.4 on there to see if it still has the same problem ?
Team mauisun.org
ID: 29870
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 29874 - Posted: 23 Oct 2006, 14:33:49 UTC - in response to Message 29870.  
Last modified: 23 Oct 2006, 14:35:21 UTC

River, I went to see what BOINC version you were running, but you have a large number of slow computers :-D


Slow? Three of them are almost 0.9GHz.

Each :-)

For more details see this LHC posting in their Boinc Farms thread.

Mostly running either 5.4.11 (win) or 5.4.9 (linux). A couple still on 5.2.x where I seem to have missed those boxes out of my last upgrade. Must have got distracted.

Could you put 5.6.5/5.6.4 on there to see if it still has the same problem ?

will do later this week - it takes a while to update 11 boxes.

... and the issue is intermittent so will take a few days to know anyway.

R~~
ID: 29874
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 29880 - Posted: 23 Oct 2006, 16:51:36 UTC - in response to Message 29874.  
Last modified: 23 Oct 2006, 16:59:22 UTC

River, if any of them would take a uFCPGA [CPU-Z wrongly identifies it as socket BGA2] (you never know, they are small form factor. It's a laptop socket.)
I may be able to get you a large super duper fast upgrade to a whole 1000MHz.
Team mauisun.org
ID: 29880



©2024 University of Washington
https://www.bakerlab.org