Message boards : Number crunching : Memory requirements
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0
The Rosetta server just refused this machine work, with a message saying I don't have enough memory. The machine has 256MB, which is OK according to the system requirements page. In fairness this is a 2-CPU box, so maybe the server wanted double the RAM? But the really puzzling thing is that this box has been downloading work OK since December 2005, and soon after that message, when it next asked for work, the scheduler issued it. So is this a random glitch, or do some WUs ask for more memory than others? I get worried when computers do things that don't seem to be repeatable. River~~
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0
Hi River. Yes they have, though I don't know by how much.
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0
As above, I guess there are no standard-memory jobs left if you cannot get any. As for your dual-core/CPU box and memory, they are currently developing improved memory management for this sort of issue; I asked about it in last week's Q&A with Rom Walton: http://www.romwnet.org/dasblogce/PermaLink,guid,826504c3-b084-40b9-b299-96214cbe5941.aspx
P.S. From what I remember reading, SETI currently uses an adaptive memory technique and shuts down various speedups if the computer is low on memory. For instance, the caching technique is not used if there is not enough memory to support it, and various function speedups are skipped as well. This allows it to still run on lower-memory computers, just more slowly. Maybe, if there are constantly larger memory requirements here, they could do something similar if the program allows it (while still producing useful results). Team mauisun.org
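[A minimal sketch of the adaptive approach FluffyChicken recalls. The struct, function, and thresholds are all invented for illustration; this is not the actual SETI code.]

```cpp
#include <cstddef>

// Hypothetical feature flags an application might toggle at startup.
struct SpeedupConfig {
    bool use_result_cache = false;  // fastest option, needs the most RAM
    bool use_big_tables   = false;  // precomputed lookup tables
};

// Enable speedups only if the host has the memory to support them,
// so the application still runs on low-memory machines, just slower.
SpeedupConfig choose_speedups(std::size_t free_bytes) {
    SpeedupConfig cfg;
    if (free_bytes >= 512u * 1024 * 1024)
        cfg.use_result_cache = true;   // >= 512 MB: enable caching
    if (free_bytes >= 256u * 1024 * 1024)
        cfg.use_big_tables = true;     // >= 256 MB: enable lookup tables
    return cfg;
}
```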
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0
Yes, if the larger memory requirements become constant. If they apply to some WUs and not others, then letting the scheduler deal with it seems the best idea: keep the large jobs for machines that can run them without slowing down. I am not sure, though, just how clever the scheduler is. If it is planning to give me task X, then finds it is too big for my box, does the scheduler actually go on and look for one that fits? Or does it just say no, relying on my box to come back and ask again in five minutes? The latter strategy could make sense in that it would save a huge database search by the scheduler. R~~
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,627,225 RAC: 11,586
Wouldn't it reduce the memory requirement for multi-core/multi-CPU machines if they were all running different decoys of the same WU?
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,627,225 RAC: 11,586
If it is planning to give me task X then finds it is too big for my box, does the scheduler actually go on and look for one that fits? Or does it just say no, relying on my box to come back and ask again in five minutes? The latter strategy could make sense in that it would save a huge database search by the scheduler.

If so, it would make sense for BOINC to be changed to come back immediately with another request when the refusal reason was 'not enough memory'. That would be the dumb way to do it, I guess. Of course, the smart way would be requesting jobs based on the computer's average available memory.
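[To make the "dumb way" concrete, a minimal sketch; the enum and function are invented for illustration and are not the real BOINC client code.]

```cpp
// If the scheduler's refusal reason was lack of memory, retry at once
// for a smaller job instead of sitting out the normal backoff.
enum class RefusalReason { NoWorkAvailable, NotEnoughMemory, Other };

int next_request_delay_seconds(RefusalReason reason, int normal_backoff) {
    if (reason == RefusalReason::NotEnoughMemory)
        return 0;           // ask again immediately for a smaller job
    return normal_backoff;  // otherwise honour the usual backoff
}
```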
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0
If it is planning to give me task X then finds it is too big for my box, does the scheduler actually go on and look for one that fits? Or does it just say no, relying on my box to come back and ask again in five minutes? The latter strategy could make sense in that it would save a huge database search by the scheduler.

At first sight yes, but on looking further, ooh er, no. If there really were no more small jobs, then that would clutter up the bandwidth too much.

Of course the smart way would be requesting jobs based on the computer's average available memory.

To avoid a possible huge load on the database, the server would have to be changed so that it kept a hit list of suitable jobs in different memory-use bands in the shmem area (a shared-memory area used, on at least some projects, to serve the next few WUs quickly). If that were done, then you could also avoid giving small jobs to a large computer when a large job was available. But what you do not want is to have to go back to the database to find a suitably sized job every time: server-to-database requests are already the slowest link in the system. R~~
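[One way River~~'s banded hit list could look, as a hedged sketch. The band boundaries, structs, and policy are assumptions for illustration, not the actual BOINC scheduler's shared-memory layout.]

```cpp
#include <array>
#include <cstddef>
#include <deque>
#include <optional>

// One ready-to-send job; mem_bound is the memory the WU needs.
struct Job {
    int id;
    std::size_t mem_bound;
};

// Upper memory limit of each band (boundaries invented for illustration).
constexpr std::array<std::size_t, 3> kBandLimit = {
    128u * 1024 * 1024,   // band 0: jobs needing up to 128 MB
    256u * 1024 * 1024,   // band 1: up to 256 MB
    1024u * 1024 * 1024,  // band 2: up to 1 GB
};

struct ShmemJobCache {
    std::array<std::deque<Job>, 3> bands;

    // Hand out the largest job the host can hold, so big hosts do not
    // use up the small jobs that small hosts depend on, and no database
    // query is needed on the fast path.
    std::optional<Job> take_job(std::size_t host_mem) {
        for (int b = 2; b >= 0; --b) {
            if (kBandLimit[b] <= host_mem && !bands[b].empty()) {
                Job j = bands[b].front();
                bands[b].pop_front();
                return j;
            }
        }
        return std::nullopt;  // nothing fits this host right now
    }
};
```

[Scanning from the largest band downward is what implements River~~'s point about not giving small jobs to a large computer when a large job is available.]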
jaxom1 Joined: 5 Jun 06 Posts: 180 Credit: 1,586,889 RAC: 0
I would like it if you could change a setting so my computers would look for LARGE/memory-hog jobs first. I am running this on several servers that have 2-4GB of RAM, and my laptop has 2GB as well. I think it would be a good idea, as I could then save the smaller-requirement jobs for people who have less RAM. Just a thought...
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0
I would like it if you could change a setting so my computers would look for LARGE/memory-hog jobs first. I am running this on several servers that have 2-4GB of RAM, and my laptop has 2GB as well. I think it would be a good idea, as I could then save the smaller-requirement jobs for people who have less RAM.

Nice to see you still here, btw ;-) Team mauisun.org
jaxom1 Joined: 5 Jun 06 Posts: 180 Credit: 1,586,889 RAC: 0
Took a month off. Decided the science is more important than the bickering...
Charles Dennett Joined: 27 Sep 05 Posts: 102 Credit: 2,081,660 RAC: 513
One thing I've noticed about the high memory requirements of some workunits: I have a couple of old, low-memory PCs. They will contact the server to get new work. The server will respond that the workunit requires more memory than the machine has. However, it will download a workunit or two. (I can't tell if it's the one with the high memory requirement or not.) Then it backs off a full 24 hours; in other words, it will not contact the server again for another 24 hours. If I want it to report any completed workunits, I need to manually update. If it downloads a workunit, why is there a need to back off? And if it backs off, why does it go for 24 hours? Why not 1 hour? (Also back after a short hiatus.) Charlie -Charlie
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0
...The server will respond that the workunit requires more memory than the machine has. However, it will download a workunit or two. (I can't tell if it's the one with the high memory requirement or not.) Then it backs off a full 24 hours...

Thanks for pointing that out, Charles. This explains something that has been puzzling me: I had not spotted the back-off, but I had spotted that my boxes have been unaccountably downloading WCG WUs. (WCG has a resource share of a zillionth of a percent and is intended as a backup project only.) Looking back at the messages tabs, Charles is right; it is caused by the 24-hour backoff after the memory message. In the last two weeks WCG has gone from around 6000 in my stats to whatever you see below, and still rising. That is more than 2000 credits of crunching lost to this project (though no doubt welcome over the fence). River~~
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,627,225 RAC: 11,586
I think this explains why some of my low-mem remotes are struggling at the moment too. Is there a straightforward resolution available for this?
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0
Isn't the solution simply to have WUs available for both memory footprints? Perhaps increasing the queue size would help assure there are WUs available for both large- and small-memory work. ...or are you saying that the client doesn't realize the WU is for a large-memory machine until it does a download, and then, once it realizes this, it backs off 24 hours? (Boy, that would not be a good system... it sounds like improvements are needed in BOINC if that is the case.) It would REALLY be bad if you were a dial-up user and finished a 5MB download, only to find it had been built for a large-memory machine and you couldn't run it.
Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/
Charles Dennett Joined: 27 Sep 05 Posts: 102 Credit: 2,081,660 RAC: 513
Isn't the solution simply to have WUs available for both memory footprints? Perhaps increasing the queue size would help assure there are WUs available for both large- and small-memory work.

I'm not sure what it is doing. However, here's what I'm guessing it is probably doing... maybe. (Trying to be really wishy-washy here.) The client contacts the server for new work. The server grabs a WU from its queue and figures out that it needs more memory than the client has. It sends a message to the client telling it that this WU needs more memory. It does not send the WU to the client, but instead puts it back into the queue. It then pulls the next one from the queue. If the memory requirements for that one are OK, it sends it. However, since the first WU needed more memory than the client had, either the server tells the client to back off 24 hours or the client decides to do that for itself. Another scenario is that the server does indeed send the high-memory WU to the client and then tells it to back off 24 hours. In either case, something seems wrong. If it is not sending the high-memory WU, then why does the client need to back off 24 hours (or at all)? If no low-memory WU were available, then either start an exponential backoff or simply back off for 1 hour. If it is sending the high-memory WU, then that seems wrong in itself. Of course, I could be totally wrong. As I said, I'm just taking a guess. Charlie -Charlie
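[The exponential backoff Charles suggests, as a minimal sketch; the base delay and cap are invented for illustration.]

```cpp
#include <algorithm>

// Double the wait after each refused request, up to a cap, instead of
// jumping straight to a flat 24 hours.
int backoff_seconds(int consecutive_refusals) {
    const int base = 60;        // first retry after one minute
    const int cap  = 4 * 3600;  // never wait more than four hours
    int delay = base << std::min(consecutive_refusals, 8);  // 1m, 2m, 4m, ...
    return std::min(delay, cap);
}
```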
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0
My guess is close but different. I think it grabs a WU, sends it if it is small enough, and goes on to the next WU. As soon as it hits the first big WU, I think it gives up, and at that point it tells the client to back off 24 hours. Evidence for this is that sometimes I get work when I get the memory message and sometimes I don't. On Charles's guess, work would always come. On my guess, the fact that work sometimes comes is explained by the small work being earlier in the queue on the server, and the fact that work sometimes doesn't come is explained by the times when the very next WU is the big one. R~~
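[River~~'s hypothesis as a sketch. The structs and function are invented for illustration, since the real scheduler's behaviour is exactly what this thread is trying to infer.]

```cpp
#include <cstddef>
#include <vector>

struct WorkUnit {
    int id;
    std::size_t mem_bound;  // memory this WU needs
};

// Walk the queue in order, send anything small enough, and give up at
// the FIRST WU that is too big (triggering the client's 24-hour backoff).
// Whether any work arrives then depends only on what happens to sit
// ahead of the big WU in the queue, matching the intermittent behaviour.
std::vector<WorkUnit> assign_work(const std::vector<WorkUnit>& queue,
                                  std::size_t host_mem,
                                  std::size_t wanted) {
    std::vector<WorkUnit> sent;
    for (const WorkUnit& wu : queue) {
        if (sent.size() >= wanted) break;    // host has enough work
        if (wu.mem_bound > host_mem) break;  // first big WU: stop here
        sent.push_back(wu);
    }
    return sent;  // empty if the very next WU was the big one
}
```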
Charles Dennett Joined: 27 Sep 05 Posts: 102 Credit: 2,081,660 RAC: 513
Well, your analysis certainly makes sense based on your observations. Good catch. Charlie -Charlie
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0
River, I went to see what BOINC version you were running, but you have a large number of slow computers :-D Could you put 5.6.5/5.6.4 on there to see if it still has the same problem? Team mauisun.org
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0
River, I went to see what BOINC version you were running, but you have a large number of slow computers :-D

Slow? Three of them are almost 0.9GHz. Each :-) For more details, see this LHC posting in their BOINC Farms thread. Mostly running either 5.4.11 (Win) or 5.4.9 (Linux). A couple are still on 5.2.x, where I seem to have missed those boxes in my last upgrade. Must have got distracted.

Could you put 5.6.5/5.6.4 on there to see if it still has the same problem?

Will do later this week; it takes a while to update 11 boxes. ...and the issue is intermittent, so it will take a few days to know anyway. R~~
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0
River, if any of them would take a uFCPGA [CPU-Z identifies it wrongly as Socket BGA2] (you never know, they are a small form factor; it's a laptop socket), I may be able to get you a super-duper fast upgrade to a whole 1000MHz. Team mauisun.org