Message boards : Number crunching : scheduling and target time working correctly?
Author | Message |
---|---|
Insidious Joined: 10 Nov 05 Posts: 49 Credit: 604,937 RAC: 0 |
On one of my computers (AMD X2) I am sharing 100:100 Rosetta with BBC-CCE (which has something on the order of an 8-month completion time per WU). I am having to reduce my Rosetta target time down from the default to keep BOINC out of Earliest Deadline First mode, which prevents the BBC project from running. Would it be possible for Rosetta to adjust their deadlines (or something) to allow me to run Rosetta optimally (default time) without ruining my wish to run two projects on my X2? I also had to delete over 60 WUs yesterday, ALL of which had only one day left to their deadline, on a machine crunching Rosetta and SIMAP (another X2 machine). Those work units were the 4.81 flavor. While an older 20-minute unit was crunching, I got a cache of nearly 75 units, each of which was taking >4 hours, and that totally fouled up the works, not allowing SIMAP to crunch. -Sid Proudly crunching with TeAm Anandtech |
Insidious Joined: 10 Nov 05 Posts: 49 Credit: 604,937 RAC: 0 |
Screenshot of my queue. Sorry, I forgot to put this in my first post. -Sid Proudly crunching with TeAm Anandtech |
Snake Doctor Joined: 17 Sep 05 Posts: 182 Credit: 6,401,938 RAC: 0 |
Quoting Insidious: "On one of my computers (AMD X2) I am sharing 100:100 Rosetta with BBC-CCE (which has something on the order of an 8 month completion time per WU)" You can adjust the run length of the WUs yourself. If you read the FAQs list it will tell you how, but you can make them all run in 2 hours or less depending on the type of WU they are. Regards Phil We Must look for intelligent life on other planets as, it is becoming increasingly apparent we will not find any on our own. |
Insidious Joined: 10 Nov 05 Posts: 49 Credit: 604,937 RAC: 0 |
... As I mentioned, I am already doing that. My point is that I would like to let them run at their defaults for better science, but I cannot because I am being overloaded by BOINC/Rosetta scheduling. -Sid Proudly crunching with TeAm Anandtech |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
... Please see this post regarding run times for Rosetta Work Units. Moderator9 ROSETTA@home FAQ Moderator Contact |
Insidious Joined: 10 Nov 05 Posts: 49 Credit: 604,937 RAC: 0 |
... Thank you! I have just lowered my run times from 4 hours to 2 hours per that thread. I would like to mention that the 4 hour target time was being ignored (still seeing 8-10 hour completion times in BoincView (2.5GHz)). I will try detaching and re-attaching to clear my queues and see if that helps. -Sid EDIT: I don't think you are really understanding my issue. Rosetta is downloading more work units than it can complete by the deadline without disabling the other project that works on this machine (it is dual core). Proudly crunching with TeAm Anandtech |
uioped1 Joined: 9 Feb 06 Posts: 15 Credit: 1,058,481 RAC: 0 |
...
I was also having this problem. The error comes because the 'base' workunit time is created when the workunit is created, not when you download it and apply your preferences to it. So, the first time you download a workunit, the scheduler sees that last time you ran a workunit of this type, it ran in X seconds, so it assumes that you will continue to do so when calculating how many to give you. It doesn't take very long for the system to figure out that something's changed and the workunits are now going to take a lot longer to complete. On my system, it was just two WUs. So, if you let your system run in EDF mode for a while, it will get straightened out. If you're worried that some of the WUs downloaded are going to go over deadline, you can temporarily set your proc time down to two hours, and only set it back when you're about to start the last two or three of your downloaded WUs. Or just abort some of them. Hope that helps. [edit] all this is my best guess at reverse-engineering how the system works, so some of the specific details might be wrong. Please correct those bits, someone.[/edit] |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
One of the suggestions made about reducing bandwidth consumption was to be able to request a certain amount of work, send your machine's RAC, and be given that amount of work to perform. Unfortunately, BOINC isn't set up (yet) to send useful information back. (But your machine's RAC is stored on the server, so I don't follow that it's a problem with BOINC.) Currently, we ask the server for a certain number of seconds of work (default is currently 8 hours' worth). We're supposedly sent enough of these work units to fill up our WU cache. Then it starts plugging along at each WU. Each WU takes a different amount of time to create one model; little ones take 15-30 minutes to create one model, while the big ones can take 8-12 hours to create a single model on some "typical" machine. For the little guys, it'll keep creating models until it gets near the 8 hour WU mark and decides it can't finish another model in the time left. (If you've set that to 2, 4, 6, 12, 24, etc. hours, then that's the limit that it works up to.) For the big guys, it'll create one model no matter what your setting. If you're set for 2 hours maximum cpu time, and it takes 12 hours to create the first model, then it takes 12 hours and creates the first model. If I were you, I'd leave the max cpu time at the 8 hour default and reduce the cache to 8 to 12 hours. It'll run through those you've already got (which will take a few days) and only keep one spare when the system finally balances out for you; then move the cache back up to its current setting. |
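For anyone who finds it easier to follow as code, here is a rough Python sketch of the model loop described in the post above. It is only an illustration of that description, not the actual Rosetta code: the function name `run_work_unit` and the exact stopping rule (start another model only if it is expected to fit inside the target time) are assumptions.

```python
from typing import Tuple


def run_work_unit(model_hours: float, target_hours: float) -> Tuple[int, float]:
    """Return (models_built, cpu_hours_used) for one work unit.

    Assumed rule, taken from the forum description: the first model is
    always finished, and another model is started only if it is expected
    to fit inside the target CPU time.
    """
    models = 1            # the first model runs to completion no matter what
    used = model_hours
    while used + model_hours <= target_hours:
        models += 1
        used += model_hours
    return models, used


if __name__ == "__main__":
    print(run_work_unit(0.5, 8))  # small models against an 8 hour target
    print(run_work_unit(12, 2))   # one big model against a 2 hour target
```

Under these assumptions a half-hour model fills the whole 8 hour target, while a 12 hour model ignores a 2 hour target and still uses 12 hours, which matches the behaviour described above.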
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
|
Rebel Alliance Joined: 4 Nov 05 Posts: 50 Credit: 3,579,531 RAC: 0 |
I got one sent to me on the same day as the deadline. I've spent some three hours screwing around removing dead work units from machines. 24 Feb 2006 8:19:12 UTC 24 Feb 2006 19:06:42 UTC |
Insidious Joined: 10 Nov 05 Posts: 49 Credit: 604,937 RAC: 0 |
Thank you all for the great information. I guess I'm not the only one dealing with screwy scheduling... interesting solutions you have shared here. My solution was much less elegant, but it worked. I detached Rosetta from the machines that shared another project, then re-attached with Rosetta being the second project installed. That seemed to curb the schedulers to reasonable levels, and now I am happily sharing (one project on each core, as it should be). I would like to ask the Rosetta team to consider the effects of having such wildly different crunching times on the various work units and understand how it affects caches. My problems came when version 4.81 workunits that take ~20 minutes were crunching, but I was being loaded up with newer units that take ~8 hours to crunch. I was getting a quantity of them that would only make a sensible cache size if each one could also be done in ~20 minutes. This is why detaching solved my problem... no 20 minute WUs to confuse the issue. -Sid Proudly crunching with TeAm Anandtech |
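To put rough numbers on the cache problem described above, here is a small Python sketch. The figures (about 75 queued units, a ~20 minute estimate, more than 4 hours actual) come from posts in this thread; the helper name `queue_hours` is made up for illustration, and the real scheduler's bookkeeping is certainly more involved.

```python
def queue_hours(units: int, minutes_per_unit: int) -> float:
    """Total CPU hours needed to clear a queue of equally sized work units."""
    return units * minutes_per_unit / 60


# What the queue looks like if every unit really took ~20 minutes.
estimated = queue_hours(75, 20)

# What the same queue costs if each unit takes 4 hours instead.
actual = queue_hours(75, 240)

if __name__ == "__main__":
    print(f"estimated: {estimated} h, actual: {actual} h")
```

With those assumed figures, a queue sized as roughly one day of work turns into about 300 CPU hours, which is why the deadlines become impossible to meet while still sharing the machine with another project.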
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
This is very close to what is happening. The only detail that changes it is that the BOINC system only makes adjustments at the project level, not the individual WU level. By that I mean it will adjust each time a WU runs, but the adjustment is just up or down for that entire project. It does not know or care what type of WU is running when it makes the change, and it does not keep account of the time for various types of WU. The adjustments are small, and almost "trend" based in nature. By that I mean BOINC will not make a large adjustment all at once based on a single WU run. That is why it has to run a number of WUs to make the adjustment. Your idea to make a FAQ and/or sticky about this is a good one, and I have asked the Project team to review the text of just such a posting to be certain they are happy with it. As you may know, David Baker has asked people who are having a lot of errors to set the time adjustment to 2 hours to reduce their problem. The advice in the information you and I have posted does not take that instruction into account. But I think his instruction is temporary, and what we are trying to put together is a longer-term set of explanations. Thank you for your time and contributions to improving the information. Moderator9 ROSETTA@home FAQ Moderator Contact |
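A "trend based" adjustment of the kind described above can be sketched in Python as a single per-project correction factor that moves a small step toward each observed runtime ratio. This is only an assumed model for illustration: the function name `update_factor`, the 10% step size, and the smoothing formula are not taken from the BOINC source.

```python
def update_factor(factor: float, estimated_s: float, actual_s: float,
                  rate: float = 0.1) -> float:
    """Nudge a per-project runtime correction factor toward the latest result.

    `rate` is a hypothetical step size: only a fraction of the gap between
    the current factor and the observed ratio is closed on each run.
    """
    observed = actual_s / estimated_s
    return factor + rate * (observed - factor)


if __name__ == "__main__":
    factor = 1.0
    # Units estimated at 20 minutes that actually take 4 hours (ratio 12).
    for run in range(1, 6):
        factor = update_factor(factor, 20 * 60, 4 * 3600)
        print(f"after run {run}: factor = {factor:.2f}")
```

In this sketch a single run only moves the factor from 1.0 to about 2.1 even though the unit took 12 times longer than estimated, so it takes a number of completed WUs before the estimates, and therefore the amount of work requested, catch up.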
©2024 University of Washington
https://www.bakerlab.org