scheduling and target time working correctly?

Insidious
Joined: 10 Nov 05
Posts: 49
Credit: 604,937
RAC: 0
Message 11284 - Posted: 24 Feb 2006, 2:38:12 UTC

On one of my computers (AMD X2) I am sharing resources 100:100 between Rosetta and BBC-CCE (which has a completion time on the order of 8 months per WU).

I am having to reduce my Rosetta target time from the default to keep BOINC out of Earliest Deadline First (EDF) mode, which prevents the BBC project from running.

Would it be possible for Rosetta to adjust its deadlines (or something) so that I can run Rosetta optimally (default time) without ruining my wish to run two projects on my X2?

Yesterday I also had to delete over 60 WUs, ALL of which had only one day left before their deadline, on a machine crunching Rosetta and SIMAP (another X2 machine). Those work units were the 4.81 flavor. While an older 20-minute unit was crunching, I got a cache of nearly 75 units, each of which was taking >4 hours, and that totally fouled up the works... not allowing SIMAP to crunch.

-Sid
Proudly crunching with TeAm Anandtech
Insidious
Joined: 10 Nov 05
Posts: 49
Credit: 604,937
RAC: 0
Message 11285 - Posted: 24 Feb 2006, 2:51:01 UTC - in response to Message 11284.  

Screenshot of my queue

Sorry, I forgot to put this in my first post.

-Sid
Proudly crunching with TeAm Anandtech
Snake Doctor
Joined: 17 Sep 05
Posts: 182
Credit: 6,401,938
RAC: 0
Message 11290 - Posted: 24 Feb 2006, 5:02:47 UTC - in response to Message 11284.  

On one of my computers (AMD X2) I am sharing resources 100:100 between Rosetta and BBC-CCE (which has a completion time on the order of 8 months per WU).

I am having to reduce my Rosetta target time from the default to keep BOINC out of Earliest Deadline First (EDF) mode, which prevents the BBC project from running.

Would it be possible for Rosetta to adjust its deadlines (or something) so that I can run Rosetta optimally (default time) without ruining my wish to run two projects on my X2?

Yesterday I also had to delete over 60 WUs, ALL of which had only one day left before their deadline, on a machine crunching Rosetta and SIMAP (another X2 machine). Those work units were the 4.81 flavor. While an older 20-minute unit was crunching, I got a cache of nearly 75 units, each of which was taking >4 hours, and that totally fouled up the works... not allowing SIMAP to crunch.

-Sid


You can adjust the run length of the WUs yourself. The FAQ list explains how. You can make them all run in 2 hours or less, depending on the type of WU they are.

Regards
Phil


We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
Insidious
Joined: 10 Nov 05
Posts: 49
Credit: 604,937
RAC: 0
Message 11305 - Posted: 24 Feb 2006, 10:19:16 UTC - in response to Message 11290.  

...
I am having to reduce my Rosetta target time from the default to keep BOINC out of Earliest Deadline First (EDF) mode, which prevents the BBC project from running.

Would it be possible for Rosetta to adjust its deadlines (or something) so that I can run Rosetta optimally (default time) without ruining my wish to run two projects on my X2?

...


You can adjust the run length of the WUs yourself. The FAQ list explains how. You can make them all run in 2 hours or less, depending on the type of WU they are.

Regards
Phil


As I mentioned, I am already doing that. My question is that I would like to let them run at their defaults for better science, but I can not because I am being overloaded by BOINC/Rosetta scheduling.

-Sid

Proudly crunching with TeAm Anandtech
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 11335 - Posted: 24 Feb 2006, 16:08:25 UTC - in response to Message 11305.  
Last modified: 24 Feb 2006, 20:04:42 UTC

...
As I mentioned, I am already doing that. My question is that I would like to let them run at their defaults for better science, but I can not because I am being overloaded by BOINC/Rosetta scheduling.

-Sid



Please see this post (https://boinc.bakerlab.org/rosetta/forum_thread.php?id=669#11193) regarding run times for Rosetta Work Units.

Moderator9
ROSETTA@home FAQ
Moderator Contact
Insidious
Joined: 10 Nov 05
Posts: 49
Credit: 604,937
RAC: 0
Message 11339 - Posted: 24 Feb 2006, 16:28:49 UTC - in response to Message 11335.  
Last modified: 24 Feb 2006, 16:55:15 UTC

...
As I mentioned, I am already doing that. My question is that I would like to let them run at their defaults for better science, but I can not because I am being overloaded by BOINC/Rosetta scheduling.

-Sid



Please see this post (https://boinc.bakerlab.org/rosetta/forum_thread.php?id=669#11193) regarding run times for Rosetta Work Units.


Thank you!

I have just lowered my run times from 4 hours to 2 hours per that thread.

I would like to mention that the 4-hour target time was being ignored (I was still seeing 8-10 hour completion times in BoincView, 2.5 GHz).

I will try detaching and re-attaching to clear my queues and see if that helps.

-Sid

EDIT: I don't think you are really understanding my issue. Rosetta is downloading more work units than it can complete by the deadline without disabling the other project that works on this machine (it is dual core).
Proudly crunching with TeAm Anandtech
uioped1
Joined: 9 Feb 06
Posts: 15
Credit: 1,058,481
RAC: 0
Message 11344 - Posted: 24 Feb 2006, 19:12:31 UTC - in response to Message 11339.  
Last modified: 24 Feb 2006, 19:14:03 UTC

...
As I mentioned, I am already doing that. My question is that I would like to let them run at their defaults for better science, but I can not because I am being overloaded by BOINC/Rosetta scheduling.



EDIT: I don't think you are really understanding my issue. Rosetta is downloading more work units than it can complete by the deadline without disabling the other project that works on this machine (it is dual core).


I was also having this problem. The error comes because the 'base' workunit time is created when the workunit is created, not when you download it and apply your preferences to it. So, the first time you download a workunit, the scheduler sees that last time you ran a workunit of this type, it ran in X seconds, so it assumes that you will continue to do so when calculating how many to give you.

It doesn't take very long for the system to figure out that something's changed and the workunits are now going to take a lot longer to complete. On my system, it was just two WUs. So, if you let your system run in EDF mode for a while, it will get straightened out.
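
A rough sketch of the arithmetic behind this, under stated assumptions: the function names, the one-day cache request, and the idea that the scheduler simply divides the requested seconds by a per-unit estimate are all illustrative guesses, not BOINC's actual code. The 20-minute and 8-hour figures are the run times reported in this thread.

```python
import math


def units_fetched(cache_seconds: float, estimated_seconds_per_wu: float) -> int:
    """Hypothetical: work units sent to fill the requested cache, given the estimate."""
    return math.ceil(cache_seconds / estimated_seconds_per_wu)


def actual_backlog_hours(units: int, actual_seconds_per_wu: float) -> float:
    """CPU hours really needed to clear the fetched units."""
    return units * actual_seconds_per_wu / 3600.0


cache_seconds = 24 * 3600   # assumption: one day of work requested
stale_estimate = 20 * 60    # estimate based on the older 20-minute units
actual_runtime = 8 * 3600   # newer units taking about 8 hours each

n = units_fetched(cache_seconds, stale_estimate)
print(n, "units fetched,", actual_backlog_hours(n, actual_runtime), "CPU hours of real work")
```

With these made-up inputs the cache is filled as if every unit took 20 minutes, so the real backlog is many times larger than what was requested, which is the overload described in this thread.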

If you're worried that some of the WUs downloaded are going to go over deadline, you can temporarily set your proc time down to two hours, and only set it back when you're about to start the last two or three of your downloaded WUs. Or just abort some of them.

Hope that helps.

[edit] all this is my best guess at reverse-engineering how the system works, so some of the specific details might be wrong. Please correct those bits, someone.[/edit]
BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 11354 - Posted: 24 Feb 2006, 23:49:06 UTC

One of the suggestions made for reducing bandwidth consumption was to be able to request a certain amount of work, send your machine's RAC, and be given that amount of work to perform. Unfortunately, BOINC isn't set up (yet) to send useful information back. (But your machine's RAC is stored on the server, so I don't see why that would be a problem with BOINC.)
Currently, we ask the server for a certain number of seconds of work (the default is currently 8 hours' worth). We're supposedly sent enough work units to fill up our WU cache. Then the machine starts plugging along at each WU.

Each WU takes a different amount of time to create one model; little ones take 15-30 minutes to create one model - the big ones can take 8-12 hours to create a single model on some "typical" machine.

For the little guys, it'll keep creating models until it gets near the 8-hour WU mark and decides it can't finish another model in the time left. (If you've set that to 2, 4, 6, 12, 24, etc. hours, then that's the limit it works up to.)

For the big guys, it'll create one model no matter what your setting is. If you're set for 2 hours maximum CPU time and it takes 12 hours to create the first model, then it takes 12 hours and creates the first model.
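
The behaviour described in the two paragraphs above can be sketched roughly as follows. This is only an illustration: the function name is made up, and it assumes every model in a work unit takes the same fixed time, which the real application has no way of knowing in advance.

```python
def models_completed(target_hours: float, hours_per_model: float) -> tuple[int, float]:
    """Return (models built, CPU hours used) for one work unit."""
    elapsed = 0.0
    models = 0
    while True:
        # The first model is always run to completion, whatever the target time.
        elapsed += hours_per_model
        models += 1
        # Stop once another model would not fit in the time left.
        if elapsed + hours_per_model > target_hours:
            break
    return models, elapsed


print(models_completed(8, 0.5))   # small unit, 30 minutes per model
print(models_completed(2, 12))    # big unit, 12 hours per model, 2-hour target
```

In this sketch a small unit fills the target time with many models, while a big unit overruns a short target because its single model cannot be cut short.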

If I were you, I'd leave the max CPU time at the 8-hour default and reduce the cache to 8 to 12 hours. It'll run through the units you've already got (which will take a few days) and only keep one spare once the system finally balances out for you; then move the cache back up to its current setting.


Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 11356 - Posted: 25 Feb 2006, 0:23:52 UTC
Last modified: 25 Feb 2006, 0:26:18 UTC



See if this post helps you at all.

Moderator9
ROSETTA@home FAQ
Moderator Contact
Rebel Alliance
Joined: 4 Nov 05
Posts: 50
Credit: 3,579,531
RAC: 0
Message 11357 - Posted: 25 Feb 2006, 0:24:32 UTC

I got one sent to me on the same day as the deadline. I've spent some three hours screwing around removing dead work units from machines.
24 Feb 2006 8:19:12 UTC 24 Feb 2006 19:06:42 UTC
Insidious
Joined: 10 Nov 05
Posts: 49
Credit: 604,937
RAC: 0
Message 11365 - Posted: 25 Feb 2006, 3:40:22 UTC

Thank you all for the great information. I guess I'm not the only one dealing with screwy scheduling... interesting solutions you have shared here.

My solution was much less elegant, but it worked. I detached Rosetta from the machines that shared another project, then re-attached with Rosetta as the second project installed. That seemed to curb the schedulers to reasonable levels, and now I am happily sharing (one project on each core... as it should be).

I would like to ask the Rosetta team to consider the effects of having such wildly different crunching times on the various work units and to understand how that affects caches.
My problems came when version 4.81 workunits that take ~20 minutes were crunching while I was being loaded up with newer units that take ~8 hours to crunch. I was getting so many of them that each would also have had to finish in ~20 minutes for the cache size to be sensible. This is why detaching solved my problem... no 20-minute WUs to confuse the issue.

-Sid
Proudly crunching with TeAm Anandtech
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 11394 - Posted: 25 Feb 2006, 19:29:20 UTC - in response to Message 11344.  


I was also having this problem. The error comes because the 'base' workunit time is created when the workunit is created, not when you download it and apply your preferences to it. So, the first time you download a workunit, the scheduler sees that last time you ran a workunit of this type, it ran in X seconds, so it assumes that you will continue to do so when calculating how many to give you.

It doesn't take very long for the system to figure out that something's changed and the workunits are now going to take a lot longer to complete. On my system, it was just two WUs. So, if you let your system run in EDF mode for a while, it will get straightened out.

If you're worried that some of the WUs downloaded are going to go over deadline, you can temporarily set your proc time down to two hours, and only set it back when you're about to start the last two or three of your downloaded WUs. Or just abort some of them.

Hope that helps.

all this is my best guess at reverse-engineering how the system works, so some of the specific details might be wrong. Please correct those bits, someone.


This is very close to what is happening. The only detail that changes it is that the BOINC system only makes adjustments at the project level, not the individual WU level. By that I mean it will adjust each time a WU runs, but the adjustment is just up or down for that entire project. It does not know or care what type of WU is running when it makes the change, and it does not keep track of the time for various types of WU. The adjustments are small and almost "trend"-based in nature: BOINC will not make a large adjustment all at once based on a single WU run. That is why it has to run a number of WUs to make the adjustment.

Your idea to make a FAQ and/or sticky about this is a good one, and I have asked the Project team to review the text of just such a posting to be certain they are happy with it. As you may know, David Baker has asked people who are having a lot of errors to set the time adjustment to 2 hours to reduce their problem.

The advice in the information you and I have posted does not take that instruction into account. But I think his instruction is temporary, and what we are trying to put together is a more long-term set of explanations.

Thank you for your time and contributions to improving the information.


Moderator9
ROSETTA@home FAQ
Moderator Contact
©2024 University of Washington
https://www.bakerlab.org