Message boards : Number crunching : Any objections to reducing the maximum run time to 12-16 hours?
Author | Message |
---|---|
David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
Rom is hot on the trail of the remaining bugs, and I'm optimistic that he will make rapid progress. In the meantime, I see from the "stuck at 1%" thread that some people are having processes stuck for a very long time, which is very annoying. We can solve this problem temporarily by reducing the maximum run time. We set this quite high at the time we introduced the user-adjustable run time setting so that dial-up users could have very long runs, but this inadvertently exacerbated the 1% problem. Rom recommends we reduce the maximum time to 12 hours or so. The only drawback is for dial-up users who want very long work units. Rom suggests they load up on work units each time they log in, and with 12 hour work units this could last for a while. So the question is: are there any objections to reducing the maximum time to 12 hours? Unfortunately, this is a work unit level parameter, and cannot be changed by the user. |
Darren Joined: 6 Oct 05 Posts: 27 Credit: 43,535 RAC: 0 |
"So the question is: are there any objections to reducing the maximum time to 12 hours. Unfortunately, this is a work unit level parameter, and cannot be changed by the user." I'm one of those (probably few) who use long runtimes. I would just ask that you keep us updated as to the time frame - specifically, let us know when the option for long runtimes is back. My preference for it is a little different from that of the dial-up users in that my internet connection is by cellular modem. The amount of data transferred isn't a problem, but Rosetta for some reason gives me much more grief than other projects when there are gaps in the data flow. If a particular file times out on other projects, it starts right back up 60 seconds later and the work unit runs fine after it all finally gets here. When any individual file in the download times out with Rosetta, the download always starts back up, but once it all gets here the work unit immediately reports a download error. This requires me to "babysit" the connection and manually suspend network access during any data flow gaps while Rosetta is downloading work. Oddly, if I manually suspend it, it picks right back up without causing an error - but if I let it time out and resume, it doesn't work. With that, I'll probably just suspend Rosetta unless I know I'll be home long enough to babysit the connection through a few downloads - thus my request that you be sure to let us know when the option is available again. Thanks. |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
If it will help lessen the hardship of those suffering the stuck at 1% problems and their loss of cpu time, then I'd be willing to switch down to only 12-16 hour max cpu time. Unfortunately, I no longer need low bandwidth usage; so my vote doesn't really count. |
Nite Owl Joined: 2 Nov 05 Posts: 87 Credit: 3,019,449 RAC: 0 |
|
Johnathon Joined: 5 Nov 05 Posts: 120 Credit: 138,226 RAC: 0 |
I'm running 1 machine on dial-up, and I've set it to 48hr runtimes - it is just so much easier for me, uploading 1 job and downloading 1 job every 2 days, instead of babysitting the machine daily for each network run. That machine would probably just get shut down permanently again if the run times go down. |
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
I'm a little confused - how is reducing the maximum run time going to help with the 1% problem? I have my maximum run time set to 2 hours, but the WUs that get stuck on 1% go way past this - sometimes several days past. So it's fairly evident that stuck WUs are ignoring the run time setting anyway, so changing the maximum possible setting isn't going to make much difference. If you're adding extra code to catch stuck WUs that's great. |
nozi Joined: 15 Nov 05 Posts: 11 Credit: 566,793 RAC: 116 |
I would think a limit of 24 hours may be good enough. My setting is 4 hours, but since I am not able to reach every machine every day, the 1% bug took up to 3 days. Upload/download was never a problem for me, though. 12 or 16 hours seems too short for some. The only really good solution would still be to eliminate the bug. One way, you lose several hours of CPU time from all users; the other way, you force several users completely out of Rosetta. So it is your decision to find a sufficient compromise. IMHO I would prefer not to lose anyone completely. Communication Basic No.1 : Freedom is always the Freedom to dissent. |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
"I'm a little confused - how is reducing the maximum run time going to help with the 1% problem? I have my maximum run time set to 2 hours, but the WUs that get stuck on 1% go way past this - sometimes several days past. So it's fairly evident that stuck WUs are ignoring the run time setting anyway, so changing the maximum possible setting isn't going to make much difference." The user adjustable time setting and the Max run time setting are separate. If the Max run time is set to a lower value by the project it will override any user setting for time. When the WU hits the Max time set by the project it will abort. So if a WU gets stuck, the Max time setting will cause it to abort automatically when it hits 12 hours. Moderator9 ROSETTA@home FAQ Moderator Contact |
David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
Based on the responses below, I think we should set the maximum time to 24 hours. How many people would this be a hardship for? |
Nite Owl Joined: 2 Nov 05 Posts: 87 Credit: 3,019,449 RAC: 0 |
Works for me! :thumbsup: |
uioped1 Joined: 9 Feb 06 Posts: 15 Credit: 1,058,481 RAC: 0 |
"I'm running 1 machine on dial up, and I've set it to 48hr runtimes - it is just so much easier for me, uploading 1 job, downloading 1 job every 2 days, instead of baby sitting the machine daily for each network run, daily. That machine would probably just get shut down permanently again if the run times go down." If you set your cache size to connect every two days, your BOINC client should download enough work for 2 days, regardless of whether your runtime is set to 48 hours or 12... (of course, this is after your machine has adjusted to the new runtime of your work units after you make changes.) |
Johnathon Joined: 5 Nov 05 Posts: 120 Credit: 138,226 RAC: 0 |
It means I've got to sit through 8 MB or more of downloads instead of 4 MB, if I update once every 2 days. If it's better for bakerlab, then I don't mind the 24hr too much, but for me it would be easier @ 48hr. |
David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
"It means I've got to sit thru 8mb or more downloads instead of 4mb, if i update 1nce every 2 days. If its better for bakerlab, then I dont mind too much the 24hr, but for me it would be easier @ 48hr." For us it really doesn't matter: our science depends only on the overall throughput. But we thought this would be better until we have the 1% problem solved, so people don't have to babysit machines. |
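The two rules discussed in this thread can be sketched in a few lines of Python. This is only an illustration of the behaviour as the posters describe it, not BOINC or Rosetta code: every function name here is hypothetical, the 12-hour default is just the value proposed in the first post, and the 4 MB per work unit figure is inferred from Johnathon's "8mb instead of 4mb" comparison rather than from any project documentation.

```python
import math

# Assumed value: the 12-hour cap proposed at the top of the thread.
PROJECT_MAX_HOURS = 12.0


def effective_target_hours(user_hours, project_max_hours=PROJECT_MAX_HOURS):
    """Run time that actually applies: a lower project cap overrides the user setting."""
    return min(user_hours, project_max_hours)


def should_abort(cpu_hours, project_max_hours=PROJECT_MAX_HOURS):
    """A work unit is aborted once it reaches the project cap, stuck or not."""
    return cpu_hours >= project_max_hours


def work_units_per_connection(connect_interval_hours, run_time_hours):
    """Work units needed to keep one CPU busy between two connections."""
    return math.ceil(connect_interval_hours / run_time_hours)


def download_mb_per_connection(connect_interval_hours, run_time_hours,
                               mb_per_unit=4.0):
    """Download volume per connection; mb_per_unit is an assumption (see lead-in)."""
    units = work_units_per_connection(connect_interval_hours, run_time_hours)
    return units * mb_per_unit
```

Under these assumptions, a user preference of 48 hours is capped to 12, a work unit stuck for 30 CPU hours would already have been aborted, and connecting every 48 hours costs one download at 48-hour run times but two at 24-hour run times, which matches the 4 MB versus 8 MB figures quoted above.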
©2024 University of Washington
https://www.bakerlab.org