Credits Granted

Message boards : Number crunching : Credits Granted

Deamiter

Joined: 9 Nov 05
Posts: 26
Credit: 3,793,650
RAC: 0
Message 9056 - Posted: 15 Jan 2006, 1:04:21 UTC - in response to Message 9053.  
Last modified: 15 Jan 2006, 1:07:13 UTC


Besides that, Rosetta uses the BOINC platform, where the use of an optimized client is quite common (e.g. SETI). So Rosetta should meet those multi-project requirements. It would be very odd to ask people to change their BOINC clients for every other project, wouldn't it?

It would indeed be very odd for them to ask people to keep changing their BOINC clients. That's why they're not. I don't think it's particularly unreasonable for a project in development to assume that users are running the recommended version of the client. After all, there's a REASON it's recommended. If clients designed to inflate stats by increasing benchmarks WITHOUT speeding up the app (the Rosetta app at least) cause problems... I have very little sympathy for those with the "problem."

Note that I do acknowledge that there is a problem with the max time exceeded. It's just that I too have noticed that many (not all) of the problems have been on "optimized" clients. These people running the optimized client with projects that don't have an optimized application are ADDING to the problem.
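
To illustrate why an inflated benchmark could make things worse, here is a minimal sketch. It assumes the CPU time limit is a fixed per-WU operation budget divided by the benchmarked speed. The function name and every number are hypothetical, and this is a toy model, not BOINC's actual code.

```python
# Toy model (assumption): time limit = operation budget / benchmarked speed.
# None of these values come from a real Rosetta work unit.

def cpu_time_limit_hours(fpops_bound: float, benchmark_flops: float) -> float:
    """Hours a WU may run before it is aborted in this toy model."""
    return fpops_bound / benchmark_flops / 3600.0

FPOPS_BOUND = 9.0e13        # hypothetical operation budget for one WU
HONEST_BENCHMARK = 2.0e9    # hypothetical score from the recommended client
INFLATED_BENCHMARK = 3.0e9  # hypothetical score from an "optimized" client

honest_limit = cpu_time_limit_hours(FPOPS_BOUND, HONEST_BENCHMARK)
inflated_limit = cpu_time_limit_hours(FPOPS_BOUND, INFLATED_BENCHMARK)

# The app runs at the same real speed in both cases; only the limit shrinks.
print(f"recommended client: {honest_limit:.1f} h allowed")
print(f"inflated benchmark: {inflated_limit:.1f} h allowed")
```

Under that assumption, a higher benchmark score with no real speed-up simply leaves the same WU less time to finish.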
ID: 9056 · Rating: 0
Snake Doctor

Joined: 17 Sep 05
Posts: 182
Credit: 6,401,938
RAC: 0
Message 9067 - Posted: 15 Jan 2006, 4:16:01 UTC - in response to Message 9056.  


Note that I do acknowledge that there is a problem with the max time exceeded. It's just that I too have noticed that many (not all) of the problems have been on "optimized" clients. These people running the optimized client with projects that don't have an optimized application are ADDING to the problem.


My dual G4 and PowerBook G4 are both running the recommended BOINC 5.2.13. If left alone (without adjusting the DCF), they error out on about 20% of the WUs with "Max time exceeded". While optimized clients might make the problem worse, they are not the source of it. The problem is WUs that can vary in size by 900% from the smallest to the largest. BOINC was never designed to accommodate that kind of variation. If that kind of variety is required to do the science, then someone should talk with the BOINC developers so they can build in that sort of range.

My systems run P@H, R@H, E@H, Climate, and SETI. Of these I would personally rate R@H, P@H, and Climate (in no particular order) as the most important. While it is fun to look for ET and try to prove Einstein right or wrong, the near-term, real-world potential of the medical and climate projects for saving lives is clearly more important. But since all of this started with SETI, BOINC is largely slanted in that direction. The BOINC software should allow project-specific settings to handle things like WU variation. You should also be able to set your preferences for each project individually without affecting the others.

I am certain that these things will come in due course, but until then all of us will have to deal with issues like Max time errors. With a little luck, the R@H team will find a way to patch the problem in the meantime. After all, they are a pretty smart bunch.

Regards
Phil


We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
ID: 9067 · Rating: 0
nasher

Joined: 5 Nov 05
Posts: 98
Credit: 618,288
RAC: 0
Message 9069 - Posted: 15 Jan 2006, 5:14:21 UTC

Is there a way that Rosetta can trick BOINC into thinking its work units will automatically take, say, 10x or 20x longer than similar WUs the user has already completed? Wouldn't that stop the problem, or would that make it worse?

Unfortunately, I don't really understand why it errors out because of how fast your previous work units completed.

Something definitely has to be done on the BOINC side of the house to support work units that are designed to run short and run long on the same project.
ID: 9069 · Rating: 0
Snake Doctor

Joined: 17 Sep 05
Posts: 182
Credit: 6,401,938
RAC: 0
Message 9072 - Posted: 15 Jan 2006, 7:16:53 UTC - in response to Message 9069.  

Is there a way that Rosetta can trick BOINC into thinking its work units will automatically take, say, 10x or 20x longer than similar WUs the user has already completed? Wouldn't that stop the problem, or would that make it worse?

Unfortunately, I don't really understand why it errors out because of how fast your previous work units completed.

Something definitely has to be done on the BOINC side of the house to support work units that are designed to run short and run long on the same project.


Basically, the answer to your question is yes. The problem is that BOINC calculates how much CPU time is available on your system by looking at how long it thinks the work you have loaded will take. If you make the WUs look bigger than they actually are, the system will not download new work until you have processed enough to free up some processing time. Many project participants like to download a lot of work at once and process it over a few days, and this kind of setting would interfere with that. This is particularly true of the "farmers", who have a lot of systems working on the project that can run unattended for days. It would also affect downloading for other projects running on the same system, as there would be insufficient CPU time available for them to request new work.
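
A rough sketch of that side effect, assuming the client simply fills a fixed work buffer using its own per-WU time estimate. The function `wus_to_fetch`, the buffer size, and the estimates are all hypothetical. This is not how BOINC's scheduler is actually written.

```python
import math

def wus_to_fetch(buffer_hours: float, queued_hours: float,
                 est_hours_per_wu: float) -> int:
    """How many new WUs fit in the remaining buffer (toy model)."""
    free_hours = max(0.0, buffer_hours - queued_hours)
    return math.floor(free_hours / est_hours_per_wu)

BUFFER_HOURS = 48.0  # hypothetical: the user wants two days of cached work

# Realistic estimate of 4 h per WU versus the same WUs advertised as 10x longer.
normal_fetch = wus_to_fetch(BUFFER_HOURS, 0.0, 4.0)
padded_fetch = wus_to_fetch(BUFFER_HOURS, 0.0, 40.0)

print("WUs fetched with honest estimates:", normal_fetch)
print("WUs fetched with 10x padded estimates:", padded_fetch)
```

With padded estimates the buffer looks full after a single WU, so someone who wants a multi-day cache would run dry much sooner than intended.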

Also, tricking the system for one project can affect other projects by monopolizing system time. This happens if the project forces your system into a processing debt. Under those conditions, the BOINC manager will only work on the projects with the shortest deadlines and the most work to do.

Those of us who have adjusted the DCF to get around the Max time issue are manually doing the same thing you are suggesting. I have my system adjusted to allow as much as 20 hours for any R@H WU it might get. In fact, most of them run only 4 or 5 hours, but I get at least one every day that takes a lot longer. I had one that actually ran the full 20 hours just yesterday. In my case I am content to let the system run under those conditions, but many people are not. Also, the setting is not permanent, because the system can change it as it runs, although I have never seen it adjust to allow MORE time, just less.

The reason the errors can increase if you get shorter WUs is that the system dynamically adjusts its estimated time to complete a WU based on actual processing times. If you get a few short ones in a row followed by a long one, the system will have adjusted itself to the short ones to the point that the long one exceeds the adjusted expected run length.
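
Here is a small simulation of that effect. It assumes the estimate moves halfway toward each actual run time and that the abort limit is a fixed multiple of the current estimate. The smoothing rule, the `limit_factor`, and the run times are all made up for illustration, so treat it as a toy model rather than BOINC's real adjustment logic.

```python
def simulate(run_hours, start_estimate=6.0, smoothing=0.5, limit_factor=2.0):
    """Return a (run_time, limit, status) tuple per WU for a toy adaptive estimate."""
    estimate = start_estimate
    history = []
    for actual in run_hours:
        limit = limit_factor * estimate
        if actual > limit:
            # The WU is aborted once it has used up the allowed time.
            history.append((limit, limit, "max time exceeded"))
        else:
            history.append((actual, limit, "finished"))
            # Pull the estimate toward the run time that was just observed.
            estimate += smoothing * (actual - estimate)
    return history

# Three short WUs followed by one long WU.
after_short_ones = simulate([2.5, 2.4, 2.3, 12.0])
# The same long WU arriving first, before the estimate has shrunk.
long_one_first = simulate([12.0])

for run_time, limit, status in after_short_ones:
    print(f"ran {run_time:5.2f} h, limit {limit:5.2f} h -> {status}")
print("long WU on a fresh estimate:", long_one_first[0][2])
```

In this model the 12-hour WU finishes when it arrives first, but is aborted when it follows three short ones, because the short runs have pulled the limit down.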

Regards
Phil


We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
ID: 9072 · Rating: 0
Los Alcoholicos~La Muis

Joined: 4 Nov 05
Posts: 34
Credit: 1,041,724
RAC: 0
Message 9183 - Posted: 17 Jan 2006, 7:46:28 UTC - in response to Message 9040.  
Last modified: 17 Jan 2006, 8:19:44 UTC


After another "maximum cpu time exceeded" error, I suspended network activity on a dual G5 2 GHz with 2.5 GB RAM (BOINC 5.2.13, no other projects).

I have the following queue of results:

cpu-time - status

12:38:43 - maximum cpu time exceeded
02:32:24 - finished
03:47:37 - finished
03:12:34 - finished
12:38:43 - maximum cpu time exceeded
08:49:47 - finished
06:19:48 - finished
03:46:29 - finished
07:42:59 - finished
03:17:35 - finished
06:08:48 - finished
05:53:45 - finished
03:05:26 - finished
02:28:05 - finished
02:22:14 - finished
12:38:45 - maximum cpu time exceeded
06:47:13 - finished
04:38:56 - finished
08:04:36 - finished
04:44:13 - finished
03:14:39 - finished
01:41:28 - finished
08:02:37 - finished
06:02:18 - finished
04:58:23 - finished

So far I haven't kept track of the variations in the estimated_time (at the moment: 07:44:12).

Although there is a sequence of 3 short WUs before an error, I don't think that's the real cause. As you can see, some WUs just take too much time to finish (one was at 80%, the other at 90% when they errored out). And I can't recall seeing an estimated_time on this machine greater than 12:00:00.
Unless the max_cpu_time is increased, these WUs will never finish.

06:17:23 - finished
04:47:58 - finished
09:15:10 - finished
07:10:31 - finished
07:10:52 - finished
12:05:23 - finished
04:55:55 - finished
04:37:23 - finished
07:31:24 - finished
07:00:22 - finished
07:41:25 - finished
12:24:32 - finished
04:40:39 - finished
12:34:46 - maximum cpu time exceeded (90%)
11:30:13 - 80,00%
02:57:45 - 30,00%

Even without short WUs before a long WU, it ends with a maximum_cpu_time_exceeded error.
One way or another, it seems impossible to finish a WU properly if it would take more than 12:34:00 on this machine?
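
The apparent cutoff can be read straight out of the lists above. A minimal sketch, using only a handful of the posted lines (the helper names are mine, and the subset is picked by hand):

```python
def to_seconds(hms: str) -> int:
    """Convert 'HH:MM:SS' to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def to_hms(total: int) -> str:
    """Convert seconds back to 'HH:MM:SS'."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

# A subset of the results listed above (cpu-time - status).
LOG = """\
12:38:43 - maximum cpu time exceeded
08:49:47 - finished
12:38:45 - maximum cpu time exceeded
12:05:23 - finished
12:24:32 - finished
12:34:46 - maximum cpu time exceeded (90%)
"""

finished, errored = [], []
for line in LOG.splitlines():
    cpu_time, status = line.split(" - ", 1)
    bucket = finished if status == "finished" else errored
    bucket.append(to_seconds(cpu_time))

longest_finished = max(finished)
earliest_error = min(errored)

print("longest finished WU:", to_hms(longest_finished))
print("earliest abort:     ", to_hms(earliest_error))
print("gap between them:   ", to_hms(earliest_error - longest_finished))
```

On these lines the longest completed WU and the earliest abort are only about ten minutes apart, which fits a hard limit somewhere around twelve and a half hours on this machine rather than a limit that drifts with the preceding short WUs.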

Is it of any use to the project to just let those long WUs end themselves with an error? Or should I abort them when I suspect they will take too long?
ID: 9183 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org