Message boards : Number crunching : Credit posting screwiness
Author | Message |
---|---|
Ingleside Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
I actually knew the 'not quite true on the loading' when I posted it. If you've selected "ask before connecting" and BOINC has detected that you're using a modem, so it really does ask, then at least the check-in notes indicate the client should also report anything before disconnecting... But since I (thankfully) am not using dial-up, I can't check whether it really works this way... But if you're using "ask before connecting", you must also manually trigger this connection, so it shouldn't be a problem for the user to select the project(s) at the end and manually hit "update"... If you let BOINC auto-dial of its own accord, I would guess it's just like for permanently connected computers... But if you let BOINC auto-dial, it would try to connect after each result finished, and would also normally ask for more work, so you shouldn't lose more than 1-2 finished results if you crash out... Even if you're multi-project, if you've managed to finish 20 results, chances are the oldest is close to "report if N days since result finished", so again it would be automatically reported... |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
In my experience (I last tested this with client v4.19), if you have 20 results in the cache they tend to get reported in batches of around 10. You stand to lose from 0 to 10 results from a network outage that takes you past deadlines. |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
IF, there is not a quorum of results, and you are "late" reporting, the work will be re-issued automatically. thus causing unnecessary crunching if the result has been uploaded but is unreported
in which case the box that crunched the duplicate is wasting cycles that could be better spent on another WU. However you describe it, if a missed deadline means that 5 boxes crunch a WU instead of four, that has wasted one box's time. OK, so you may get credit, but the same amount of science now takes up 5 blocks of CPU time rather than 4. The only exception is where a quorum is formed from the other results before the replacement result gets issued. None of which applies to Rosetta until such time as you start re-issuing WUs that pass the deadline. Apols to Rosetta-only participants for bringing in issues relevant elsewhere. |
Ingleside Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
V4.19 is by now an outdated, unsupported BOINC client, made much earlier than the new rule "report if N days since result finished". Also, v4.19 and earlier clients asked for 2x the cache setting, so it's no wonder that even with a single project you can have many results to report at once. As for losing results by blowing past the deadline during server outages, this isn't really any different even if a result is reported immediately after uploading; it has more to do with some users having too large a cache... |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
Just to be clear, I am not still using 4.19 -- I am just saying I haven't tested the later clients on this point
I disagree strongly. There are good reasons for having a large cache. The fact that some donors have a cache that is larger than needed does not make the issue go away for other donors who have a sensibly large cache. For example, I've had to take a box offline for a week or so. Another box connects through it and won't have a net connection for a week. So I filled it up with ten days of Einstein, deadline 14 days. The alternative is to have two boxes down instead of one. The point that your argument misses (and everyone else making it also misses) is that server outages are not the only thing that a large cache protects against. Please notice I wrote "network outages" in my posting -- for some reason when making this argument it always gets interpreted as "server outages". The BOINC servers are not the only components that can go down for scheduled or unscheduled breaks. There are known-in-advance network outages, and some donors including myself connect through unreliable LANs before reaching the more reliable public Internet. I also currently run BOINC on machines belonging to a local charity. At present they are turned off at night and weekends, but if I ever get permission to have them left on, they won't have network at weekends because they connect via wireless and the wireless hub will still be powered down out of working hours. So those boxes would need a four-day cache to cover public holidays, or a five-day cache this Christmas (where there are two public hols following the weekend in England). A box has a result that uploads on Dec 23rd. The client fails to report it as it thinks -- hey, the deadline is 26th, plenty of time, I can always do it tomorrow. Then there is no network till 9am on 28th. Failed deadline. It is a problem. It's real. The silly instances, even if they are the majority, don't stop it being real for others.
Every one of the credits you can see below in my sig has been returned over one of these two LANs, one just plain unreliable and the other with regular predictable interruptions. |
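The Christmas scenario in the post above comes down to one date comparison. Here is a minimal sketch of it, assuming the client somehow knew when the network would next be available (in practice it does not, which is the point being made). The function name `should_report_now` and the year are illustrative assumptions, not part of BOINC.

```python
from datetime import datetime


def should_report_now(deadline: datetime, next_connection: datetime) -> bool:
    """Return True if waiting for the next connection would miss the deadline."""
    return next_connection > deadline


# Dates from the example in the post; the year and the midnight
# deadline time are assumptions made only for illustration.
deadline = datetime(2005, 12, 26)
network_back = datetime(2005, 12, 28, 9, 0)

# Putting the report off on the 23rd is only safe if this is False.
print(should_report_now(deadline, network_back))
```

A client that always assumes "I can do it tomorrow" is in effect treating `next_connection` as tomorrow, which is exactly what fails when the wireless hub is off over the holiday.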
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
"As for losing results due to blowing past deadline if server-outages, this isn't really any different even if result is reported immediately after uploading, but has more to do with some users having too large cache..." Coincidentally, I've just had 4 jobs do this at Rosetta (one is still crunching). Too large a cache? Well, I have 4 PCs running through my dial-up, so I like a larger cache; most of the computers crunch through the bunch of jobs in a day or two. But this one is not on all the time, so it has hit its deadline, although I've lowered the cache now to try to make sure it doesn't happen again. It just means I need to connect more often. Now, I cannot specify this computer to have a lower cache. (Use the profiles? I would, but we are limited and cannot create more than Work/Home/School, where I have one spare for future use, one for some remote computers/laptops and my home computers.) If BOINC added a feature to create more profiles, that would be nice :D Also, how about these features: PURGE jobs: if a job is reported after the deadline (or at any time, really), purge jobs from people's queues when they next connect if they are no longer needed or are being/have been crunched (we had this feature at FaD). That way the job can be sent out if past the deadline, but then if it is returned, it is purged and there is no wasted crunching. AUTO job creation: being on dial-up, I often run out of jobs before I next connect. Rosetta may be able to do this? Just a random number thing?... (again, we had this at FaD) (OK, it could add to redundancy, but most people would probably prefer their computer doing something rather than sitting idle... idle = no points ;) Used in conjunction with 'purge': if it gets sent back and it's seen that another member is doing it, purge it. Although we have probably gone way off topic by now. 
Note that "ask before connecting" doesn't work too well for my dial-up (will check my settings though, just in case). Why? I think it's to do with the LAN connection: it sees that and thinks there is a connection, when really it is just the network. Lock it to the modem? Would love to, but I often use the network's modem connection and not the one on the computer, so it is set to auto. (Ohhh, I can't wait till I've finished doing up my house and can move in and get back to broadband. Still a dial-up broadband though; come back NTL, all is forgiven!) Team mauisun.org |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
"Now I cannot specify this computer to have a lower cache. (use the profiles, I would but we are limited and cannot create more than Work/Home/School where I have one spare for future use, one for some remote computers/laptops and my home computers." How to have a fourth set of prefs:
1. Create prefs for home / school / work and a different set of general prefs.
2. Set three sets of computers to venues home / school / work as normal.
3. On the fourth set of computers, edit the client state file. Do this while BOINC is not running, perhaps by booting up in safe mode to be sure.
3a. Open C: , Program files, BOINC, Client_State and you see the client state in IE. At this stage it is read-only.
3b. View -> Source to open it in Notepad.
3c. Edit -> Find, search for "venue"; you should find this line: <host_venue>home</host_venue>
3d. Change the venue to something else, making sure you leave the angle brackets intact, e.g. <host_venue>River</host_venue>
3e. When you restart the client you will see a message like "separate preferences for River not found, using general preferences".
Other people advise deleting the line altogether. I haven't tried this myself but I am told it also works. "If BOINC added a feature to create more profiles that would be nice :D" Agreed; whether limited to three or to four, for some people it will not be enough. |
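Steps 3c and 3d above amount to a single text substitution, so they can be scripted instead of done by hand in Notepad. The following is a minimal sketch, assuming the state file contains a `<host_venue>` line as quoted in the post. The function name `set_host_venue` is made up for illustration, and the sample text stands in for the real file, whose location depends on the installation. As the post says, BOINC should not be running while the file is changed.

```python
import re


def set_host_venue(state_text: str, new_venue: str) -> str:
    """Return state_text with the first <host_venue> value replaced by new_venue."""
    pattern = re.compile(r"(<host_venue>)(.*?)(</host_venue>)")
    if pattern.search(state_text) is None:
        raise ValueError("no <host_venue> line found in the client state text")
    # A function replacement avoids any backslash handling in new_venue.
    return pattern.sub(
        lambda m: m.group(1) + new_venue + m.group(3), state_text, count=1
    )


# Stand-in for the file contents; a real run would read the client state
# file from the BOINC folder and write the edited text back to it.
sample = "<client_state>\n<host_venue>home</host_venue>\n</client_state>\n"
print(set_host_venue(sample, "River"))
```

Only the text between the tags changes, which matches the warning in step 3d about leaving the angle brackets intact.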
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
"Also how about these features" Yes, I've started a new thread cos it is well worth discussion, but not here |
Ingleside Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
Ah, planned outages are another matter. My comment mentioned "server outages", so I was thinking of any of the many reasons for unplanned outages that can happen. But for a planned outage due to upgrade/repair, it's no problem to also report any finished results beforehand, so delayed reporting isn't really a problem. As for any dial-up connection, if you're not running with auto-dial-up, meaning you're only concerned with unplanned outages, you must manually trigger the connection. But if you're manually triggering the connection, you can also manually hit "connect" and report any results, if the BOINC client isn't working as the check-in notes indicate. At least around here dial-up is paid for by the second, so when I was stuck using dial-up earlier, I was always manually dialing out, manually enabling SetiQueue to upload/download work and so on. Doing the same with BOINC shouldn't be any different, now that you can set "disable network-connection". For unexpected outages, like project servers going down, your neighbour digging through your phone line and cutting your connection, computer/LAN/modem/router crapping out and so on, there is no point in having a cache setting larger than 1/2 the deadline. At least for server outages there's normally also a recovery period afterwards, so I wouldn't recommend a cache setting higher than 1/3rd the deadline.
Unreliable LANs go into the same category as unexpected server outages, meaning my recommendation is 1/3rd the deadline as cache setting, and the absolute max cache is 1/2 the deadline. For planned outages like during Christmas, the most likely is a 6-day cache. But if you're running with a 6-day cache, the BOINC client will switch into "Nearest deadline mode" 12 days beforehand. Meaning, the result finished on the 23rd shouldn't have a deadline before the 29th, and again no problems...
Maybe my interpretation of planned downtime is coloured by earlier always manually dialing out, meaning I was always present and could hit "update", but at least in my opinion planned downtime isn't a problem if you're not also hit by an unexpected outage. For unexpected outages, since any work is always N days old when finished crunching, if an unexpected outage lasts N days, it means 2N days till reported. But if the deadline is shorter than 2N, it means results are reported after the deadline, meaning it's a waste of time to have a larger cache setting than 1/2 the deadline, if the point is to guard against unexpected outages. |
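The 2N argument in the post above can be written out as a couple of lines of arithmetic. This is a minimal sketch of that reasoning only; the function names and the 14-day deadline are assumptions chosen for illustration (the 14-day figure is the Einstein deadline mentioned earlier in the thread), not anything the BOINC client computes itself.

```python
def worst_case_report_age(cache_days: float, outage_days: float) -> float:
    """Age of a result when reported: it is cache_days old when it finishes,
    and an outage starting at that moment adds outage_days on top."""
    return cache_days + outage_days


def max_cache_days(deadline_days: float, allow_recovery: bool = True) -> float:
    """Rule of thumb from the post: 1/3 of the deadline when a recovery
    period after the outage is expected, otherwise 1/2 at the very most."""
    divisor = 3 if allow_recovery else 2
    return deadline_days / divisor


deadline = 14.0  # assumed deadline in days
for cache in (max_cache_days(deadline), max_cache_days(deadline, False), 10.0):
    # Worst case considered in the post: the outage lasts as long as the cache.
    age = worst_case_report_age(cache, cache)
    print(f"cache {cache:.1f} d -> reported at {age:.1f} d, on time: {age <= deadline}")
```

With a cache of half the deadline the worst case lands exactly on the deadline, which is why the post treats 1/2 as the absolute maximum and 1/3 as the safer recommendation.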
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
hi Ingleside, we're getting closer to agreement on this ;-)
This is not true for all situations. Where boxes connect through other boxes I might not have access to both at the same time. I then need to set box A up in advance to survive the absence of box B. In the case of regular outages, like the wireless hub going at weekends, these are different again. I want to set up all the boxes that connect through that hub and I want to do it just once, not every Friday.
Agree totally. An extra safety factor should always be built in anyway. On Rosetta this means you could sensibly have a cache of 9 days using the 1/3rd rule.
I see your point, but I am sure I've seen counter-examples. Maybe I'm biased by experiences under 4.19 tho...
Yes, this leads you naturally to assume you can be at the BOINC box (or at least get remote access) close to the start of the downtime. That is not true in either of my situations. |
PCZ Joined: 16 Sep 05 Posts: 26 Credit: 2,024,330 RAC: 0 |
I've never bought into the reasons for delaying reporting either. It seems that having to report separately is a design flaw in the first place. The Fanboys will try to convince you that this is a good thing; it is not. It is a failed attempt to ease the load on underspecced hardware that is to the detriment of us contributors. The only way to restore the lost functionality is to use an official client. It is unfortunate that participants in well-run projects such as this one have to suffer because of other poorly managed projects using the BOINC framework. |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
"The Fanboys will try to convince you that this is a good thing, it is not." Since the point of the BOINC System is to enable projects to be created and run on minimal hardware and budget (for software licences), designing to minimize hardware requirements makes sense. Ingleside did an analysis which I just added to the Wiki if anyone is interested in the technical logic. But, fundamentally, the point is to make several small atomic transactions instead of a larger and more complex one. With the smaller transactions, the "cost" of a failure of one has less impact. File uploads will not fail because of unrelated database issues. With the file uploaded, the database updates will not fail because of full disk drives. Also, the testing for the failure of the other component of the transaction does not need to be performed as it is implicit (you cannot report until the file is uploaded). Lastly, failure of the database update with an integrated transaction would require that the successfully uploaded file be deleted to retain the ACID property of the transaction. Then again, I am probably a Fanboy ... |
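The two-small-transactions argument in the post above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyServer`, its methods, and the `db_up` flag are hypothetical names invented here and do not correspond to the real BOINC server code. It shows that a database failure during the report step leaves the already uploaded file in place, so only the report has to be retried.

```python
class ToyServer:
    """Hypothetical stand-in for a project server with separate upload and report steps."""

    def __init__(self) -> None:
        self.uploaded = set()   # result files stored on disk
        self.reported = set()   # results recorded in the database
        self.db_up = True       # simulated database availability

    def upload(self, name: str) -> None:
        # Step 1: file storage only; does not touch the database.
        self.uploaded.add(name)

    def report(self, name: str) -> None:
        # Step 2: database update only; implicitly requires the upload.
        if name not in self.uploaded:
            raise RuntimeError("cannot report before the file is uploaded")
        if not self.db_up:
            raise RuntimeError("database unavailable")
        self.reported.add(name)


server = ToyServer()
server.db_up = False
server.upload("result_1")          # succeeds despite the database outage
try:
    server.report("result_1")      # fails, but nothing has to be rolled back
except RuntimeError as err:
    print("report failed:", err)

server.db_up = True
server.report("result_1")          # retry the small step only, no re-upload
print(sorted(server.reported))
```

In a single combined transaction, the failed database update would instead force the uploaded file to be discarded and sent again, which is the cost the post is describing.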
PCZ Joined: 16 Sep 05 Posts: 26 Credit: 2,024,330 RAC: 0 |
Yep, I'm afraid you're guilty as charged. :) Heh, I've got an idea. I could run a BOINC project on that 386 in the corner. Got a couple of 80MB ESDIs for storage. UM! I could back up the database on that old Jumbo 120. Does the BOINC software come on floppies? "Since the point of the BOINC System is to enable projects to be created and run on minimal hardware" That's scary. |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
"The Fanboys will try to convince you that this is a good thing, it is not." Therefore it would also make sense for the decision (i.e. option to enable/disable) to be given to the project, so they can take advantage of their superior hardware or help out their minimal hardware. Team mauisun.org |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
I could run a boinc project on that 386 in the corner. You think you're joking... but one of the projects that is big enough to be on the BOINC "main list", when their server went down, ran off of a Linux laptop on DSL. And I don't mean for a few hours - more like weeks... |
PCZ Joined: 16 Sep 05 Posts: 26 Credit: 2,024,330 RAC: 0 |
"Therefore it would also make sense for the decision (i.e. option to enable/disable) to be given to the project so they can take advantage of their superior hardware or help out their minimal hardware." Heh, careful where you point that common sense :) |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
"Therefore it would also make sense for the decision (i.e. option to enable/disable) to be given to the project so they can take advantage of their superior hardware or help out their minimal hardware." Not sure what you mean by this. But the design for low-cost minimal hardware does take advantage of better hardware. If you add faster servers, the system will be faster. But it will operate on a low-end PC-class server quite happily. And, as was noted, one of the smaller projects did operate off of a laptop for a short period of time. |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
Somewhere up there in the posts it was mentioned that delaying reporting was done basically to lower the load on the servers; therefore, if they have better hardware, they would be able to allow instant reporting (at their discretion, "they" being the project). Hence the suggestion to give them the option. Although I would recommend that instant reporting be a batch reporting: report after all jobs in the upload queue have uploaded, sort of thing. I say this as it is bloody annoying, so I revert to a 3rd-party compile of BOINC that enables it (so undoing any of BOINC's intentions anyway, unless it can be stopped and controlled project side). Why is it annoying? The reason is that I have to babysit these computers. 4 computers all go through dial-up. I connect when I browse the net or do other things; I would not normally connect just for BOINC. At this point any done jobs send themselves, but then sit at the reporting stage. I may disconnect (as even after a while they have not reported themselves). I then run out of jobs and have a lot sat at the reporting stage. The computers then don't do a lot. So now I would have to babysit, hit buttons, wait till all are sent, report etc... to keep it happy. All I want is for my computers to run BOINC continuously on their own. At the moment 2 of the computers are sat idle; they uploaded their results an hour or so ago and still have not reported back, hence they have NO jobs to run. (So Rosetta is losing out, as I could easily disconnect and not notice, and the computers do nothing but burn electricity. What's the point?) The one computer I have set to instant report has never run out of jobs yet. It would be lovely to have the world using always-on connections, running 24/7 computers, but that is not the case. If BOINC is designed to run on minimal hardware, it should also be designed to run over minimal networking connections (modem, dial-up, etc.). It's going to be a long time before we move away from 56k dial-up connections being in abundant use. 
(Also, some sort of decent compression built into BOINC wouldn't go amiss... ;-)) Team mauisun.org |
©2024 University of Washington
https://www.bakerlab.org