Message boards : Number crunching : code release and redundancy
STE\/E Send message Joined: 17 Sep 05 Posts: 125 Credit: 4,100,301 RAC: 84 |
I wanted to offer a compromise to the quorum/no quorum factions: In a Perfect World maybe this could work out, but as everybody knows the World's not Perfect. It is quite possible that some people could end up getting the shorter WU's 70% - 80% of the time and think this is great. And some people could end up getting the longer WU's 70% - 80% of the time and think this blows and just drop out ... |
Hoelder1in Send message Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0 |
I wanted to offer a compromise to the quorum/no quorum factions: I am sure our world has many imperfections, but randomness is still randomness. You can be fairly sure that once you have crunched more than a handful of WUs, close to 50% of them will be short ones and 50% long ones. And the more you crunch, the more accurate this will get (you seem to do a lot of crunching). |
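The 50/50 claim is just the law of large numbers, and it is easy to check with a quick simulation. A minimal sketch (purely illustrative, not Rosetta or BOINC code), assuming each WU is short or long with equal probability:

```cpp
// Simulate random short/long WU assignment and watch the short fraction
// converge to 50% as the number of crunched WUs grows.
#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng(12345);                   // fixed seed for reproducibility
    std::bernoulli_distribution is_short(0.5); // each WU is short with p = 0.5
    const int targets[] = {10, 100, 1000, 100000};
    int short_count = 0, done = 0;
    for (int target : targets) {
        for (; done < target; ++done)
            if (is_short(rng)) ++short_count;
        std::printf("%6d WUs: %5.1f%% short\n",
                    target, 100.0 * short_count / target);
    }
    return 0;
}
```

With only a handful of WUs the split can still be lopsided; by a few thousand it typically sits within a couple of percent of 50/50, which is the point being made above.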
FluffyChicken Send message Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
I also know that the top 20 teams are bringing in most of the WU (read=most of the science) and that they are stats-driven. Is it? :eek: By how much does it overestimate? Where do we get it from? I'm only saying that as I am using 5.3.1 on some of my computers. The reason for using it is that it gives an instant 'report back on sending a result' - I'm on dial-up and that is bloody annoying! - and it also allows me to set BOINC's ports (enabling BoincView to be used across the internet and through a router with port forwarding). Is there a non-optimised version with these features? Team mauisun.org |
FluffyChicken Send message Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
I also know that the top 20 teams are bringing in most of the WU (read=most of the science) and that they are stats-driven. Never mind the link, I just googled Boinc 5.3.1 and got this: http://boinc.truxoft.com/ Team mauisun.org |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
Use a quorum of three or five but make sure that each WU has a different random seed. First, terminology. A quorum only has one WU, and "n" results. That right there throws out your idea, unless the whole thing was redesigned on the server side. Each member of a quorum simply downloads a copy of the _same_ file from the server. That distribution would have to be changed. Also, how do you validate? Each result would be different, so no comparison would be possible. I realize there is effectively NO validation now, but you'd have to have some kind of validator running (more complex than current, though still simple) to do the credit averaging, even if it didn't actually look at the content of the results and only at the status and the claimed credit... And, which is the canonical result? The lowest energy? I think they want to store _all_ the results. Good basic idea, but I think given the structure of the BOINC architecture, it'd just be way too complicated to implement. |
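For what it's worth, the "credit-only" validator Tern describes would be conceptually small. A hypothetical sketch (the struct and function names are made up for illustration; this is not BOINC's actual validator API):

```cpp
// Grant every successful result in a quorum the same credit, computed
// from the claims alone; result contents are never compared.
#include <algorithm>
#include <vector>

struct Result {
    bool success;          // host reported a completed run
    double claimed_credit; // credit the host asked for
    double granted_credit; // what the validator decides to grant
};

void grant_average_credit(std::vector<Result>& quorum) {
    std::vector<double> claims;
    for (const Result& r : quorum)
        if (r.success) claims.push_back(r.claimed_credit);
    if (claims.empty()) return;

    // With 3+ claims, drop the highest and lowest to blunt outliers,
    // then average the rest; with fewer, just average what we have.
    std::sort(claims.begin(), claims.end());
    size_t lo = 0, hi = claims.size();
    if (claims.size() >= 3) { ++lo; --hi; }
    double sum = 0;
    for (size_t i = lo; i < hi; ++i) sum += claims[i];
    double grant = sum / (hi - lo);

    for (Result& r : quorum)
        if (r.success) r.granted_credit = grant;
}
```

The averaging itself is trivial; as Tern notes, the server-side plumbing around it is the real work.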
Shaktai Send message Joined: 21 Sep 05 Posts: 56 Credit: 575,419 RAC: 0 |
First, terminology. A quorum only has one WU, and "n" results. That right there throws out your idea, unless the whole thing was redesigned on the server side. Each member of a quorum simply downloads a copy of the _same_ file from the server. That distribution would have to be changed. Also, how do you validate? Each result would be different, so no comparison would be possible. I realize there is effectively NO validation now, but you'd have to have some kind of validator running (more complex than current, though still simple) to do the credit averaging, even if it didn't actually look at the content of the results and only at the status and the claimed credit... You make good points Bill. The issue is with the core point calculation method. It is important to address the root cause, which is a credit claim that can be manipulated, rather than "patch things" project by project with solutions that aren't fair to anyone. I support redundancy if it benefits the science, but if it is only to give the "illusion" of fairness, I must oppose it. While the current BOINC credit calculation was well intentioned, people have found ways to manipulate it. Some with good intentions, to compensate for optimized science apps, and some with bad intent, just to boost the "illusion" of their production. I remember that it wasn't that long ago there was someone who kept creating new unit ID numbers, and then would periodically just "merge" with one of the older unit ID numbers. It did nothing for his real production or credits, but it kept him at the top of some statistics lists because of the way it manipulated the RAC calculations. A silly game, but some people will do it no matter what. Team MacNN - The best Macintosh team ever. |
Hoelder1in Send message Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0 |
Use a quorum of three or five but make sure that each WU has a different random seed. Are you perhaps thinking in terms of SETI, where each WU uses a different set of data from the telescope? With Rosetta we are usually doing 100,000 runs or so of the same protein, where the only difference between the WUs is a different random seed which makes the protein wiggle around in a different way. That random seed doesn't even have to be supplied to the application but can be created internally using the system clock. In a way one could say that we are running 100,000 incarnations of the same WU which, due to the randomness of the Rosetta algorithm, lead to 100,000 different results. Also, how do you validate? Each result would be different, so no comparison would be possible. I realize there is effectively NO validation now, but you'd have to have some kind of validator running (more complex than current, though still simple) to do the credit averaging, even if it didn't actually look at the content of the results and only at the status and the claimed credit... Yes, you are exactly right. I only meant to address the credit issue; there would be no validation of results, same as the current situation, and yes, all results would have to be stored; there is no canonical result. @Shaktai: I support redundancy if it benefits the science, but if it is only to give the "illusion" of fairness, I must oppose it. Just to make sure there is no misunderstanding: there would be _no_ redundancy in the system I proposed, and I believe it would be fair. |
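For illustration, seeding internally from the system clock might look like the sketch below; this is an assumption about the general technique, not Rosetta's actual code:

```cpp
// Derive the RNG seed from wall-clock time at startup, so each
// "incarnation" of the same WU explores a different trajectory.
#include <chrono>
#include <cstdio>
#include <random>

int main() {
    auto now  = std::chrono::system_clock::now().time_since_epoch();
    auto seed = static_cast<std::mt19937::result_type>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(now).count());
    std::mt19937 rng(seed);

    // Hypothetical stand-in for a random perturbation of the protein.
    std::uniform_real_distribution<double> perturb(-1.0, 1.0);
    std::printf("first random perturbation: %f\n", perturb(rng));
    return 0;
}
```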
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
Are you perhaps thinking in terms of SETI where each WU uses a different set of data from the telescope? With Rosetta we are usually doing 100,000 runs or so of the same protein where the only difference between the WUs is a different random seed which makes the protein wiggle around in a different way. That random seed doesn't even have to be supplied to the application but can be created internally using the system clock. In a way one could say that we are running 100,000 incarnations of the same WU which, due to the randomness of the Rosetta algorithm, leads to 100,000 different results. I understand the general principle on Rosetta, but I'm not clear on "where" the random seed is supplied - at the server, or at the host. And, even if it is supplied at the host, is it this random seed that causes the HUGE variation in run times? If so, you could have two identical hosts, doing the "same WU" (actually two different results, with different random seeds) and one could take an hour and the other take nine. If it's something else causing the run-time differences, that could be prevented from impacting this, then the "issuing" of the WU wouldn't be a problem. Yes, you are exactly right. I only meant to address the credit issue, there would be no validation of results, same as the current situation and yes, all results would have to be stored, there is no canonical result. I have NOT looked at the BOINC-supplied server code... but I suspect there is no provision for doing anything remotely like this "built in". Thus Rosetta would have to write a validator and an assimilator from the ground up, or at least heavily modify the supplied ones. That _might_ be quicker than simply going to a flops-counting system, but I just have a 'gut feel' that it wouldn't be. @Shaktai I believe your proposal would be fair (well, as fair as SETI or Einstein or Predictor or...) and I understand there would be no "true" redundancy of the science. There would be the _appearance_ of redundancy, to anyone who didn't know about the random seeds, and as far as the basic BOINC software would be concerned, it would think there was redundancy. As I said, I think this is a _good_ idea, but I still don't think it's practical, not when compared to flops-counting. An additional objection given the current state of the science application - I think you would be forced to go to Homogeneous Redundancy; have one "WU" be all Mac, one be all Windows, one be all Linux, etc. Otherwise the Mac times are going to be so skewed, anyone in a group with a Mac user will be thrilled, but the Mac users will suffer by having their credits lowered drastically (on average). I'm getting 90 credits on my Mac for what I _think_ are "similar" WUs that are giving me 20 credits on the PC, simply because the Mac takes so much longer to do them, well over the 2x that the benchmarks would indicate. The "credits per hour" works out about right, but it wouldn't if I claimed 90 for a half-day's work, and got 20. |
Hoelder1in Send message Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0 |
Well, I had assumed that Rosetta must have options to either use a server-supplied random seed _or_ create one by itself, which will be different each time the executable is called. I am surprised by the huge range in run times on identical hosts (1 vs. 9 hours) that you quote. For the same type of WU (same name except for the running number at the end) I don't think I usually see deviations of more than +/- 20% on my machine (P4 3.2GHz); I definitely haven't ever seen more than a factor of two between fastest and slowest. I guess I agree with most of the rest of what you say, and neither do I know BOINC from the inside. My motivation to suggest this was simply as a way to fix the credits that would hopefully require less work for the Rosetta team to implement than flops-counting. But then I don't know enough of the details to be sure this would be the case. I can't comment on the Mac situation since I don't know anything about that. |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
I am surprised by the huge range in run times on identical hosts (1 vs. 9 hours) that you quote. For the same type of WU (same name except for the running number at the end) I don't think I usually see deviations of more than +/- 20% on my machine (P4 3.2GHz); definitely haven't ever seen more than a factor of two between fastest and slowest. Hm... I never look at the names. I assumed that WUs were issued in "blocks", where I'd be likely to get, say, 10 in a row that would run about the same length. Instead, I seem to always get two or three "short", one "super long", then a couple more "short", and several "medium long". If they're issued semi-randomly rather than in blocks, then the range may be much closer "by name" - if you say 20%, I'll believe it. And on the Windows box, I probably only see a total range between shortest and longest of maybe 5 or 6x. On the Mac side, 9x between one WU and the next is common. Makes it rough on the DCF (duration correction factor)! I think we're both just making assumptions about the relative difficulty of flops-counting vs. server-side "pseudo-redundancy", without knowing the real answer based on seeing the code. Either of us could well be right. Whichever, we've given David some things to think about! :-) |
Hoelder1in Send message Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0 |
I think we're both just making assumptions about the relative difficulty of flops-counting vs. server-side "pseudo-redundancy", without knowing the real answer based on seeing the code. Either of us could well be right. Whichever, we've given David some things to think about! :-) ... and "pseudo-redundancy" seems to be a nice, catchy name for this kind of approach. :-) |
Paul D. Buck Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Understand that FLOP counting is not a panacea and does not address all of the problems with the awarding of credit. I believe that the experience on the SETI@Home beta test shows that the variance between the various clients is better. But we still have fundamental issues with the whole system. To know for sure we will need to wait and see what the results really look like with the new system once it is fielded and being used widely. Then we will know if this has addressed the issues well. |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
Understand that FLOP counting is not a panacea and does not address all of the problems with the awarding of credit. Just in case I've been misunderstood, I'll note that I agree totally with Paul on this. I think flops-counting solves ONE of the several problems. Paul's proposal for calibrated hosts solves many more of them, and is independent of, and even complementary to, flops-counting. However, because the UCB part of the work has _already_ been done to implement flops-counting, I believe it is _probably_ the easiest way for Rosetta to QUICKLY "patch" the credit issue, while avoiding "true redundancy" and the associated loss of CPU power. If Rosetta wants to be the first to implement calibrated hosts, I sure won't argue with it! :-) |
Ingleside Send message Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
Understand that FLOP counting is not a panacea and does not address all of the problems with the awarding of credit. I believe that the experience on the SETI@Home beta test shows that the variance between the various clients is better. But we still have fundamental issues with the whole system. Simplified, there are 4 problems:

1. Different computers crunching the same WU give vastly different claimed credit. While one computer claims 10 CS, another can claim 50 CS. As BOINC alpha has shown, even re-running the exact same WU on the same computer can give over 30% variation in CPU time.
2. Bad time-calibration, meaning for example that 1h gives 10 CS, while 2h gives 100 CS.
3. Bad cross-project calibration; for example, an average computer gets 10 CS for 1h of crunching in project 1, but in project 2 would have got 25 CS.
4. Cheaters, trying to get more credit than they should.

#1 should hopefully be fixed by using boinc_ops_per_cpu_sec; the very limited numbers with Seti_Enhanced indicate 1% variation. #2, not sure if this is a problem with the current BOINC benchmark... If it becomes a problem with boinc_ops_per_cpu_sec, the project must re-calibrate so flops from the most time-consuming functions are given higher weight than flops from the "easy" functions. #3, since the same computer can be excellent in one project but total crap in another, you can never remove cross-project variations. The only thing you can do is calibrate so an average computer gives roughly the same claimed credit regardless of project. #4, by using fixed credit for same-type WUs, the only way to "cheat" is if a project isn't following #2, and users therefore dump the "slow, little-credit" WUs for the "fast, much-credit" WUs. For example, CPDN and Folding@home use fixed WU crediting. Since Folding@home awards "bonus points", it means they're not following #2...

Another solution is to use redundant computing, like most BOINC projects do. This doesn't stop someone from trying to cheat, but since granted credit is either the lowest claim with min_quorum = 2, or the average after removing the highest and lowest claims with min_quorum > 2, as long as two cheaters aren't running the same WU the attempt to cheat is unsuccessful. Now, as long as claimed credit can be all over the place, it's difficult to test whether someone is really trying to cheat or not. But if using boinc_ops_per_cpu_sec gives, say, max 1% variation, you can add something like this:

If highest claimed credit > 1.02x lowest claimed, increase userid_cheater_count of the highest claimer.
If userid_cheater_count > N, deny credit for the next 2N results, and set userid_cheater_count = N-1.
Use N = 10 or something...

Well, it's unlikely any project will use this automated punishment system, but at least in this system anyone trying to cheat once too often very likely loses much more than they gained by trying to cheat. :evil-grin: While the implementation can be different, as long as not over 50% are cheating, a project that uses redundancy and boinc_ops_per_cpu_sec can easily check server-side for someone trying to cheat...

Anyway, adding boinc_ops_per_cpu_sec will not stop anyone from trying to cheat. But if it removes the huge, more or less random variations in claimed credit, then even in a project that does not use redundancy it will be easy to, for example, sort each WU family by the result with the highest claimed credit... |
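Ingleside's punishment rule is concrete enough to sketch in code. Everything below is illustrative: the name userid_cheater_count, the 1.02 ratio, and N = 10 come from the post itself, while the surrounding types are hypothetical scaffolding, not BOINC server code:

```cpp
// Flag the highest claimer whenever a quorum's claims spread more than 2%,
// and deny credit for the next 2N results once a user exceeds N strikes.
#include <algorithm>
#include <map>
#include <vector>

struct Claim { int user_id; double claimed_credit; };

const double RATIO = 1.02; // allowed high/low spread within a quorum
const int    N     = 10;   // strikes before punishment kicks in

std::map<int, int> userid_cheater_count; // user -> strikes so far
std::map<int, int> denied_left;          // user -> results still to be denied

void check_quorum(const std::vector<Claim>& claims) {
    if (claims.size() < 2) return;
    auto cmp = [](const Claim& a, const Claim& b) {
        return a.claimed_credit < b.claimed_credit;
    };
    auto lo = std::min_element(claims.begin(), claims.end(), cmp);
    auto hi = std::max_element(claims.begin(), claims.end(), cmp);
    if (hi->claimed_credit > RATIO * lo->claimed_credit) {
        int& strikes = userid_cheater_count[hi->user_id];
        if (++strikes > N) {
            denied_left[hi->user_id] += 2 * N; // deny the next 2N results
            strikes = N - 1;                   // and reset the count to N-1
        }
    }
}

// Called at credit-granting time: false means this result gets nothing.
bool may_grant_credit(int user_id) {
    int& left = denied_left[user_id];
    if (left > 0) { --left; return false; }
    return true;
}
```

As the post argues, a host whose claims are accurate to ~1% essentially never trips the 2% threshold, while a persistent over-claimer loses far more than the attempted gain.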
Ingleside Send message Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
.... hmm, double-post... :oops: |
Paul D. Buck Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Ingleside, I was going to look at the SETI@Home beta to look at the numbers as part of this debate, but lost my URL to it (not a member). And, I am more than willing to admit that if the FLOPS/IOPS counting does work as well as you say, this may be the "best" solution. Without any data I am averse to speculating more. The reservation I have is the point you make that the same WU run on the same computer has such a potential variance in CPU time. This means that the whole basis of our measurement of FLOPS vs CPU seconds is not a valid metric. Of course it is entirely possible I misunderstood what you were saying. :) I think the part I am still missing is when I count FLOPS on, say, EAH, my PowerMac does the WU in 4 hours ... a standard PC will take about 10 ... in theory we should have the same FLOPS count, but how do we get from FLOPS to Cobblestones? |
Ingleside Send message Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
The SETI@Home test-project. Finding examples, on the other hand, is more difficult, since there are few WUs where 2 or more hosts are running BOINC v5.2.6 or later. Also, Seti_Enhanced was upgraded to v4.11 on Friday, and this version should give higher claimed credit, to be more comparable to the claims you would get if the benchmark were used. So, for the moment the claimed credits are still all over the place... But I do have two examples with v4.09.

One example where no one is running an optimized application:
60.6510490617153
60.6537686345174
Difference: 0.004484%

Another example:
62.3131028703056 (P4-HT, XP)
62.0348941663368 (P3, Win2k)
62.3130984694757 (P4-HT, Linux, optimized)
Difference for P3: 0.448%
Difference for unoptimized XP and optimized Linux: 0.0000071%
Re-running the exact same WU on the same computer with the same application should give the exact same flops-count. As the last example has shown, even with vastly different CPUs and OSes, and therefore vastly different crunch times, there is little variation in the flops-count. So, if an average computer in Einstein@home uses 10h and the average granted credit is 66 CS, your computer using only 4h on an average Einstein WU should claim it used 5.7024e13 flops. If you re-run the same WU multiple times, the reported CPU time can vary, for example 3.9h, 4h, 4.2h, 4.1h, 4.05h and so on, but you'll still have used 5.7024e13 flops. This variation in CPU time is due to whatever the computer is running alongside. The variation is impossible to remove, so the best you can do is multiple runs, and afterwards, for example, remove the highest and lowest CPU times, average the rest, and calibrate so that 4h on this computer is 66 CS. |
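As a sanity check on the arithmetic, assuming the classic cobblestone definition of 100 credits per day of work on a 1 GFLOPS reference machine (an assumption on my part, but it reproduces the post's figure): 66 CS corresponds to 66/100 x 86,400 s x 1e9 flops/s = 5.7024e13 flops. A throwaway snippet:

```cpp
// Convert a credit grant back to a flops count under the assumed
// definition: one day on a 1 GFLOPS machine earns 100 cobblestones.
#include <cstdio>

int main() {
    const double GFLOPS      = 1e9;    // reference speed, flops per second
    const double DAY_SECONDS = 86400.0;
    const double CS_PER_DAY  = 100.0;  // credits/day on the reference machine

    double credit = 66.0;              // the post's average Einstein grant
    double flops  = credit / CS_PER_DAY * DAY_SECONDS * GFLOPS;
    std::printf("%.0f CS <-> %.4e flops\n", credit, flops); // 5.7024e+13
    return 0;
}
```

The flops count is a property of the work, not of the host, which is why a 4h machine and a 10h machine end up claiming the same credit; only their credit *rate* differs.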
Paul D. Buck Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Ingleside, Thanks! :) As usual, concise and clear. But, not being on the test project, I have not been paying attention to the full mechanism. Thus my denseness. In the Cobble-Counter we have FLOPS per second as our unit of capacity. In the new mechanism we count FLOPS; with time so variable, my expectation is that the credit calculation will also result in variation. I know I am missing the point somewhere here ... Something simple too ... |
Honza Send message Joined: 18 Sep 05 Posts: 48 Credit: 173,517 RAC: 0 |
I know I am missing the point somewhere here ... It's been almost a month since this thread started, and it has already expanded quite well [discussing the topic in 100 posts] and attracted quite some interest from Rosetta participants [3,000 views]. But one can hardly see any post regarding the topic from the Rosetta team here except on the very first day of the thread (Nov 10th). IMHO, it would be nice to see some official word on how things are going in this matter, please. |
Jack Schonbrun Send message Joined: 1 Nov 05 Posts: 115 Credit: 5,954 RAC: 0 |
I'll chime in with my opinions as one of the Project Developers, though I am not actively working on this issue. I do find it interesting. I am still trying to improve my understanding of the psychology of credit. It is a little different from what we normally think about, doing our science research. So I don't want to make any faulty assumptions about what would be a good system. But I've been trying to absorb your discussions regarding what you consider fair. From my perspective, the goal is to have as many productive CPU cycles as possible. Clearly, having a fair credit system makes users happy, and should lead to more hosts, thereby creating more CPU cycles for Rosetta@home. Conversely, redundancy, by definition, reduces the number of cycles available for computation. And it may even turn some people off, because they feel like the project is being run in an inefficient way. Redundancy has several good points, though: (1) It provides automatic validation of results. (2) It is already built into BOINC, so it would take less development time to implement. If we had unlimited development time, I think the best system would be one along the lines of Hermann's pseudo-redundancy. One could take the median time of the Work Units that differ only by random number seed. I believe the Work Units are structured such that thousands that fit this requirement are sent out at a time. This would mean that very reliable statistics could be generated about the average CPU requirements of a Work Unit. This could then be used to assign credit. It would be far less noisy than 2-fold or 4-fold redundancy. Because I don't have a true hacker's mentality, I'll need help understanding the cheating loopholes in this system. In practice, we are more likely to implement true redundancy. BOINC already has strong support for it. And besides more trustworthy credit, we will get automatic validation of results. |
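Jack's median idea is simple enough to sketch. A rough illustration (the batch structure and the credits-per-hour rate are hypothetical assumptions; this is not project code):

```cpp
// Among results that differ only by random seed, use the median CPU time
// to set one credit value for the whole batch. The median is robust: a
// few inflated or deflated times barely move it when thousands report.
#include <algorithm>
#include <cstdio>
#include <vector>

double median_cpu_time(std::vector<double> t) {
    std::sort(t.begin(), t.end());
    size_t n = t.size();
    return n % 2 ? t[n / 2] : 0.5 * (t[n / 2 - 1] + t[n / 2]);
}

int main() {
    // Hypothetical seed-sibling run times, in hours (note one outlier).
    std::vector<double> hours = {2.9, 3.1, 3.0, 9.5, 3.2};
    double credits_per_hour = 12.0; // assumed project-wide rate
    double credit = median_cpu_time(hours) * credits_per_hour;
    std::printf("granted credit per result: %.1f CS\n", credit); // 37.2
    return 0;
}
```

One loophole worth noting for the "hacker's mentality" question: credit here still rests on self-reported CPU times, so a large coordinated group could shift the median, though a lone host cannot.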