code release and redundancy

Author	Message
BIG DAVE* Joined: 2 Oct 05 Posts: 9 Credit: 786,697 RAC: 0	Message 4698 - Posted: 29 Nov 2005, 17:14:36 UTC I'm repeating myself but it applies to this thread too... To me there's not enough optimized clients! There should be Rosetta clients optimized for every kind of CPU and OS out there, and also a default unoptimized client for those who don't know what kind of CPU they have (SSE, SSE2, SSE3, etc.). To me, the thing that matters the most is getting the maximum amount of valid units crunched for the project. Now, all these optimized clients should be made official by the project, and only those official clients should be used. This would eliminate almost any kind of cheating, remove the need for a quorum of 3, and get more units crunched, because almost everybody would be using specifically optimized clients. Cheating can never be totally eradicated, so at least this way a lot more crunching will get done, and that is only good for the project. ID: 4698 · Rating: 0

nasher Joined: 5 Nov 05 Posts: 98 Credit: 890,793 RAC: 0	Message 4711 - Posted: 29 Nov 2005, 19:14:40 UTC @sir_LION: the problem with multiple optimised clients is that someone out there is going to find the one that makes HIS processor report higher numbers and use it no matter what his processor truly is. As for making it official, that means the people at BOINC or at the local projects have to look over the code and agree it's best (a lot of time there). And optimising a client for Rosetta could make people's scores run high or low on other projects, and that could be a cause for "cheating", or it could be the fact that, like mine, each computer has 2-5 different projects running on it. ID: 4711 · Rating: 0

stephan_t Joined: 20 Oct 05 Posts: 129 Credit: 35,464 RAC: 0	Message 4738 - Posted: 29 Nov 2005, 22:51:24 UTC @sir_LION: see my reply on the "@Admins: quorum of 3 results needed" thread. I didn't want to duplicate my message over here :-) Team CFVault.com http://www.cfvault.com ID: 4738 · Rating: 0

dgnuff Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0	Message 4756 - Posted: 30 Nov 2005, 3:13:53 UTC - in response to Message 4698. I'm repeating myself but it applies to this thread too... To me there's not enough optimized clients! Just adding my voice here, it's worth remembering some advice given to me (and my classmates) about 25 years ago when I was a young, still wet behind the ears comp sci undergrad. "First make it work. Then make it work fast." Right now, we're trying to get the basic Rosetta algorithm tuned. David Baker et al are working on ways to make the search process more efficient. The "Work smarter, not harder" approach. Once we have the "smartest" algorithm codified, then it makes sense to look at optimizing it. A quote from a good friend mirrors the advice above. "The worst mistake you can make in software engineering is premature optimization." We may well see optimized clients for Rosetta. Just not yet, it's too early. ID: 4756 · Rating: 1

Snake Doctor Joined: 17 Sep 05 Posts: 182 Credit: 6,401,938 RAC: 0	Message 4764 - Posted: 30 Nov 2005, 6:06:25 UTC Last modified: 30 Nov 2005, 6:07:51 UTC David, Having been involved in beta testing of both E@H and S@H coupled with being a Mac user, I would offer the following. As to cheating, the most obvious reason to cheat is to gain a credit advantage in the informal competitions surrounding all of the BOINC projects. This can be accomplished in a number of ways that do not involve the project code. On that basis sequestering the code for the project offers no protection. As a Mac user I can understand why the projects do not devote a lot of support to the Mac folks, but the Mac community has a lot of bright guys that can improve the performance of project code without damaging the science. SETI is a good example of this. There is an altivec enhanced version of the S@H application that is about 3 times faster than the project released code and it actually has tighter math than the project requires. This was accomplished by two guys using willing people from the project community as testers. I would agree that processing the WUs more than once is a resource waste. That said this may be the only way to satisfy people about the credit cheating issue assuming that the rather flakey credit calculation BOINC does stays in place. Since BOINC does all the credit calculations this is actually beyond your control. The fact is the BOINC code is out there and the optimized clients do nothing more than inflate the credit claims. I am aware of systems in this project that process no more WUs in a day than I do yet they receive three times the credit. Does that bother me? Yes. Will I stop working the project because of it? No. So in the final analysis the science is all that matters. 
If releasing the code for optimization to outside programmers could cut processing time in half, then you could release it and send each WU out to two machines without loss of progress on the project. Were I on your development team, I would modularize the code, and release the portions that are time intensive and see if someone can help. I know for certain that if altivec code were used for the Mac version of your application the speed improvement might astonish you. This might also speed the delivery of a screen saver for Mac users that want one. But you should only do this as your comfort level with the project dictates. Regards Phil We Must look for intelligent life on other planets as, it is becoming increasingly apparent we will not find any on our own. ID: 4764 · Rating: 1

adrianxw Joined: 18 Sep 05 Posts: 662 Credit: 12,167,519 RAC: 0	Message 4780 - Posted: 30 Nov 2005, 12:08:17 UTC >>> release the portions that are time intensive and see if someone can help. This will help, but almost certainly not produce the fastest solution. If it is the approach that is wrong, not the details of function abc(), then this will be the case. True, optimising function abc() will yield an improvement, but if the reason it is taking too long is because abc() is being called too often from def(), then you are treating the symptom, not the cause. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. ID: 4780 · Rating: 0
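
A minimal Python sketch of the distinction drawn in the post above, under made-up assumptions: `abc()` is a stand-in for an expensive routine, and the two `caller_*` functions are hypothetical (named that way because `def` is a reserved word in Python). None of this is Rosetta code. It only counts how often `abc()` runs when the caller is restructured rather than `abc()` itself being tuned.

```python
# Hypothetical illustration: fix the caller, not just the callee.
counter = {"abc": 0}  # how many times abc() has been invoked


def abc(x):
    """Stand-in for an expensive inner function (assumed, not real project code)."""
    counter["abc"] += 1
    return x * x


def caller_naive(values):
    """Re-evaluates abc() for every pair: len(values) ** 2 calls."""
    total = 0
    for _ in values:
        for v in values:
            total += abc(v)
    return total


def caller_restructured(values):
    """Same result, but abc() is evaluated once per value and reused."""
    cached = [abc(v) for v in values]
    return len(values) * sum(cached)


values = list(range(100))

counter["abc"] = 0
naive_result = caller_naive(values)
naive_calls = counter["abc"]

counter["abc"] = 0
better_result = caller_restructured(values)
better_calls = counter["abc"]

print(naive_calls, better_calls, naive_result == better_result)
```

Making `abc()` itself twice as fast would halve the runtime of `caller_naive`; restructuring the caller cuts the number of calls by a factor of `len(values)` in this toy case, which is the "cause, not symptom" fix.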

Honza Joined: 18 Sep 05 Posts: 48 Credit: 173,517 RAC: 0	Message 4783 - Posted: 30 Nov 2005, 12:25:17 UTC @ Phil, nice post. I don't see the problem the way adrianxw does - finding the fastest solution. Releasing the whole code might be necessary in order to test an optimized application - perhaps in an off-line state. We need to test the whole process. > But you should only do this as your comfort level with the project dictates. Yes, the graphics implementation stage, IMO, is not good timing for optimization. But once it is there and working, we might look into porting it to other platforms and look at the code/compiler optimization as well. For now, we can only wonder how much faster the optimized application would be... ID: 4783 · Rating: 0

Snake Doctor Joined: 17 Sep 05 Posts: 182 Credit: 6,401,938 RAC: 0	Message 4784 - Posted: 30 Nov 2005, 12:56:29 UTC - in response to Message 4780. >>> release the portions that are time intensive and see if someone can help. This will help, but almost certainly not produce the fastest solution. If it is the approach that is wrong, not the details of function abc(), then this will be the case. True, optimizing function abc() will yield an improvement, but if the reason it is taking too long is because abc() is being called too often from def(), then you are treating the symptom, not the cause. It is true that this would not necessarily initially produce the fastest possible code. But with processing times rising into the area of 9 and 10 hours it would still help the project. Also consider that if the comfort level of the Project team is not high for release of the code at all, a partial release of those portions of lesser importance or lower impact to the project may improve their comfort over time. I would rather see computationally intensive portions released and improved than see no improvement and still be watching WU times climb. There is the promise of saving lives in this research, and as such it should be done as fast as we can do the work with accuracy. I understand the reluctance for the release of the code in the face of possible corruption of the basic science. Regards Phil We Must look for intelligent life on other planets as, it is becoming increasingly apparent we will not find any on our own. ID: 4784 · Rating: 0

BIG DAVE* Joined: 2 Oct 05 Posts: 9 Credit: 786,697 RAC: 0	Message 4880 - Posted: 1 Dec 2005, 16:29:47 UTC I didn't make myself clear on my last post, when I said there should be more optimized clients I meant not just the Boinc app but Rosetta too... There should be an optimized Rosetta for every kind of OS and processor, be it Intel, AMD, Mac, Sun or whatever... And these optimized apps should be made standard by Rosetta so everybody would have to use them; any other app used should be rejected. This way, we'd all be crunching much faster and the cheating would be kept to a minimum. This would help the project in every way: more consistent results and points awarded, less cheating, and a lot more units being crunched. Sorry for the repeat ID: 4880 · Rating: 0

Honza Joined: 18 Sep 05 Posts: 48 Credit: 173,517 RAC: 0	Message 4891 - Posted: 1 Dec 2005, 18:34:03 UTC - in response to Message 4880. I didn't make myself clear on my last post, when I said there should be more optimized clients I meant not just the Boinc app but Rosetta too I would suggest to use terms like BOINC core and Rosetta application. There is no such thing as a BOINC application (it is a platform with a core and daemon), but there are BOINC projects' applications (like Rosetta, CPDN, optimized SETI, etc.). Each project can have more than one application: the standard, beta, optimized, or even different types of WU to meet project objectives - CPDN Slab, Sulphur, SpinUp [beta], CoupledModel [anticipated] etc. ID: 4891 · Rating: 0

Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,702,007 RAC: 0	Message 4895 - Posted: 1 Dec 2005, 19:40:31 UTC - in response to Message 4891. I would suggest to use terms like BOINC core and Rosetta application. The "official" terminology is Client = BOINC Core Client, and Application = project science application. "Client" should never be used when referring to a project's app... but it frequently is. Making it "core client" does reduce the confusion. ID: 4895 · Rating: 0

j2satx Joined: 17 Sep 05 Posts: 97 Credit: 3,670,592 RAC: 0	Message 4898 - Posted: 1 Dec 2005, 21:05:23 UTC - in response to Message 4895. I would suggest to use terms like BOINC core and Rosetta application. The "official" terminology is Client = BOINC Core Client, and Application = project science application. "Client" should never be used when referring to a project's app... but it frequently is. Making it "core client" does reduce the confusion. Why don't we all call it the BOINC Manager, which is what BOINC calls it? ID: 4898 · Rating: 0

Honza Joined: 18 Sep 05 Posts: 48 Credit: 173,517 RAC: 0	Message 4900 - Posted: 1 Dec 2005, 21:50:48 UTC - in response to Message 4898. Why don't we all call it the BOINC Manager, which is what BOINC calls it? Because one doesn't need BOINC Manager to run BOINC. It's only a GUI. You can use the command-line version of the BOINC core (boinc.exe), which uses less memory and is a bit faster. BoincView provides some of the functions of BOINC Manager, but can easily be applied over a LAN (many machines). So, there might be no need for BOINC Manager, and/or there are substitutes for it (wink). ID: 4900 · Rating: 0

[B^S] sTrey Joined: 25 Sep 05 Posts: 16 Credit: 15,524 RAC: 0	Message 4910 - Posted: 2 Dec 2005, 2:32:30 UTC Last modified: 2 Dec 2005, 2:33:15 UTC FWIW, my "votes" I prefer open or at least visible source, and would be in favor of redundancy by two. But I'll stay here regardless. I don't get why the idea of rosetta redundancy so outrages some people. Most of the BOINC projects use redundancy --I think of Rosetta as having a temporary advantage in resources by not using it but it doesn't seem real world. That said, if the science doesn't need it i.e. to weed out bogus (by design or accident) results, and it's not needed to avoid a credit or validity "scandal" and subsequent mass exodus from Rosetta, then there's no point. (I'd mind non-redundancy less if the benchmarks & credit calcs seemed less flaky, but credit is secondary.) And for open source -- I have a hard time understanding the driving need behind hacking up the code to get higher numbers on a list, but I'm cynical/old enough to believe it exists. So opening up the source or making it visible, is only worthwhile if that sort of stupidity can be made ineffective. ID: 4910 · Rating: 1

Aegion Joined: 14 Oct 05 Posts: 12 Credit: 3,374,900 RAC: 0	Message 4917 - Posted: 2 Dec 2005, 7:14:19 UTC - in response to Message 4910. FWIW, my "votes" I prefer open or at least visible source, and would be in favor of redundancy by two. But I'll stay here regardless. I don't get why the idea of rosetta redundancy so outrages some people. The key is that the nature of the science isn't actually threatened at all by someone simply cheating in order to boost their score, so that removes the key reason for redundancy in other projects. At worst that person is just not productively helping the project, but he's not significantly harming the science of the project in a relevant way. The only real threat would be someone deliberately faking a perfect, or close to perfect, result using knowledge of what protein structure we are trying to predict, and thereby confusing the answer to what strategies actually serve to have software predict the shape of the protein without knowledge of its original shape. Of course the only people who could plausibly do this and trick the scientists involved with the project are those who are experts themselves in this area, so that makes this possibility extremely unlikely. Basically the reason many people are against redundancy is you're cutting the effective cpu power of the project in half or more if you're talking about triple redundancy, and gaining no measurable scientific benefit even in assured accuracy in the process. ID: 4917 · Rating: 0
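
The arithmetic behind the "half or more" claim above can be sketched in a few lines of Python. The function name `effective_throughput` and the figure of 30,000 results per day are made up for illustration and are not project statistics. The `speedup` parameter reflects the suggestion made earlier in the thread that an application running twice as fast would offset a redundancy of two.

```python
def effective_throughput(results_per_day, redundancy, speedup=1.0):
    """Distinct work units completed per day.

    results_per_day: results the whole project returns per day (assumed figure)
    redundancy: how many hosts compute each identical work unit
    speedup: hypothetical speed factor from an optimized application
    """
    return results_per_day * speedup / redundancy


baseline = effective_throughput(30_000, redundancy=1)          # no redundancy
double = effective_throughput(30_000, redundancy=2)            # half the science
triple = effective_throughput(30_000, redundancy=3)            # a third of it
offset = effective_throughput(30_000, redundancy=2, speedup=2.0)

print(baseline, double, triple, offset)
```

Under these assumptions a quorum of two only breaks even when the application is at least twice as fast, and a quorum of three needs a threefold speedup.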

stephan_t Joined: 20 Oct 05 Posts: 129 Credit: 35,464 RAC: 0	Message 4921 - Posted: 2 Dec 2005, 8:01:33 UTC - in response to Message 4917. the reason many people are against redundancy is you're cutting the effective cpu power of the project in half or more if you're talking about triple redundancy, and gaining no measurable scientific benefit even in assured accuracy in the process. Aegion is right, although the flip side of the argument is that without redundancy or flop counting, the stats are useless (as they are very easily cheated). I don't think it's a matter of the science vs the stats being 'the most important'. IMHO the most important thing for this project is that a lot of users participate. More users = more science. I know for one that if it wasn't for the stats, I wouldn't be here today, and I certainly wouldn't spend a good bit of time trying to keep 4 boxes at 100% cpu time and minimum overheating 24/7. I also know that the top 20 teams are bringing in most of the WU (read=most of the science) and that they are stats-driven. From experiencing other projects, such as FAD, I can tell you that it's the competition aspect that makes the most productive users join. So, if you love the science and don't care about the stats, you should love the stats anyway, because they bring in more users, and therefore more science. As others said before, flop counting is one solution, but it requires work on the Rosetta application. On a similar thread in these forums there's a great post by Bill Michael that describes various steps the admins could take to tackle this issue, one of them being actually measuring the amount of cheating that does take place. Team CFVault.com http://www.cfvault.com ID: 4921 · Rating: 0

Webmaster Yoda Joined: 17 Sep 05 Posts: 161 Credit: 162,253 RAC: 0	Message 4924 - Posted: 2 Dec 2005, 8:56:40 UTC - in response to Message 4921. I also know that the top 20 teams are bringing in most of the WU (read=most of the science) and that they are stats-driven. Definitely stats driven. When I looked at 3 hosts in the top 3 teams, at random, 2 of the 3 were using BOINC 5.3.1, which is an optimized version. Many have also hidden their computers, some perhaps because they ARE cheating. What I am saying is... Stats driven teams have cheaters too. Want to lose them? * Join BOINC@Australia today * ID: 4924 · Rating: -1

stephan_t Joined: 20 Oct 05 Posts: 129 Credit: 35,464 RAC: 0	Message 4925 - Posted: 2 Dec 2005, 9:10:09 UTC - in response to Message 4924. What I am saying is... Stats driven teams have cheaters too. Want to lose them? Cheaters are cheaters, regardless of their intentions- and yes, they should go! I think someone posted a good breakdown of what level of 'cheats' we might encounter on such a project in a different thread. Team CFVault.com http://www.cfvault.com ID: 4925 · Rating: 0

Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0	Message 4928 - Posted: 2 Dec 2005, 9:32:37 UTC - in response to Message 4924. What I am saying is... Stats driven teams have cheaters too. Want to lose them? Yes. Because they cheapen the accomplishment. ID: 4928 · Rating: 0

Hoelder1in Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0	Message 4932 - Posted: 2 Dec 2005, 9:49:24 UTC Last modified: 2 Dec 2005, 9:51:48 UTC I wanted to offer a compromise to the quorum/no quorum factions: Use a quorum of three or five but make sure that each WU has a different random seed. That way no work will be lost, since each WU will give a different result. They will of course also have somewhat different completion times, but since they all will be assigned the same granted credit (median of the claimed credit) there will be some randomness involved. If yours was the longest/slowest WU you have had bad luck, but half the time you will be the lucky one and you will be granted somewhat more credit than your WU actually is worth. Doesn't this even add some additional thrill to the process? ;-) The randomness will very quickly be averaged out, such that for a reasonably fast machine even daily credit numbers will be quite accurate. I see the following advantages in this system: It is completely just, everyone is treated the same. No work is lost, as would be the case with the usual kind of quorum/redundancy. It seems to be fairly easy to implement, since no code needs to be added to the Rosetta application as for the flops counting. And don't forget the added thrill aspect: Each time a WU is returned you will keep your fingers crossed that this one will be grouped together with even slower/longer WUs. Well, what do you think? ID: 4932 · Rating: 0
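
A rough Python sketch of the scheme proposed above, under stated assumptions: the helper names `grant_credit` and `simulate`, the claimed-credit range of 20 to 40, and the example claims are all invented for illustration and do not describe how BOINC or Rosetta actually grant credit. It shows two things: the median ignores a single inflated claim in a quorum of three, and for honest claims the per-WU luck averages out over many work units.

```python
import random
from statistics import mean, median


def grant_credit(claimed):
    """Credit granted to every result in a quorum: the median claim."""
    return median(claimed)


def simulate(num_quorums, quorum_size=3, seed=0):
    """Average claimed vs. granted credit for one honest host.

    Each WU in a quorum has its own random seed, so its run time (and
    claimed credit) differs; claims are drawn uniformly from an assumed range.
    """
    rng = random.Random(seed)
    my_claims, my_grants = [], []
    for _ in range(num_quorums):
        claims = [rng.uniform(20.0, 40.0) for _ in range(quorum_size)]
        my_claims.append(claims[0])           # our host's own claim
        my_grants.append(grant_credit(claims))
    return mean(my_claims), mean(my_grants)


# One inflated claim (61.0) does not move the granted credit.
granted_with_outlier = grant_credit([18.0, 25.5, 61.0])

avg_claimed, avg_granted = simulate(10_000)
print(granted_with_outlier, round(avg_claimed, 2), round(avg_granted, 2))
```

In this toy model the two averages end up close to each other, which matches the claim that the randomness washes out quickly. Whether real claimed credits behave like the assumed uniform range is a separate question.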