Are we any closer?

Ed and Harriet Griffith Joined: 17 Sep 05 Posts: 39 Credit: 2,038,891 RAC: 0	Message 4017 - Posted: 23 Nov 2005, 1:42:01 UTC
Do the new work units take us any closer to finding the lowest energy protein fold? To my unscientific eye it looks from the diagrams like some do and some don't.
Ed Griffith
ID: 4017 · Rating: 0

David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 4026 - Posted: 23 Nov 2005, 4:25:29 UTC - in response to Message 4017.
[quote]Do the new work units take us any closer to finding the lowest energy protein fold? To my unscientific eye it looks from the diagrams like some do and some don't. Ed Griffith[/quote]
The answer thus far is that if you happen to be lucky and begin close, the new work units take you closer than before, but if you are too far away to begin with, they can't help you. The best demonstration of this is some calculations I did on our local computers where I started from the lowest rmsd structures found in the past week--I'll post these as soon as I get a chance.
Yesterday we started calculations on four new proteins which, based on our past experience, are likely to be easier than the two we have been running for the past two weeks (1hz6 and 1n0u). For those two proteins, which we knew would be hard problems to begin with, I found lower energy structures when I started from very low rmsd structures than anybody found starting from an extended chain. This suggests that with more sampling, the lowest energy structures would be very close to correct. But "more" in this case means a lot more, so we are going to return to these two proteins in a month or two, when, if we extrapolate the current growth trends, we should have 10x more compute power!
Also coming up in the next couple of weeks will be some new approaches to the sampling problem, in which we attempt to cover space as evenly as possible (you may have noticed that for 1n0u, there is an annoying "trap" around 9-10A from the correct structure), and the first HIV vaccine design calculations. Vaccine design is something I'm very excited about currently and I'll post an explanation of how we are going about this soon.
ID: 4026 · Rating: 1

Webmaster Yoda Joined: 17 Sep 05 Posts: 161 Credit: 162,253 RAC: 0	Message 4027 - Posted: 23 Nov 2005, 4:47:04 UTC - in response to Message 4026. Last modified: 23 Nov 2005, 4:48:34 UTC
I notice on the 1hz6 results (I don't see results for 1n0u yet) that quite a lot of the calculations have energy levels in line with the native structure, but we're still quite a way off in terms of the RMSD. In fact, it looks like the lowest one found is way off the native structure (at an RMSD of about 12)
* Join BOINC@Australia today *
ID: 4027 · Rating: 0

dgnuff Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0	Message 4035 - Posted: 23 Nov 2005, 7:32:50 UTC
Two thoughts occur to me about this.
Firstly, it's going to take a LOT of patience. Find-a-Drug crunched away for quite a few years, running an incredibly highly optimized program. With help from Intel, Keith Davis was able to get a 40 to 1 speed increase in the core algorithm. Even with that, we took a number of years. Without that speedup, we'd not have completed the work in anyone's lifetime.
However, Rosetta at home is somewhat unusual in that we're getting in right on the ground floor. Not only will we be doing the heavyweight crunching when the time comes, but right now we're helping to optimize the algorithms themselves. Not only are we helping with the research, we're helping to work out HOW to do the research.
So this means that we probably won't see anything tangible for a good long while. Whatever you do, don't lose heart because of this. The work we're doing now, even though it doesn't APPEAR to be producing results, is potentially worth its weight in gold, in terms of what it will allow us to do in the future.
ID: 4035 · Rating: 0

Christian Diepold Joined: 23 Sep 05 Posts: 37 Credit: 300,225 RAC: 0	Message 4037 - Posted: 23 Nov 2005, 7:53:19 UTC
Well said, dgnuff! For me it was indeed a strange feeling to realize that I wasn't actually looking for any cure but merely for a method of finding cures, at least for the time being. But it's true that the basic work has to be done as well. And I'd rather stick with R@H right from the start and in a few years have the knowledge that I helped the project in every aspect there was, from theory and method to an actual search and result.
ID: 4037 · Rating: 0

Hoelder1in Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0	Message 4049 - Posted: 23 Nov 2005, 13:16:12 UTC Last modified: 23 Nov 2005, 13:26:55 UTC
I guess I have a couple of questions:
[quote]But "more" in this case means a lot more, so we are going to return to these two proteins in a month or two, when, if we extrapolate the current growth trends, we should have 10x more compute power![/quote]
So, is there some argument (that can be explained in layperson's terms) for the expectation that we need a factor of 10 more sampling (as opposed to, say, a factor of 1,000 or a factor of 1,000,000)? In this thread I argued that, due to the large number of free parameters, the search space volume corresponding to rmsd < 1 A is such a tiny fraction of the rmsd < 2 A volume that one can hardly hope to find it by random sampling of the rmsd < 2 A volume. At the time David convinced me that, due to all sorts of constraints between the parameters, the effective number of independent, free parameters is really much smaller than I had assumed. But still, even if the number of free parameters were only, say, 10 or 20, using that same argument, a factor of 10 more sampling would reduce the lowest rmsd value by just something like 10 or 20%. Would that be enough for our purpose? Can the number of independent, free parameters really be that small?
Taking the liberty to ask one more question, I am somewhat confused about the energy scale of the plots. When in the original runs all the energy values were below zero (the no_cst ones in the results section), I had assumed that the energy is something like the binding energy of the molecule and that E>0 corresponds to unbound states. Since in the newer runs some of the energy values are now above zero, this can't possibly be the case. So what does E=0 actually stand for, or is this just an arbitrary scale?
-Hermann
ID: 4049 · Rating: 0
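
Hermann's "10 or 20%" figure can be reproduced with a few lines of Python. This is only a sketch of the scaling argument as stated in the post, under the assumption (not confirmed by the project) that the search-space volume within rmsd r grows like r**d for d independent free parameters and that samples land uniformly at random, so the best rmsd found shrinks like N**(-1/d) with N samples. The function names are made up for illustration.

```python
def rmsd_shrink_factor(sampling_gain: float, free_params: int) -> float:
    """Factor by which the lowest rmsd found shrinks when sampling grows
    by `sampling_gain`, assuming volume within rmsd r scales as r**free_params."""
    if sampling_gain <= 0 or free_params <= 0:
        raise ValueError("sampling_gain and free_params must be positive")
    return sampling_gain ** (-1.0 / free_params)


def sampling_gain_needed(rmsd_ratio: float, free_params: int) -> float:
    """Sampling multiple needed to shrink the lowest rmsd by `rmsd_ratio`
    (for example 0.5 to go from 2 A to 1 A) under the same assumption."""
    if not 0 < rmsd_ratio <= 1 or free_params <= 0:
        raise ValueError("rmsd_ratio must be in (0, 1], free_params positive")
    return rmsd_ratio ** (-free_params)


for d in (10, 20):
    factor = rmsd_shrink_factor(10, d)
    needed = sampling_gain_needed(0.5, d)
    print(f"d={d}: 10x sampling lowers best rmsd by {(1 - factor) * 100:.0f}%; "
          f"halving it needs {needed:,.0f}x sampling")
```

Under this toy model, halving the rmsd with 10 or 20 free parameters would need roughly a thousand or a million times more sampling, which is the scale of the alternatives raised in the question. It says nothing about how guided (non-random) sampling behaves.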

David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 4059 - Posted: 23 Nov 2005, 15:40:13 UTC - in response to Message 4035.
[quote]Two thoughts occur to me about this. Firstly, it's going to take a LOT of patience. Find-a-Drug crunched away for quite a few years, running an incredibly highly optimized program. With help from Intel, Keith Davis was able to get a 40 to 1 speed increase in the core algorithm. Even with that, we took a number of years. Without that speedup, we'd not have completed the work in anyone's lifetime. However, Rosetta at home is somewhat unusual in that we're getting in right on the ground floor. Not only will we be doing the heavyweight crunching when the time comes, but right now we're helping to optimize the algorithms themselves. Not only are we helping with the research, we're helping to work out HOW to do the research. So this means that we probably won't see anything tangible for a good long while. Whatever you do, don't lose heart because of this. The work we're doing now, even though it doesn't APPEAR to be producing results, is potentially worth its weight in gold, in terms of what it will allow us to do in the future.[/quote]
You have all made very good points! Yes--a difference between this project and other distributed computing projects is that you are actively participating in methods development efforts rather than large scale production level runs. Why are we doing methods development? It is because computational modeling of biological systems is still in its infancy, and if we can make the methods better, we will be able to produce much more accurate and useful models. While Rosetta is probably the best biological structure modeling program in existence today, we feel that it is important to work to make it still better.
While you have so far seen primarily the methods development aspect, the program is already being used for concrete applications; for example, we just helped build a model of an infectious component of anthrax toxin. As I said earlier, we are actively engaged in trying to design a vaccine for HIV. In the near future we will start distributing vaccine design calculations along with the structure prediction methods development calculations, so you will be doing a mix of basic research with long term payoff, and vaccine design with potentially huge short term payoff.
ID: 4059 · Rating: -1

David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 4077 - Posted: 23 Nov 2005, 18:51:20 UTC - in response to Message 4049.
[quote]I guess I have a couple of questions:
[quote]But "more" in this case means a lot more, so we are going to return to these two proteins in a month or two, when, if we extrapolate the current growth trends, we should have 10x more compute power![/quote]
So, is there some argument (that can be explained in layperson's terms) for the expectation that we need a factor of 10 more sampling (as opposed to, say, a factor of 1,000 or a factor of 1,000,000)? In this thread I argued that, due to the large number of free parameters, the search space volume corresponding to rmsd < 1 A is such a tiny fraction of the rmsd < 2 A volume that one can hardly hope to find it by random sampling of the rmsd < 2 A volume. At the time David convinced me that, due to all sorts of constraints between the parameters, the effective number of independent, free parameters is really much smaller than I had assumed. But still, even if the number of free parameters were only, say, 10 or 20, using that same argument, a factor of 10 more sampling would reduce the lowest rmsd value by just something like 10 or 20%. Would that be enough for our purpose? Can the number of independent, free parameters really be that small?
Taking the liberty to ask one more question, I am somewhat confused about the energy scale of the plots. When in the original runs all the energy values were below zero (the no_cst ones in the results section), I had assumed that the energy is something like the binding energy of the molecule and that E>0 corresponds to unbound states. Since in the newer runs some of the energy values are now above zero, this can't possibly be the case. So what does E=0 actually stand for, or is this just an arbitrary scale?
-Hermann[/quote]
1) My estimate is based on the results of the "cheat" runs. The state of each residue can be described by two angles, called phi and psi. The values that phi and psi can take on are restricted by the local geometry of the chain, and the allowed values fall into three reasonably well separated regions. In the cheat runs, for two or three residues for which the correct state was underrepresented in the original runs, I specified which of the three regions the phi and psi angles had to be in. The lowest rmsd structures in this case relax using the improved protocol to very low energies and rmsds (around 1A for 1n0u and 1hz6). Without knowing the answer, we could carry out calculations for all combinations of these problem residues, which would take roughly tenfold more computer time.
2) Essentially all of the interactions are attractive (negative interaction energy), except for atomic overlaps, which can be large and positive. Structures with substantial numbers of clashes can have positive overall energies.
ID: 4077 · Rating: 0
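
The "roughly tenfold" estimate in point 1 follows from simple counting: with three allowed phi/psi regions per residue, two problem residues give 3**2 = 9 combinations and three give 3**3 = 27. The sketch below only enumerates those combinations. The region labels and residue numbers are placeholders invented for illustration, and nothing here reflects how Rosetta itself sets up such runs.

```python
from itertools import product

# Placeholder labels for the three well-separated phi/psi regions
# mentioned in the post; the real region definitions are not given there.
REGIONS = ("region_1", "region_2", "region_3")


def enumerate_region_assignments(problem_residues):
    """Return one dict per combination, mapping each problem residue
    (hypothetical sequence positions) to one of the three regions."""
    return [
        dict(zip(problem_residues, combo))
        for combo in product(REGIONS, repeat=len(problem_residues))
    ]


two_residue_runs = enumerate_region_assignments([12, 27])
three_residue_runs = enumerate_region_assignments([12, 27, 41])

# Each assignment would need its own full sampling run, so the run count
# is the multiplier on computer time.
print(len(two_residue_runs), len(three_residue_runs))
```

If each combination needs about as much sampling as one of the original runs, the cost multiplier lands between 9 and 27, which is consistent with "roughly tenfold" for two residues.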

Hoelder1in Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0	Message 4078 - Posted: 23 Nov 2005, 19:29:12 UTC - in response to Message 4077.
[quote]1) My estimate is based on the results of the "cheat" runs. The state of each residue can be described by two angles, called phi and psi. The values that phi and psi can take on are restricted by the local geometry of the chain, and the allowed values fall into three reasonably well separated regions. In the cheat runs, for two or three residues for which the correct state was underrepresented in the original runs, I specified which of the three regions the phi and psi angles had to be in. The lowest rmsd structures in this case relax using the improved protocol to very low energies and rmsds (around 1A for 1n0u and 1hz6). Without knowing the answer, we could carry out calculations for all combinations of these problem residues, which would take roughly tenfold more computer time.
2) Essentially all of the interactions are attractive (negative interaction energy), except for atomic overlaps, which can be large and positive. Structures with substantial numbers of clashes can have positive overall energies.[/quote]
Many thanks, once again, for the quick and detailed response. I had almost forgotten about those "cheat" runs. ;-)
-H.B.
ID: 4078 · Rating: 0

Doug Worrall Joined: 19 Sep 05 Posts: 60 Credit: 58,445 RAC: 0	Message 4088 - Posted: 23 Nov 2005, 20:33:58 UTC
Hello,
I just read this whole post "page", and am really impressed with Rosetta and her scientists, moderators and crunchers. Everyone is on the ball here. I have chosen Rosetta over P. due to the information available, and the wonderful 7 hour w/u. Nice size, great advances in these proteins, etc. Kudos to all, and to Ed and Harriet Griffith, who began this thread. Fellow team mates.
Happy crunching, all.
Sincerely,
Doug Worrall
ID: 4088 · Rating: 0

nasher Joined: 5 Nov 05 Posts: 98 Credit: 890,793 RAC: 0	Message 4162 - Posted: 24 Nov 2005, 16:27:16 UTC
It's nice to see such interaction with the scientists; that is one of the reasons I picked this project.
I am also hoping we find a better way to find cures, so that we get every disease cured before someone I know dies of it.
ID: 4162 · Rating: 0

hob. Joined: 4 Nov 05 Posts: 64 Credit: 250,683 RAC: 0	Message 4176 - Posted: 24 Nov 2005, 18:39:19 UTC - in response to Message 4026.
[quote]so we are going to return to these two proteins in a month or two, when, if we extrapolate the current growth trends, we should have 10x more compute power![/quote]
I would expect to see a jump in power on December 17th, when FaD closes.
46 years dc so far join team FaDbeens join us
ID: 4176 · Rating: 0

Honza Joined: 18 Sep 05 Posts: 48 Credit: 173,517 RAC: 0	Message 4181 - Posted: 24 Nov 2005, 20:00:16 UTC - in response to Message 4176.
[quote]I would expect to see a jump in power on December 17th, when FaD closes.[/quote]
A significant influx of new users can be seen at SETI (900 vs. 2500 in the last day), CPDN (200 vs. 600), and Rosetta (200 vs. 600-700). Those projects were mentioned in the SETI Classic shutdown e-mail. Einstein also has 450 new users today instead of the average 300.
I would expect that UCB wanted to avoid a spike of users (jump in power) so that
1. servers can hold the load (CPU, database, WU pool, internet traffic),
2. moderators and those who care to help newcomers would be able to handle it.
I think the same applies to FaD as well.
ID: 4181 · Rating: 0

marshall2k Joined: 3 Nov 05 Posts: 25 Credit: 22,981 RAC: 0	Message 4198 - Posted: 24 Nov 2005, 22:32:26 UTC Last modified: 24 Nov 2005, 22:33:31 UTC
Check out this graph, pretty impressive growth over the last 2 days!
ID: 4198 · Rating: 0

Cureseekers~Nightanimal Joined: 20 Nov 05 Posts: 19 Credit: 26,396 RAC: 0	Message 4204 - Posted: 24 Nov 2005, 23:09:28 UTC - in response to Message 4198. Last modified: 24 Nov 2005, 23:13:58 UTC
[quote]Check out this graph, pretty impressive growth over the last 2 days![/quote]
Ah, this graph alone could stimulate people to help with Rosetta. If we could also motivate them with the purpose of the project, then we can expect many, many Rosetta-ers to come... (I hope :p)
I do have good hope that FaD refugees will come to this project... I already see this on my own Dutch Power Cows team.
The signature is away at the moment, just leave a message after the beep
ID: 4204 · Rating: 0

Honza Joined: 18 Sep 05 Posts: 48 Credit: 173,517 RAC: 0	Message 4205 - Posted: 24 Nov 2005, 23:18:55 UTC - in response to Message 4204. Last modified: 24 Nov 2005, 23:19:50 UTC
[quote]Check out this graph, pretty impressive growth over the last 2 days![/quote]
Exactly what I meant.
[quote]I do have good hope that FaD refugees will come to this project... I already see this on my own Dutch Power Cows team.[/quote]
Not sure how many new users come from FaD and how many from SETI Classic. Yes, there is a better chance that FaD refugees will stay. And I hope so as well. But the recent increase in the SETI, CPDN and Rosetta user base correlates more with the SETI Classic shutdown and the mass e-mail about the transition to BOINC.
Either way, Rosetta gains :-)
ID: 4205 · Rating: 0