Are we any closer?

Message boards : Rosetta@home Science : Are we any closer?

Ed and Harriet Griffith
Joined: 17 Sep 05
Posts: 39
Credit: 1,896,635
RAC: 1,112
Message 4017 - Posted: 23 Nov 2005, 1:42:01 UTC

Do the new work units take us any closer to finding the lowest energy protein fold? To my unscientific eye it looks from the diagrams like some do and some don't.
Ed Griffith

ID: 4017 · Rating: 0
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 4026 - Posted: 23 Nov 2005, 4:25:29 UTC - in response to Message 4017.  

Do the new work units take us any closer to finding the lowest energy protein fold? To my unscientific eye it looks from the diagrams like some do and some don't.
Ed Griffith


The answer thus far is that if you happen to be lucky and begin close, the new work units take you closer than before, but if you are too far away to begin with, they can't help you. The best demonstration of this is some calculations I did on our local computers where I started from the lowest rmsd structures found in the past week--I'll post these as soon as I get a chance.

Yesterday we started calculations on four new proteins which, based on our past experience, are likely to be easier than the two we have been running for the past two weeks (1hz6 and 1n0u). For these two proteins, which we knew would be hard problems to begin with, I found lower energy structures when I started from very low rmsd structures than anybody found starting from an extended chain. This suggests that with more sampling, the lowest energy structures would be very close to correct. But "more" in this case means a lot more, so we are going to return to these two proteins in a month or two, when, if we extrapolate the current growth trends, we should have 10x more compute power!

Also coming up in the next couple of weeks will be some new approaches to the sampling problem, in which we attempt to cover space as evenly as possible (you may have noticed that for 1n0u, there is an annoying "trap" around 9-10A from the correct structure) and the first HIV vaccine design calculations. Vaccine design is something I'm very excited about currently and I'll post an explanation of how we are going about this soon.
ID: 4026 · Rating: 1
Webmaster Yoda
Joined: 17 Sep 05
Posts: 161
Credit: 162,253
RAC: 0
Message 4027 - Posted: 23 Nov 2005, 4:47:04 UTC - in response to Message 4026.  
Last modified: 23 Nov 2005, 4:48:34 UTC

I notice on the 1hz6 results (I don't see results for 1n0u yet) that quite a lot of the calculations have energy levels in line with the native structure, but we're still quite a way off in terms of the RMSD. In fact, it looks like the lowest one found is way off the native structure (at an RMSD of about 12).
*** Join BOINC@Australia today ***
ID: 4027 · Rating: 0
dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 4035 - Posted: 23 Nov 2005, 7:32:50 UTC

Two thoughts occur to me about this.

Firstly, it's going to take a LOT of patience. Find-a-Drug crunched away for quite a few years, running an incredibly highly optimized program. With help from Intel, Keith Davis was able to get a 40 to 1 speed increase in the core algorithm. Even with that, we took a number of years. Without that speedup, we'd not have completed the work in anyone's lifetime.

However, Rosetta at home is somewhat unusual in that we're getting in right on the ground floor. Not only will we be doing the heavyweight crunching when the time comes, but right now we're helping to optimize the algorithms themselves. Not only are we helping with the research, we're helping to work out HOW to do the research.

So this means that we probably won't see anything tangible for a good long while. Whatever you do, don't lose heart because of this. The work we're doing now, even though it doesn't APPEAR to be producing results, is potentially worth its weight in gold, in terms of what it will allow us to do in the future.
ID: 4035 · Rating: 0
Christian Diepold
Joined: 23 Sep 05
Posts: 37
Credit: 300,225
RAC: 0
Message 4037 - Posted: 23 Nov 2005, 7:53:19 UTC

Well said dgnuff!

For me it was indeed a strange feeling to realize that I wasn't actually looking for any cure but for a mere method of how to find cures, at least for the time being. But it's true that the basic work has to be done as well. And I'd rather stick with R@H right from the start and in a few years have the knowledge that I helped the project in every aspect there was, from theory and method to an actual search and result.
ID: 4037 · Rating: 0
Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 4049 - Posted: 23 Nov 2005, 13:16:12 UTC
Last modified: 23 Nov 2005, 13:26:55 UTC

I guess I have a couple of questions:
But "more" in this case means a lot more, so we are going to return to these two proteins in a month or two, when, if we extrapolate the current growth trends, we should have 10x more compute power!

So, is there some argument (that can be explained in layperson's terms) for the expectation that we need a factor of 10 more sampling (as opposed to, say, a factor of 1,000 or 1,000,000)? In this thread I argued that, due to the large number of free parameters, the search space volume corresponding to rmsd < 1 A is such a tiny fraction of the rmsd < 2 A volume that one can hardly hope to find it by random sampling of the rmsd < 2 A volume. At the time David convinced me that, due to all sorts of constraints between the parameters, the effective number of independent, free parameters is really much smaller than I had assumed. But still, even if the number of free parameters were only, say, 10 or 20, using that same argument, a factor of 10 more sampling would reduce the lowest rmsd value by just something like 10 or 20%. Would that be enough for our purpose? Can the number of independent, free parameters really be that small?
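
The scaling argument above can be made concrete with a minimal sketch. It assumes (as an illustration, not something confirmed by the project) that the search space volume within rmsd r grows as r^N for N independent free parameters, so that multiplying the number of random samples by a factor f shrinks the best rmsd found by a factor f^(-1/N). The function name `rmsd_reduction` is hypothetical.

```python
def rmsd_reduction(sampling_factor: float, n_params: int) -> float:
    """Fractional drop in the lowest rmsd found by random sampling.

    Assumption: the volume within rmsd r scales as r**n_params, so the
    best rmsd scales as sampling_factor ** (-1 / n_params).
    """
    return 1.0 - sampling_factor ** (-1.0 / n_params)


# Ten times more sampling, for a few assumed numbers of free parameters.
for n_params in (10, 20):
    reduction = rmsd_reduction(10.0, n_params)
    print(f"N = {n_params}: lowest rmsd drops by about {reduction:.0%}")
```

Under this assumption, ten times more sampling lowers the best rmsd by roughly 21% for N = 10 and roughly 11% for N = 20, which is where the "10 or 20%" figure comes from.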

Taking the liberty to ask one more question, I am somewhat confused about the energy scale of the plots. When in the original runs all the energy values were below zero (the no_cst ones in the results section), I had assumed that the energy is something like the binding energy of the molecule and that E>0 corresponds to unbound states. Since in the newer runs some of the energy values are now above zero, this can't possibly be the case. So what does E=0 actually stand for, or is this just an arbitrary scale?

-Hermann
ID: 4049 · Rating: 0
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 4059 - Posted: 23 Nov 2005, 15:40:13 UTC - in response to Message 4035.  

Two thoughts occur to me about this.

Firstly, it's going to take a LOT of patience. Find-a-Drug crunched away for quite a few years, running an incredibly highly optimized program. With help from Intel, Keith Davis was able to get a 40 to 1 speed increase in the core algorithm. Even with that, we took a number of years. Without that speedup, we'd not have completed the work in anyone's lifetime.

However, Rosetta at home is somewhat unusual in that we're getting in right on the ground floor. Not only will we be doing the heavyweight crunching when the time comes, but right now we're helping to optimize the algorithms themselves. Not only are we helping with the research, we're helping to work out HOW to do the research.

So this means that we probably won't see anything tangible for a good long while. Whatever you do, don't lose heart because of this. The work we're doing now, even though it doesn't APPEAR to be producing results, is potentially worth its weight in gold, in terms of what it will allow us to do in the future.


You have all made very good points!

Yes--a difference between this project and other distributed computing projects is that you are actively participating in methods development efforts rather than large scale production level runs. Why are we doing methods development? It is because computational modeling of biological systems is still in its infancy, and if we can make the methods better, we will be able to produce much more accurate and useful models. While Rosetta is probably the best biological structure modeling program in existence today, we feel that it is important to work to make it still better. While you so far have seen the methods development aspect primarily, the program is being used for concrete applications already; for example, we just helped build a model of an infectious component of anthrax toxin.

As I said below, we are actively engaged in trying to design a vaccine for HIV. In the near future we will start distributing vaccine design calculations along with the structure prediction methods development calculations, so you will be doing a mix of basic research with long term payoff, and vaccine design with potentially huge short term payoff.


ID: 4059 · Rating: -1
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 4077 - Posted: 23 Nov 2005, 18:51:20 UTC - in response to Message 4049.  

I guess I have a couple of questions:
But "more" in this case means a lot more, so we are going to return to these two proteins in a month or two, when, if we extrapolate the current growth trends, we should have 10x more compute power!

So, is there some argument (that can be explained in layperson's terms) for the expectation that we need a factor of 10 more sampling (as opposed to, say, a factor of 1,000 or 1,000,000)? In this thread I argued that, due to the large number of free parameters, the search space volume corresponding to rmsd < 1 A is such a tiny fraction of the rmsd < 2 A volume that one can hardly hope to find it by random sampling of the rmsd < 2 A volume. At the time David convinced me that, due to all sorts of constraints between the parameters, the effective number of independent, free parameters is really much smaller than I had assumed. But still, even if the number of free parameters were only, say, 10 or 20, using that same argument, a factor of 10 more sampling would reduce the lowest rmsd value by just something like 10 or 20%. Would that be enough for our purpose? Can the number of independent, free parameters really be that small?

Taking the liberty to ask one more question, I am somewhat confused about the energy scale of the plots. When in the original runs all the energy values were below zero (the no_cst ones in the results section), I had assumed that the energy is something like the binding energy of the molecule and that E>0 corresponds to unbound states. Since in the newer runs some of the energy values are now above zero, this can't possibly be the case. So what does E=0 actually stand for, or is this just an arbitrary scale?

-Hermann


1) My estimate is based on the results of the "cheat" runs. The state of each residue can be described by two angles, called phi and psi. The values that phi and psi can take on are restricted by the local geometry of the chain, and the allowed values fall into three reasonably well separated regions. In the cheat runs, for two or three residues for which the correct state was underrepresented in the original runs, I specified which of the three regions the phi and psi angles had to be in. The lowest rmsd structures in this case relax using the improved protocol to very low energies and rmsds (around 1A for 1n0u and 1hz6). Without knowing the answer, we could carry out calculations for all combinations of these problem residues, which would take roughly tenfold more computer time.

2) Essentially all of the interactions are attractive (negative interaction energy), except for atomic overlaps, which can be large and positive. Structures with substantial numbers of clashes can have positive overall energies.
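
The tenfold estimate in point 1 can be illustrated by counting region assignments. This is a minimal sketch, assuming each problem residue independently takes one of the three allowed phi/psi regions and that one set of runs is needed per assignment. The region labels and residue numbers are hypothetical placeholders, not values from the actual runs.

```python
from itertools import product

# Hypothetical labels for the three allowed phi/psi regions.
REGIONS = ("region_1", "region_2", "region_3")


def region_assignments(problem_residues):
    """Yield every way of fixing one region per problem residue."""
    for combo in product(REGIONS, repeat=len(problem_residues)):
        yield dict(zip(problem_residues, combo))


# Hypothetical residue numbers, for illustration only.
two = list(region_assignments([12, 47]))
three = list(region_assignments([12, 47, 63]))
print(len(two), len(three))
```

Two problem residues give 9 assignments and three give 27, which brackets the "roughly tenfold" increase in computer time when the correct regions are not known in advance.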
ID: 4077 · Rating: 0
Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 4078 - Posted: 23 Nov 2005, 19:29:12 UTC - in response to Message 4077.  

1) My estimate is based on the results of the "cheat" runs. The state of each residue can be described by two angles, called phi and psi. The values that phi and psi can take on are restricted by the local geometry of the chain, and the allowed values fall into three reasonably well separated regions. In the cheat runs, for two or three residues for which the correct state was underrepresented in the original runs, I specified which of the three regions the phi and psi angles had to be in. The lowest rmsd structures in this case relax using the improved protocol to very low energies and rmsds (around 1A for 1n0u and 1hz6). Without knowing the answer, we could carry out calculations for all combinations of these problem residues, which would take roughly tenfold more computer time.

2) Essentially all of the interactions are attractive (negative interaction energy), except for atomic overlaps, which can be large and positive. Structures with substantial numbers of clashes can have positive overall energies.

Many thanks, once again, for the quick and detailed response. I had almost forgotten about those "cheat" runs. ;-) -H.B.
ID: 4078 · Rating: 0
Doug Worrall
Joined: 19 Sep 05
Posts: 60
Credit: 58,445
RAC: 0
Message 4088 - Posted: 23 Nov 2005, 20:33:58 UTC

Hello,
I just read this whole thread and am really impressed with Rosetta and her scientists, moderators, and crunchers. Everyone is on the ball here. I have chosen Rosetta over P. because of the information available, and the wonderful 7-hour work units: a nice size, and great advances in these proteins, etc. Kudos to all, and to Ed and Harriet Griffith, who began this thread. Fellow teammates.
Happy Crunching All
Sincerely
Doug Worrall
ID: 4088 · Rating: 0
nasher
Joined: 5 Nov 05
Posts: 98
Credit: 618,288
RAC: 0
Message 4162 - Posted: 24 Nov 2005, 16:27:16 UTC

It's nice to see such interaction with the scientists.

That is one of the reasons I picked this project.

I am hoping we also find a better way to find cures, so that every disease gets cured before someone I know dies of it.


ID: 4162 · Rating: 0
hob.
Joined: 4 Nov 05
Posts: 64
Credit: 250,683
RAC: 0
Message 4176 - Posted: 24 Nov 2005, 18:39:19 UTC - in response to Message 4026.  



so we are going to return to these two proteins in a month or two, when, if we extrapolate the current growth trends, we should have 10x more compute power!



I would expect to see a jump in power on December 17th... when FaD closes.


46 years dc so far

join team FaDbeens
join us

ID: 4176 · Rating: 0
Honza
Joined: 18 Sep 05
Posts: 48
Credit: 173,517
RAC: 0
Message 4181 - Posted: 24 Nov 2005, 20:00:16 UTC - in response to Message 4176.  

I would expect to see a jump in power on December 17th... when FaD closes.

A significant influx of new users can be seen at SETI (900 vs. 2500 in the last day), CPDN (200 vs. 600), and Rosetta (200 vs. 600-700). Those projects were mentioned in the SETI Classic shutdown e-mail.
Einstein also has 450 new users today instead of the average 300.

I would expect that UCB wanted to avoid a spike of users (a jump in power) so that 1. the servers can hold the load (CPU, database, WU pool, internet traffic), and 2. moderators and those who care to help newcomers would be able to handle it.

I think the same applies to FaD as well.

ID: 4181 · Rating: 0
marshall2k
Joined: 3 Nov 05
Posts: 25
Credit: 22,981
RAC: 0
Message 4198 - Posted: 24 Nov 2005, 22:32:26 UTC
Last modified: 24 Nov 2005, 22:33:31 UTC

Check out this graph, pretty impressive growth over the last 2 days!


ID: 4198 · Rating: 0
Cureseekers~Nightanimal
Joined: 20 Nov 05
Posts: 19
Credit: 26,396
RAC: 0
Message 4204 - Posted: 24 Nov 2005, 23:09:28 UTC - in response to Message 4198.  
Last modified: 24 Nov 2005, 23:13:58 UTC

Check out this graph, pretty impressive growth over the last 2 days!

Ah, this graph alone could encourage people to help with Rosetta. If we could also motivate them with the purpose of the project, then we can expect many, many more Rosetta-ers to come... (I hope :p). I do have good hope that FaD refugees will come to this project too... I already see this on my own Dutch Power Cows team.
The signature is away on the moment, just leave a message after the beep
ID: 4204 · Rating: 0
Honza
Joined: 18 Sep 05
Posts: 48
Credit: 173,517
RAC: 0
Message 4205 - Posted: 24 Nov 2005, 23:18:55 UTC - in response to Message 4204.  
Last modified: 24 Nov 2005, 23:19:50 UTC

Check out this graph, pretty impressive growth over the last 2 days!

Exactly what I meant.

I do have good hope that FaD refugees will come to this project too... I already see this on my own Dutch Power Cows team.

I'm not sure how many new users come from FaD and how many from SETI Classic.
Yes, there is a better chance that FaD refugees will stay, and I hope so as well. But the recent increase in the SETI, CPDN, and Rosetta user bases correlates more with the mass e-mail about the SETI Classic shutdown and transition to BOINC.
Either way, Rosetta gains :-)


ID: 4205 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org