Making Rosetta more efficient?

Message boards : Number crunching : Making Rosetta more efficient?


soriak

Joined: 25 Oct 05
Posts: 102
Credit: 137,632
RAC: 0
Message 22321 - Posted: 12 Aug 2006, 6:05:54 UTC

Let me start off with a disclaimer right away: I really have no idea what the application does 'behind the scenes' so this suggestion is merely based on observation of the screensaver and what I imagine is the case. If you have a minute, I'd appreciate if you could tell me why this may or may not work ;)


The calculation for each model is a two-step process. First we have a "fast" stage (ab initio?) which is like a rough estimation. The "best guess" of that stage is then 'relaxed' in a second much slower process. At that stage only small modifications to the structure are made.

So basically we have first step: big change in structure, big change in energy - 2nd step: small change in structure, small change in energy.

As it is now, Rosetta does this much slower relaxing for every best guess of a model, no matter what the energy of the rough estimate is. This, in my opinion, seems like somewhat of a waste.

Let's say my model comes to a rough estimate of -20, then the refinement may bring some changes (I'd imagine the difference depends on the model, but from what I've noticed it'd at best be +/- 20 or so), but it would never get close to the -250 someone else may have come up with.


Wouldn't it then save a lot of time and effort if the Rosetta application could get the current 'best' result from the server when it connects to download new work units?

So Rosetta would know that some user found a -250 energy structure, and therefore anything that comes out of the first step with a higher energy than -200 doesn't get relaxed.

From what I have observed, the relaxing really is what takes the most time - if that can be eliminated for everything that is far off from the top prediction, many more models could be run. In the end, I don't think it makes a difference whether my best find is -18, -22 or -130 if the best prediction is -250 - it's still nowhere close to what needs to be found.
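To make the idea concrete, here is a minimal Python sketch with toy stand-in functions for the two stages; the helper names, the PRUNE_MARGIN value and the server-supplied best_known_energy are illustrative assumptions, not anything Rosetta actually does:

    import random

    # Toy stand-ins for the two stages -- not actual Rosetta code.
    def ab_initio_search():
        """Coarse stage: big random moves, big energy changes."""
        return {"stage": "coarse"}, random.uniform(-250.0, 0.0)

    def full_atom_relax(model, energy):
        """Slow refinement: small moves that shift the energy only slightly."""
        return model, energy + random.uniform(-20.0, 5.0)

    PRUNE_MARGIN = 50.0   # relax only models within this margin of the global best

    def run_model(best_known_energy):
        model, coarse_energy = ab_initio_search()
        # Proposed check: skip the expensive relax if the coarse energy is far
        # above the best structure anyone has reported so far.
        if coarse_energy > best_known_energy + PRUNE_MARGIN:
            return model, coarse_energy
        return full_atom_relax(model, coarse_energy)

    best_so_far = -250.0   # value the client would download along with new work
    print(run_model(best_so_far))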


I hope this isn't completely off-base, but at the same time I'm sure you'd have come up with that a long time ago if it were a valid approach. Are relaxed models maybe useful, even if they are far from the lowest energy model? Or can relaxing make a much bigger difference than I have observed?
ID: 22321 · Rating: 1
Keith Akins

Joined: 22 Oct 05
Posts: 176
Credit: 71,779
RAC: 0
Message 22322 - Posted: 12 Aug 2006, 6:14:00 UTC
Last modified: 12 Aug 2006, 6:21:21 UTC

Sounds perfectly logical to me as a time saver. But someone from the project team really needs to weigh in on this as I don't know the exact reasoning behind the methods.
ID: 22322 · Rating: 0
River~~

Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 22325 - Posted: 12 Aug 2006, 7:51:29 UTC - in response to Message 22322.  
Last modified: 12 Aug 2006, 7:53:03 UTC

It is a great idea, and I hope the project tries it out over on Ralph and brings it here if it seems to work out in practice.

Pedantic point:

Sounds perfectly logical to me as a time saver.


More exactly, it will not shorten the run time of a WU, but it will increase the number of runs in a single workunit. Unlike most other projects, Rosetta WUs aim to run for approximately the same time regardless of how long the individual models take. The app adjusts the number of models tried in a WU to get somewhere near the desired runtime(*).
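In rough outline, that scheduling looks like the sketch below; crunch_one_model() and target_runtime_s are toy stand-ins for illustration, not the real client code:

    import random
    import time

    def crunch_one_model():
        """Stand-in for one complete model (coarse search plus relax)."""
        time.sleep(random.uniform(0.1, 0.5))   # real models take minutes to hours

    def run_workunit(target_runtime_s):
        """Keep starting whole models until the target runtime has been used up."""
        start = time.time()
        models = 0
        while time.time() - start < target_runtime_s:
            crunch_one_model()   # a model that has been started is always finished
            models += 1
        return models

    print(run_workunit(2.0), "models completed")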

The project will get to their target of N million runs in fewer WUs, which would boost overall production for the same pool of computers.

So I'd call it a productivity boost rather than a time saver. But I guess you knew that anyway...

River~~


(*) By the way, for anyone who does not already know, you can change this setting yourself on the Rosetta prefs page. The app always does a whole number of runs, so don't expect exact timings -- especially not if you select short runs on a slow box!


ID: 22325 · Rating: 0
Christoph Jansen

Joined: 6 Jun 06
Posts: 248
Credit: 267,153
RAC: 0
Message 22328 - Posted: 12 Aug 2006, 8:18:09 UTC - in response to Message 22321.  

Let's say my model comes to a rough estimate of -20, then the refinement may bring some changes (I'd imagine the difference depends on the model, but from what I've noticed it'd at best be +/- 20 or so), but it would never get close to the -250 someone else may have come up with.


Hi Soriak,

That is an incorrect assumption. If the model is already close to a good structure prediction it may still be very high in energy, and a very low energy may correspond to a pretty high RMSD:

[Image: scatter plot of the predicted structures, energy vs. RMSD]
You can see that you can almost draw a vertical line through the structures that have a very low RMSD but still span all possible energies. And you can also go to the right at an energy level of -110 and find all RMSD values, from a very low one to a bad one of over 10. There are a lot of structures that come pretty close to the correct one but do not match closely enough to really reach the best one from there. You can imagine it like a very bumpy road along a coast, and thus almost everywhere at sea level: many bumps will have comparable depths, but to find the truly deepest one you have to examine them all. So Rosetta simply has to go on trying at random and see what it finally produces.

The reason for this is not a flaw in Rosetta's conception but the nature of the approach of making random changes: you do not work your way forward by choosing the next best step, but by taking the next step and then deciding whether it was better than the previous one.
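In code terms, that "take a step first, judge it afterwards" idea looks roughly like the Metropolis-style sketch below, run on a toy one-dimensional "bumpy coastline"; it is not Rosetta's actual move set or energy function:

    import math
    import random

    def random_search(energy_fn, start, steps=10000, temperature=1.0):
        """Propose a random change, then decide afterwards whether to keep it:
        always if it lowered the energy, occasionally even if it raised it."""
        x, e = start, energy_fn(start)
        best_x, best_e = x, e
        for _ in range(steps):
            candidate = x + random.gauss(0.0, 0.5)    # blind random move
            e_new = energy_fn(candidate)
            if e_new < e or random.random() < math.exp((e - e_new) / temperature):
                x, e = candidate, e_new
                if e < best_e:
                    best_x, best_e = x, e
        return best_x, best_e

    # Toy landscape: many bumps of comparable depth, like the coastal road.
    bumpy = lambda x: math.sin(8.0 * x) + 0.05 * x * x
    print(random_search(bumpy, start=3.0))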

The planetary explorer analogy from the welcome section describes that very well: it is no use saying that you have just reached a level of -150 m and everybody else should join you if you are not sure it is in the same valley where the lowest ground level is found.
ID: 22328 · Rating: 0
soriak

Joined: 25 Oct 05
Posts: 102
Credit: 137,632
RAC: 0
Message 22331 - Posted: 12 Aug 2006, 8:55:18 UTC
Last modified: 12 Aug 2006, 9:11:12 UTC

Hi Christoph,

I think you misunderstood my suggestion - I can see how I may not have been very detailed on this part: I don't think that just because one way of searching produced a low energy model, we should all switch to that one - a method that has produced high energies so far could still be the right approach for the lowest energy prediction. So in terms of the way we do the first stage, nothing would change.

I suppose in the canyon example it'd be like randomly dropping into one and saying "well, I think I'm about 15 meters below the surface, but I'll measure it exactly to make sure I get it right" when someone else is 200 meters below the surface. Sure, the first guy may end up being 16 or 17 meters down, but after the initial estimate we can be sure he's not the lowest - it's just a matter of how close to that he is, which (as I understand it) is something the project doesn't care about. All that matters is finding the lowest spot (or getting very close to it).

As of now, the poor guy in his 15 m hole doesn't know about the other one, so he puts a lot of effort into getting an accurate measurement. My idea would essentially give him a radio where he'd be informed about the progress of other climbers and realize that he's nowhere close to 200 m, so an accurate measurement isn't needed.


You have to keep in mind that we only know the RMSD for proteins that are already known. If we are predicting unknown structures (which is the goal in the future) like in CASP, the lowest energy model is the one the project would (and did) go with.

So even though a high energy model may be much closer in terms of RMSD, it wouldn't be used and is also not the one we find in nature. The more models we calculate, the higher the probability gets to find the 'right' one - which would have the lowest energy.


edit:
When you run a current work unit, each model first does the large changes to find the 'best' structure - there we see a lot of jumping around. The relaxing, however, only happens to the best structure from the first run - not all of them. That's what makes me think the change can't be all that big, or we'd have to relax every ab initio prediction.

So if we can relax only the best of our own current predictions, wouldn't it be even better if we could relax only the best couple from everyone's predictions?
ID: 22331 · Rating: 0
tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 22336 - Posted: 12 Aug 2006, 13:08:49 UTC - in response to Message 22321.  

So basically we have first step: big change in structure, big change in energy - 2nd step: small change in structure, small change in energy.

...

Let's say my model comes to a rough estimate of -20, then the refinement may bring some changes (I'd imagine the difference depends on the model, but from what I've noticed it'd at best be +/- 20 or so), but it would never get close to the -250 someone else may have come up with.


I'm not sure whether this is true. If it is, your idea should work. I read somewhere that Rosetta aborts the relax stage in some cases where it does not make any progress. So part of your suggestion seems already to be implemented.

However, if your assumption above is true, more optimisation could be achieved with, e.g., low-latency computing, where each WU reports back the lowest energy level found so far, which is then redistributed to the clients in order to make pruning decisions.
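To picture it, here is a toy sketch of such a report-and-redistribute loop; the Server/Client classes and the 50-point margin are purely hypothetical, nothing like this exists in BOINC or Rosetta today:

    class Server:
        """Hypothetical central tracker of the best energy seen so far."""
        def __init__(self):
            self.global_best = float("inf")

        def report(self, client_best):
            self.global_best = min(self.global_best, client_best)
            return self.global_best            # redistributed to the client

    class Client:
        def __init__(self, server, margin=50.0):
            self.server, self.margin = server, margin
            self.threshold = float("inf")

        def after_model(self, energy):
            # Report the result and refresh the local pruning threshold.
            self.threshold = self.server.report(energy) + self.margin

        def should_relax(self, coarse_energy):
            return coarse_energy <= self.threshold

    server = Server()
    a, b = Client(server), Client(server)
    a.after_model(-250.0)          # someone found a very deep minimum
    b.after_model(-20.0)           # this host has only found shallow ones
    print(b.should_relax(-60.0))   # False: -60 is far above the global best of -250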
ID: 22336 · Rating: 0
Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 22347 - Posted: 12 Aug 2006, 16:29:28 UTC

This is EXACTLY the type of breakthrough thinking that will leap-frog this science forward. I'm serious about that. But I've had the same thought, did some more reading, and watched my WUs as they run, and what I conclude is that you can't see it coming. In other words, while it is certainly true that some models at the end of stage 1 will prove uninteresting... you can't predict, from anything as simple as the energy level at the end of stage 1, what your outcome of stage 2 might be.

Dr. Baker refers to this as "deep wells in the energy landscape". The model seems to be climbing a volcano looking for a LOWER elevation, things APPEAR to be getting worse and worse... and then WHAM! you fall into the core of the volcano and find it drops well below the surrounding ground surface.

If you study your models as they crunch, see if you can catch 'em at step 340,000 and note their energy level and RMSD at that point... then see how the model finishes out.

Perhaps one of the project scientists could help us to find the data in the .out file produced. We'd like to know the lowest energy and RMSD found before and after the full atom relax stage. Or, if this isn't explicitly recorded in our .out files, I'm sure they have some data on such things.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 22347 · Rating: 0
Avi

Joined: 2 Aug 06
Posts: 58
Credit: 95,619
RAC: 0
Message 22431 - Posted: 14 Aug 2006, 2:49:11 UTC - in response to Message 22347.  

Hmm, I spent some time watching the screensaver too. While it's doing all these random things, sometimes the energy spikes up high, which I know is a bad thing, and sometimes it drops down low (both happen during both stages).
I don't know the math or chemistry behind this, but since this seems to be math, could these "bad" changes somehow be stored and then not reused? Or maybe it's not worthwhile to check against a DB (bandwidth and processing time), but instead the data could help reveal trends towards making these algorithms more efficient.
Just my thoughts.
-Avi
ID: 22431 · Rating: 0
Keith Akins

Joined: 22 Oct 05
Posts: 176
Credit: 71,779
RAC: 0
Message 22437 - Posted: 14 Aug 2006, 6:04:45 UTC

Seems I heard Dr. Baker once say that the low-resolution (ab initio) search had to yield a structure in the ballpark in order to "...smack the nail on the head" during refinement (relax).
ID: 22437 · Rating: 0
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 22440 - Posted: 14 Aug 2006, 6:40:53 UTC - in response to Message 22321.  

Wouldn't it then save a lot of time and effort if the Rosetta application could get the current 'best' result from the server when it connects to download new work units?

...

Are relaxed models maybe useful, even if they are far from the lowest energy model? Or can relaxing make a much bigger difference than I have observed?


These are good ideas. We are doing several things along these lines. First, as you can see from the banding pattern on the top prediction graphs, we terminate relaxing at two checkpoints (about 20% and about 50% of the way through) if the energy is high compared to other trajectories at these stages. This saves a lot of time, since these higher-energy trajectories are probably stuck in unpromising regions of the landscape (there is of course a non-zero risk here, as the calculation could be climbing a high mountain pass with the golden valley beyond ...).
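For illustration only, a toy sketch of that kind of checkpoint filtering; the median cutoff, the relax_step() stand-in and all numbers are assumptions, not the actual Rosetta rule:

    import random

    def relax_step(energy):
        """Stand-in for one relax move: small changes in energy."""
        return energy + random.uniform(-1.0, 0.8)

    def relax_with_checkpoints(start_energy, total_steps, history):
        """Abandon a trajectory at ~20% and ~50% if it is doing worse than the
        median of the trajectories that have already passed that checkpoint."""
        checkpoints = {int(0.2 * total_steps), int(0.5 * total_steps)}
        energy = start_energy
        for step in range(1, total_steps + 1):
            energy = relax_step(energy)
            if step in checkpoints:
                others = sorted(history.setdefault(step, []))
                if others and energy > others[len(others) // 2]:
                    return None                    # stuck in an unpromising region
                history[step].append(energy)
        return energy

    history = {}
    finished = [relax_with_checkpoints(random.uniform(-50.0, 0.0), 100, history)
                for _ in range(20)]
    print(sum(e is not None for e in finished), "of 20 trajectories relaxed to the end")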
Second, the conformational space annealing approach we used for some of the CASP targets revisits the most promising valleys found in an initial set of runs while forcing the explorers revisiting a given valley to stay as spread out as possible.
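And a heavily simplified, one-dimensional sketch of the bank update that conformational space annealing is built around: keep a set of low-energy solutions that are forced to stay spread out, and tighten the spacing requirement as the search proceeds. The energy function, the distance measure and every parameter here are toy assumptions:

    import random

    def energy(x):
        """Toy landscape with two valleys of different depth."""
        return (x * x - 4.0) ** 2 + 0.5 * x

    def csa(bank_size=8, rounds=200, d_cut=2.0, anneal=0.98):
        bank = [random.uniform(-4.0, 4.0) for _ in range(bank_size)]
        for _ in range(rounds):
            trial = random.choice(bank) + random.gauss(0.0, 0.5)   # revisit a known valley
            nearest = min(bank, key=lambda b: abs(b - trial))
            if abs(nearest - trial) < d_cut:
                # Same valley as an existing member: keep only the lower-energy one.
                if energy(trial) < energy(nearest):
                    bank[bank.index(nearest)] = trial
            else:
                # A new valley: let it replace the worst member if it is better.
                worst = max(bank, key=energy)
                if energy(trial) < energy(worst):
                    bank[bank.index(worst)] = trial
            d_cut *= anneal        # gradually allow members to sit closer together
        return sorted(bank, key=energy)

    print(csa()[:3])   # the lowest-energy, well-separated solutions found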


ID: 22440 · Rating: 0
