Christoph Jansen's formula of protein size and model completion times

Message boards : Rosetta@home Science : Christoph Jansen's formula of protein size and model completion times

Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 24593 - Posted: 24 Aug 2006, 8:28:53 UTC
Last modified: 24 Aug 2006, 8:48:30 UTC

I copied this very interesting post by Christoph Jansen over from the Ralph forum because my comments would have been completely off-topic there:
...I have done a check of some 20 Rosetta WUs and found that the time to calculate one decoy is very nearly proportional to the number of amino acids in the protein to the power of 1.3.

My formula is (number of amino acids)^1.3 * n(decoys) / time = const. (for a given machine)
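Christoph's per-machine constant can be computed directly from a finished work unit. Below is a minimal Python sketch, assuming you know the protein length, the number of decoys completed and the CPU time. The function names are hypothetical, and reading the "work factor" as a percentage of the average (and the spread as the mean absolute deviation around the median) is one plausible interpretation, not necessarily exactly how he computed his figures.

```python
from statistics import mean, median

EXPONENT = 1.3  # Christoph's empirical exponent; assumed, may differ for other machines/WU types


def work_factor(n_residues: int, n_decoys: int, cpu_seconds: float,
                exponent: float = EXPONENT) -> float:
    """Return (number of amino acids)^exponent * n(decoys) / time."""
    return n_residues ** exponent * n_decoys / cpu_seconds


def percent_of_average(factors: list[float]) -> list[float]:
    """Express each work factor as a percentage of the mean work factor."""
    avg = mean(factors)
    return [100.0 * f / avg for f in factors]


def mean_abs_deviation_from_median(values: list[float]) -> float:
    """Average absolute deviation around the median (one reading of 'vary by ... around the median')."""
    med = median(values)
    return mean(abs(v - med) for v in values)


def predict_seconds_per_decoy(n_residues: int, factor: float,
                              exponent: float = EXPONENT) -> float:
    """Invert the formula: expected time for one decoy on a machine with this work factor."""
    return n_residues ** exponent / factor
```

For example, `work_factor(100, 10, 1000.0)` gives 100^1.3 * 10 / 1000, about 3.98; feeding that factor back into `predict_seconds_per_decoy(100, ...)` returns about 100 seconds per decoy, which is how the constant could be used to estimate run times for a new WU on the same machine.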

It yields pretty good values that vary by an average of 2.3% around the median. I am still collecting numbers to compare, but the latest two samples I put in after adjusting the proportionality factor came out at 99.9 and 100.2 percent of the average "work factor" for my machine. And the length of the proteins varies from 28 to 157 amino acids, a factor of nearly six.
I think it is reassuring that the model completion times grow only polynomially with the size of the protein, and with such a small exponent at that. One might have feared that, since the size of the parameter space grows exponentially with the number of amino acids in a protein, the model completion times would too, in which case the dependence would be:

<something>^<number of amino acids> * n/time = const, or more conveniently

log(<something>) * <number of amino acids> + log(n/time) = const
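The difference between the two forms shows up clearly if you ask what happens to the time per model when the protein length doubles. Here is a minimal Python sketch, assuming the 1.3 power law and, for the exponential case, a purely hypothetical base of 1.05 per residue (nobody in this thread measured such a base). The last function just checks that taking the logarithm of <something>^N * n/time gives the additive form.

```python
import math


def doubling_ratio_power_law(n_residues: int, exponent: float = 1.3) -> float:
    """t(2N)/t(N) if time per model ~ N**exponent; always 2**exponent, whatever N is."""
    return (2 * n_residues) ** exponent / n_residues ** exponent


def doubling_ratio_exponential(n_residues: int, base: float = 1.05) -> float:
    """t(2N)/t(N) if time per model ~ base**N (hypothetical base); equals base**N, so it grows with N."""
    return base ** (2 * n_residues) / base ** n_residues


def log_form(n_residues: int, n_models: int, seconds: float, base: float = 1.05) -> float:
    """log(base) * N + log(n/time), i.e. the logarithm of base**N * n/time."""
    return math.log(base) * n_residues + math.log(n_models / seconds)


for n in (28, 157):  # the shortest and longest proteins in Christoph's sample
    print(n, round(doubling_ratio_power_law(n), 2), round(doubling_ratio_exponential(n), 2))
```

Under the power law, doubling the chain always costs the same factor (about 2.46 for an exponent of 1.3); under an exponential law, the cost of each doubling itself keeps growing with the length of the chain.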

So perhaps it is the number of required models (to reach a desired rmsd) that scales exponentially with the size of the protein, rather than the individual model completion times? Or perhaps it is not protein size but contact order (how often the chain touches itself in the folded state - I hope I am right about that) which determines how many models are needed? Well, you can't determine this from the data you have, Christoph, but I am sure the Baker lab has figured this out. These scaling laws seem to be an excellent way to test the quality of the different algorithms (imagine that with your analysis you could determine that for one particular WU type the exponent is, say, 1.15 rather than 1.3...).
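Comparing exponents between WU types amounts to fitting a straight line to log(time per model) against log(number of amino acids): the slope of that line is the exponent. Below is a minimal Python sketch using ordinary least squares on the logarithms. The function name is hypothetical, and the data in the example is synthetic, generated from an assumed 1.15 power law rather than measured on any machine.

```python
import math


def fit_power_law(lengths: list[int], seconds_per_model: list[float]) -> tuple[float, float]:
    """Fit seconds_per_model ~ a * length**k by least squares in log-log space.

    Returns (k, a): the exponent and the prefactor.
    """
    if len(lengths) != len(seconds_per_model) or len(lengths) < 2:
        raise ValueError("need at least two (length, time) pairs of equal count")
    xs = [math.log(n) for n in lengths]
    ys = [math.log(t) for t in seconds_per_model]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("lengths must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    k = sxy / sxx                         # slope in log-log space = exponent
    a = math.exp(y_mean - k * x_mean)     # intercept back-transformed = prefactor
    return k, a


# Synthetic check: times generated exactly from a 1.15 power law.
lengths = [28, 60, 95, 157]
times = [2.0 * n ** 1.15 for n in lengths]
k, a = fit_power_law(lengths, times)
print(f"exponent ~ {k:.3f}, prefactor ~ {a:.3f}")  # prints: exponent ~ 1.150, prefactor ~ 2.000
```

With real WU times the points will scatter, and mixing WU types or machines in one fit blurs the slope, so a difference such as 1.15 versus 1.3 would only be meaningful with enough samples per WU type on the same machine.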

This is all very interesting and thought-provoking (much more so than the credit stuff)!
Team betterhumans.com - discuss and celebrate the future - hoelder1in.org
Christoph Jansen
Joined: 6 Jun 06
Posts: 248
Credit: 267,153
RAC: 0
Message 24621 - Posted: 24 Aug 2006, 9:11:11 UTC - in response to Message 24593.  
Last modified: 24 Aug 2006, 9:11:56 UTC

Well, you can't determine this from the data you have, Christoph, but I am sure the Baker lab has figured this out. These scaling laws seem to be an excellent way to test the quality of the different algorithms (imagine that with your analysis you could determine that for one particular WU type the exponent is, say, 1.15 rather than 1.3...).


Hi Hoelder1in,

Very interesting thoughts. Not at all what I had in mind, but I only did that calculation out of interest and was surprised it came out so simple. Maybe a system identification for various algorithms can shed some light on the topic.

Whatever it says, my intention was only to share that observation. After all I am a chemist, and we are mostly awfully bad at math (I always admired those who weren't), else we would have become physicists ;-)

I would not be surprised if it leads to nothing and only expresses things the Baker team already knows in a different way.

Regards,

Christoph
Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 24630 - Posted: 24 Aug 2006, 9:30:54 UTC - in response to Message 24621.  

After all I am a chemist...
I guess I was awfully bad in the chemistry lab or else I might have become a chemist. ;-)
Christoph Jansen
Joined: 6 Jun 06
Posts: 248
Credit: 267,153
RAC: 0
Message 24646 - Posted: 24 Aug 2006, 9:53:29 UTC - in response to Message 24630.  

I guess I was awfully bad in the chemistry lab or else I might have become a chemist. ;-)


I worked in the lab course for physicists and had a group of eight people whom I coached each term, in both practice and theory. I loved it, while most of my colleagues loathed it because "them guys don't know anything". A pretty enervating attitude for both sides. I always learned something new or discovered connections that I should have noticed before but hadn't. Teaching is a great way of finding out whether you really understand what you do yourself.
dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,526,853
RAC: 5,737
Message 24740 - Posted: 24 Aug 2006, 16:37:21 UTC

Good work Hoelder1in ;)

From my understanding, it makes sense that the job times only increase proportionally with the number of amino acids as, in basic terms, the algorithm first tries the best fit for the first amino acid, and then moves along the chain to the next one. Each energy calculation for each amino acid will be the same process but in bigger proteins there will be more of these to do.

I guess this is good news for the project as we only need to increase the total amount of CPU power exponentially, and not the power of individual computers!
Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 24743 - Posted: 24 Aug 2006, 16:51:38 UTC - in response to Message 24740.  

I guess this is good news for the project as we only need to increase the total amount of CPU power exponentially, and not the power of individual computers!
On the other hand, the power of individual computers does increase exponentially with time (the 18-month doubling time of computing power described by Moore's Law)...
dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,526,853
RAC: 5,737
Message 24744 - Posted: 24 Aug 2006, 16:56:35 UTC - in response to Message 24743.  

On the other hand, the power of individual computers does increase exponentially with time (the 18-month doubling time of computing power described by Moore's Law)...

It does, but wouldn't that only be OK if we were doubling the number of amino acids every 18 months? ;)

We can do them next week at this rate!
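The arithmetic behind this exchange: if the time per model grows as N^1.3 and computing power doubles every 18 months, then at a fixed wall-clock time per model the manageable protein length grows by 2^(1/1.3), roughly 1.70, every 18 months rather than doubling. Here is a minimal Python sketch, assuming the 1.3 exponent carries over to larger proteins; it deliberately ignores how the number of required models scales with size, which is the open question raised earlier in the thread. The function name is hypothetical.

```python
def length_growth_factor(months: float, doubling_months: float = 18.0,
                         exponent: float = 1.3) -> float:
    """Factor by which protein length can grow while time per model stays constant.

    Compute grows as 2**(months / doubling_months) (Moore's Law assumption) and
    time per model as N**exponent, so N may grow by the compute factor ** (1/exponent).
    """
    compute_factor = 2.0 ** (months / doubling_months)
    return compute_factor ** (1.0 / exponent)


for years in (1.5, 3, 6):
    print(years, round(length_growth_factor(12 * years), 2))
```

With an exponent of exactly 1 the length could double every 18 months, which is the case dcdc describes; any exponent above 1 makes the allowed growth slower.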




©2024 University of Washington
https://www.bakerlab.org