DISCUSSION of Rosetta@home Journal (2)

Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16598 - Posted: 19 May 2006, 1:32:38 UTC
Last modified: 19 May 2006, 4:22:42 UTC

This thread is a continuation of the discussion of Dr. Baker's journal. The original thread with all prior posts can be found here.
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 16598 · Rating: 0
Aglarond
Joined: 29 Jan 06
Posts: 26
Credit: 446,212
RAC: 0
Message 17589 - Posted: 4 Jun 2006, 0:14:53 UTC - in response to Message 17304.  
Last modified: 4 Jun 2006, 0:26:41 UTC

Why does Rosetta currently not use any optimization?

Using 3DNow! (for Athlon XP+) and SSE2 (for Pentium 4 and others)
can shrink the CPU time required to finish a floating-point WU by a factor of 6 (1:6).
So, why not?


Hmm... I was thinking about the same thing a few weeks ago. But I've read some posts from Akos F., and he explained that just compiling with 3DNow! or SSE2 may increase the speed by only 3%. If you want a bigger increase, you have to do some low-level programming in assembler. This is the "magic" that can give you a 600% increase in speed. Of course 3DNow! and SSE2 can be strong tools, but not by themselves. They have to be used the right way.
ID: 17589 · Rating: 0
adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 51
Message 17618 - Posted: 4 Jun 2006, 16:15:36 UTC

I think suggestions such as...
if you have 287 work units still remaining on your computer you can delete them. Please keep all others running!

... should be echoed in this thread so crunchers can subscribe to that thread and kill obsolete wu's promptly and get on with the newer targets.

An aside: the "BakerBlog" now contains 3 months of posts, so is it time for a "son of" BakerBlog? I believe the blog should be kept online though, perhaps on another page. It demonstrates, more than anything, the commitment to communication shown by the Rosetta team, which I am sure is a part of the project's success.

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 17618 · Rating: 0
Chris
Joined: 5 Jun 06
Posts: 1
Credit: 94,712
RAC: 0
Message 17751 - Posted: 6 Jun 2006, 5:58:55 UTC - in response to Message 17335.  

The AP article on rosetta@home is out! See Ethan's post on the boards today. I think it turned out very well--what do you think? Let's hope lots of people see it.


this is how i found out about it, my comp has been crunchin since i turned all the stuff on.
ID: 17751 · Rating: 0
Robert Everly
Joined: 8 Oct 05
Posts: 27
Credit: 665,094
RAC: 0
Message 18869 - Posted: 18 Jun 2006, 1:14:02 UTC

Are there other suggestions for feedback we could give? Certificates, etc. we could think about if people would like this, but we would certainly need this to be at least in part handled by a volunteer group as we are swamped with CASP.


Printable certificates would be cool. I'm sure someone that knows some PHP (not me) could modify the Seti certificates for use here. Would be neat to have them available for milestones and top predictions.

The seti certificates are located in the repository Here. They are Cert1, Cert2 & Cert3.



ID: 18869 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 19028 - Posted: 21 Jun 2006, 3:01:12 UTC

I believe I know the answer, but I thought this thread might be a good place for one of you more bio-techie folks to explain the item in the release notes today that says:
We can efficiently assemble predefined domains of the protein chain into a whole structure.

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 19028 · Rating: 0
Keck_Komputers
Joined: 17 Sep 05
Posts: 211
Credit: 4,246,150
RAC: 0
Message 19035 - Posted: 21 Jun 2006, 7:21:32 UTC - in response to Message 18869.  

Are there other suggestions for feedback we could give? Certificates, etc. we could think about if people would like this, but we would certainly need this to be at least in part handled by a volunteer group as we are swamped with CASP.


Printable certificates would be cool. I'm sure someone that knows some PHP (not me) could modify the Seti certificates for use here. Would be neat to have them available for milestones and top predictions.

The seti certificates are located in the repository Here. They are Cert1, Cert2 & Cert3.



I have always liked this idea and have long hoped that someone would program a BOINC standard certificate into the baseline code. It would still require the projects to produce image(s) but would make things much simpler for smaller/new projects.
BOINC WIKI

BOINCing since 2002/12/8
ID: 19035 · Rating: 0
Christoph Jansen
Joined: 6 Jun 06
Posts: 248
Credit: 267,153
RAC: 0
Message 19036 - Posted: 21 Jun 2006, 7:30:38 UTC
Last modified: 21 Jun 2006, 7:35:32 UTC

I think it refers to this part of the research overview.

The first parts of a protein that will form a defined structure will be amino acids very close to each other. If you take naturally occurring structures matching short parts of the sequence of the protein you want to model this may give you a clue of how the protein looks piecewise.

The problem may however be to find an efficient way of connecting those small sequences to each other, or to sequences in between to which no special structure has been assigned. There are a number of variables to consider: for example, you will only want chemically valid angles and conformations to occur in your starting model, and you will not want parts to overlap in space. The overall assembly is like a 3D jigsaw puzzle which needs to be solved.

So what I think it basically says is that they have found a way to assemble those bits and unstructured parts together in a way that is fast as well as accurate and complies with physical and chemical prerequisites.


"I know that you believe you understand what you think I said, but I'm not sure you realize that what you heard is not what I meant." R.M. Nixon
ID: 19036 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 19066 - Posted: 21 Jun 2006, 15:39:05 UTC

...many proteins consist of multiple independently folded "domains". In many cases, it is possible to recognize from the amino acid sequence roughly where the boundaries between the domains are, and in these cases we carry out folding calculations separately on each domain. This in the end produces models for different parts of an amino acid sequence, and we then need to assemble these into one coherent structure. For this we use a protocol again very similar to what you have been running, except that the only variation allowed is in the linker between the domains, typically around 10 residues, while the intradomain structure is kept fixed (this is quite analogous to the docking problem I mentioned above).


...attempted translation to laymen's English...

It is sometimes useful to study portions of the protein rather than the entire chain. Some specific sequences will fold in a consistent manner whenever they appear in the chain. By recognizing these, you essentially break the problem into several pieces: some pieces are the known shapes, and other portions you still don't know. You then work with the amino acids that exist between the known portions to fill in the complete chain.

You may have seen this in the graphic of some of your work units. You could see gaps throughout the protein chain, and then blinks where they were filled in with various possible shapes and tested for the resulting energy levels.

It's like putting together a puzzle now... where some of the pieces are made of stiff cardboard, and other pieces are moldable clay, in various sizes. If you can form the clay and make it fill the entire gap to the next piece, then you have created a possible solution to the puzzle. And since the pieces are moldable, other solutions are possible. The energy levels tell Rosetta which of the possible solutions is "best", or most likely to be the actual form the protein takes in nature (the "native state").
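
To make the analogy concrete, here is a toy Python sketch: the "cardboard" domains are held fixed, random shapes are tried for the "clay" linker, and the lowest-energy one is kept. The energy function, the one-angle-per-residue representation, and all names are made up for illustration; the real Rosetta energy function and sampling protocol are far more involved.

```python
import random

def toy_energy(linker_angles):
    """Hypothetical score: lower is better. Favors angles near -60 degrees."""
    return sum((angle + 60.0) ** 2 for angle in linker_angles)

def sample_linker(n_residues=10, trials=1000, seed=0):
    """Try random linker conformations and keep the one with the lowest energy."""
    rng = random.Random(seed)
    best_angles, best_energy = None, float("inf")
    for _ in range(trials):
        # One torsion angle per linker residue; the domains themselves stay fixed.
        angles = [rng.uniform(-180.0, 180.0) for _ in range(n_residues)]
        energy = toy_energy(angles)
        if energy < best_energy:
            best_angles, best_energy = angles, energy
    return best_angles, best_energy
```

More trials means more shapes tried for the clay, so the best energy found can only stay the same or improve.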
ID: 19066 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 19367 - Posted: 27 Jun 2006, 18:37:58 UTC

I'm curious: when crunching for CASP, where the protein's native structure is unknown, how do you determine which of these "domains" to isolate and assume to take a given shape in order to use the JUMPING approach? Are there specific sequences that ALWAYS take the same shape? Or are the sequences chosen based upon preliminary results found from the traditional ab initio and full atom relax approach?

Or, to use my puzzle analogy below, how do you determine where to place a stiff cardboard piece, and where to place the moldable clay?
ID: 19367 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 19473 - Posted: 29 Jun 2006, 16:06:52 UTC

Today I met with the people who design the science curriculum for Seattle Public School middle and high schools to discuss incorporating rosetta@home into middle and high school science classes. I think that participating in a real research project could be more inspiring than just learning a set of facts; I certainly never found science classes very fun or interesting--the exciting part is discovering new things more than learning about discoveries made long ago.


This is GREAT news! And I hope you will find a way to incorporate "the exciting part" for the students. I mean if they presented chemistry by saying "we know the pH levels of each chemical... now let's DISCOVER what happens when they are combined"... and take the student THROUGH the steps that the original scientists followed to learn these things in the first place... to the student it IS a new discovery then, and they aren't fully AWARE that it was a discovery made long ago.

I hope you can find a way to present the basic structure of how the atoms comprise the protein and try and leave it up to them to guess the rest. In essence to reinvent the science you've been working on for years. It is just possible that in doing this, you give them the information required... without biasing them to YOUR approach to solving the problem, and they discover that new method of solving the problem that works better (just like that theoretical little girl in Korea that grows up with these things and sees the problem from a whole new angle).

I mean it's like introducing the idea of a perpetual motion machine... but not telling them it's impossible to make one, and asking them to devise one. They may not be successful... but YOU may learn a lot from watching what they DO come up with :)

Maybe give them a few weeks to think about it and then present information about your approach to the problem. This may give them the incentive to learn how to compute angles and cosines, and statistics and biology and chemistry... show them how all of these branches of science are involved in the problem. And this will help them see that what we're trying to teach them is important in the real world.
ID: 19473 · Rating: 0
Mats Petersson
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 19736 - Posted: 3 Jul 2006, 18:11:03 UTC - in response to Message 17304.  

Why does Rosetta currently not use any optimization?

Using 3DNow! (for Athlon XP+) and SSE2 (for Pentium 4 and others)
can shrink the CPU time required to finish a floating-point WU by a factor of 6 (1:6).
So, why not?

This may apply to SOME types of calculations, although in my experience SSE or 3DNow! generally gives an improvement of around 2-3x on a particular calculation, unless it's something like a massive vector calculation (typically, that's what SETI does - FFTs lend themselves well to being optimized for performance).

I've looked a little bit at what calculations there are in Rosetta, and there are no really obvious places where SSE can just be slotted in to give a BIG boost in performance. There are certainly places where it can be used to improve performance by some amount, but there are no huge vector operations, just to give an example.

And I very much doubt that 6x would be the result, unless:
1. I've missed something (rosetta is quite large, and I've been looking at the disassembly of the code, not the source-code).
2. Someone spends a HUGE amount of time hand-optimizing large portions of code.

In view of the fact that the algorithms are still being changed for Rosetta, I don't think that it makes sense to spend large amounts of time optimizing small portions of it, and then have to redo the same optimization a little while later because the entire calculation changed.

When it comes to optimizing for 3DNow! or SSE, it's hard to get good results purely by adding a compiler switch to the compile, it takes re-writing the code in assembler to get much improvement - compilers are often poor at choosing the right things to go into registers for auto-vectorization [assuming it's supported AT all by the compiler]. Just using SSE instead of x87 instructions doesn't actually give much improvement in general.

Here are some (simple) benchmarks of the following:
adding a sequence of floating point values from two large arrays, 256 elements at a time:
fpu: 750 kcycles
sse_scalar: 775 kcycles
sse_vector: 460 kcycles
sse_v_unroll: 430 kcycles
3dnow: 470 kcycles

The total array size is (1 << 17) elements, so (4 << 17) bytes, or 512 KB, so the two arrays fit inside the L2 cache of my Opteron processor. The exact number of clock cycles varies slightly from run to run, and the numbers are averaged over 15 runs in total, with the best and worst removed.
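
For anyone wanting to try this kind of measurement, here is a minimal Python sketch of the averaging scheme described above (15 runs, best and worst dropped, the rest averaged). The function names and the workload are illustrative assumptions; the numbers in this post came from timing in clock cycles, which this sketch does not attempt to reproduce.

```python
import time

def trimmed_mean(samples):
    """Drop the single best and worst sample, then average the rest."""
    if len(samples) < 3:
        raise ValueError("need at least 3 samples")
    ordered = sorted(samples)
    kept = ordered[1:-1]
    return sum(kept) / len(kept)

def benchmark(func, runs=15):
    """Time func() `runs` times and return the trimmed mean in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return trimmed_mean(timings)

def add_arrays(n=1 << 12):
    # Hypothetical stand-in workload: element-wise add of two float lists.
    a = [1.0] * n
    b = [2.0] * n
    return [x + y for x, y in zip(a, b)]

if __name__ == "__main__":
    print(f"{benchmark(add_arrays):.6f} s per run (trimmed mean of 15)")
```

Dropping the extremes keeps a single interrupted or unusually lucky run from skewing the average.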

There are cases where 3dNow! is better, but in this case I think it suffers from having to do twice as many operations compared to the SSE vectorized calculations.


BTW: Was that "Internal benchmark" compared with a credited
benchmark program ?
eg: Sisoft Sandra
http://downloads.guru3d.com/download.php?det=177


SiSoft Sandra's integer and floating point benchmarks are optimized to produce the highest possible results. That's all well and good, but it's not REALLY important exactly what numbers BOINC comes up with in its results, as long as it's reasonably good at linearly giving a score that matches the speed of the machine. If you take the time to optimize the benchmark so that it gives a better score, would it actually give a fairer result? Probably not...

--
Mats




ID: 19736 · Rating: 0
Leonard Kevin Mcguire Jr.
Joined: 13 Jun 06
Posts: 29
Credit: 14,903
RAC: 0
Message 19739 - Posted: 3 Jul 2006, 19:09:31 UTC


Hmm... I was thinking about the same thing a few weeks ago. But I've read some posts from Akos F., and he explained that just compiling with 3DNow! or SSE2 may increase the speed by only 3%. If you want a bigger increase, you have to do some low-level programming in assembler. This is the "magic" that can give you a 600% increase in speed. Of course 3DNow! and SSE2 can be strong tools, but not by themselves. They have to be used the right way.

If this is correct, then, taking into consideration the post by Mats Petersson below, I agree with both Akos and Mats on this point:

It's a daunting task to hand-optimize assembler to use SSEx instructions, and it is compounded when these routines change often. However, the only people who know how much the routines change, and which routines change a lot, are the developers. So what would completely put the issue to rest is their input. If there is even a defined way for them to communicate this, it could provide useful information for making the final determination.
ID: 19739 · Rating: 0
Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 19746 - Posted: 3 Jul 2006, 22:37:41 UTC - in response to Message 19736.  
Last modified: 3 Jul 2006, 22:39:53 UTC


BTW: Was that "Internal benchmark" compared with a credited
benchmark program ?
eg: Sisoft Sandra
http://downloads.guru3d.com/download.php?det=177


SiSoft Sandra's integer and floating point benchmarks are optimized to produce the highest possible results. That's all well and good, but it's not REALLY important exactly what numbers BOINC comes up with in its results, as long as it's reasonably good at linearly giving a score that matches the speed of the machine. If you take the time to optimize the benchmark so that it gives a better score, would it actually give a fairer result? Probably not...


I think by "internal benchmark", DB was referring to a set of calculations (a mini-WU if you prefer, doing e.g. time to perform 10 steps of full-atom-relax of the "1tul" protein vs a reference PC) which will be compiled into the base Rosetta.exe and will be used instead of BOINCclient.exe's own benchmark, so that an fpops-based credits system can be used.

This way credit claims will be more "objective" as it will measure "real" work for the project (although one can always crack the Rosetta.exe and change it, since we're still talking initial-replication=1).

Some BOINC project science apps might fit entirely in L2 cache, whereas others might be more dependent on memory (FSB) speed. In the latter case, a 3GHz and a 2GHz P4 might do about the same "real work" per CPU-hour. Using fpops will probably cause some differences in points/CPU-hour among BOINC projects.
Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
ID: 19746 · Rating: 0
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 19747 - Posted: 3 Jul 2006, 23:36:39 UTC - in response to Message 19746.  

BTW: Was that "Internal benchmark" compared with a credited
benchmark program ?
eg: Sisoft Sandra
http://downloads.guru3d.com/download.php?det=177


SiSoft Sandra's integer and floating point benchmarks are optimized to produce the highest possible results. That's all well and good, but it's not REALLY important exactly what numbers BOINC comes up with in its results, as long as it's reasonably good at linearly giving a score that matches the speed of the machine. If you take the time to optimize the benchmark so that it gives a better score, would it actually give a fairer result? Probably not...


I think by "internal benchmark", DB was referring to a set of calculations (a mini-WU if you prefer, doing e.g. time to perform 10 steps of full-atom-relax of the "1tul" protein vs a reference PC) which will be compiled into the base Rosetta.exe and will be used instead of BOINCclient.exe's own benchmark, so that an fpops-based credits system can be used.

YES--we have this implemented, but it is not yet in use. We are instead thinking now of using average WU times from the RALPH tests to assign credits for each computed structure on ROSETTA, as suggested by participants earlier. For the duration of CASP the credit system will remain as it is now to avoid disruptions.
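
As a rough illustration of the idea of assigning a fixed credit per structure from average test times, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `CREDITS_PER_CPU_HOUR` rate, the function names, and the sample timings are hypothetical and do not describe an implemented scheme.

```python
from statistics import mean

# Hypothetical conversion rate; the real value would be a project decision.
CREDITS_PER_CPU_HOUR = 10.0

def credit_per_structure(test_cpu_seconds):
    """Fixed credit for one structure, from average CPU time seen in test runs."""
    if not test_cpu_seconds:
        raise ValueError("need at least one test timing")
    avg_hours = mean(test_cpu_seconds) / 3600.0
    return avg_hours * CREDITS_PER_CPU_HOUR

def credit_for_result(structures_computed, test_cpu_seconds):
    """Credit granted to a host: structures computed times the per-structure value."""
    return structures_computed * credit_per_structure(test_cpu_seconds)
```

Under this kind of scheme, a fast and a slow host would earn the same credit for the same structure, regardless of what their own benchmarks report.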


ID: 19747 · Rating: 0
R/B
Joined: 8 Dec 05
Posts: 195
Credit: 28,095
RAC: 0
Message 19905 - Posted: 7 Jul 2006, 20:24:36 UTC

Today I met with the people who design the science curriculum for Seattle Public School middle and high schools to discuss incorporating rosetta@home into middle and high school science classes. I think that participating in a real research project could be more inspiring than just learning a set of facts; I certainly never found science classes very fun or interesting--the exciting part is discovering new things more than learning about discoveries made long ago. Anyway, they were very interested and we should have some pilot projects in schools this fall.

These message boards were what gave me the idea for this--it has been really fun and rewarding to try to explain our research and answer all of your questions. As part of making the project more educational, we are working, with help from a Microsoft expert, to increase the amount of feedback participants can get on the results their computer produces. Hopefully you will see this here in not too long.

Outstanding...I forget who kicked off that idea but that is the kind of step that will pay off in the long run....

A wonderful step, Dr. Baker...
Founder of BOINC GROUP - Objectivists - Philosophically minded rational data crunchers.


ID: 19905 · Rating: 0
catalin
Joined: 28 Jun 06
Posts: 4
Credit: 17,134
RAC: 0
Message 20191 - Posted: 14 Jul 2006, 17:10:08 UTC - in response to Message 20187.  

We desperately need as much CPU power as possible for the next two weeks...


Here's my problem: I already have a project from climateprediction running just fine on my computer, but every time I try to get some new work from Rosetta, Boinc Manager doesn't seem to react - I hit the update button, read the "scheduler request pending" message for three minutes and... that's it - no result, no new work downloaded, from last month... The other project is working just fine, as I said... so I can't blame the manager for it...
What might be the issue...?
Thanks in advance.
ID: 20191 · Rating: 0
Daral
Joined: 13 Jan 06
Posts: 13
Credit: 870,334
RAC: 0
Message 20196 - Posted: 14 Jul 2006, 18:05:51 UTC

Your computer is overcommitted, so it went into earliest deadline first (edf) mode. It stops getting new work until it thinks it can handle them all before their deadlines hit. If you want to work on rosetta, pause the climate model for a couple seconds, it will download rosetta wu's, then restart the climate model and it'll do them both.
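
To illustrate the idea, here is a rough Python sketch of earliest-deadline-first ordering and an "overcommitted" check. It is a simplification under my own assumptions (one CPU, known remaining hours per task); the `Task` class and the function names are hypothetical, and this is not the actual BOINC client scheduler code.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    remaining_hours: float   # estimated CPU time still needed
    deadline_hours: float    # hours from now until the deadline

def edf_order(tasks):
    """Return tasks sorted so the earliest deadline runs first."""
    return sorted(tasks, key=lambda t: t.deadline_hours)

def is_overcommitted(tasks):
    """True if, run back to back in EDF order on one CPU, any task misses its deadline."""
    elapsed = 0.0
    for task in edf_order(tasks):
        elapsed += task.remaining_hours
        if elapsed > task.deadline_hours:
            return True
    return False

def should_fetch_work(tasks):
    # Assumed policy: only ask for new work when every current task can meet its deadline.
    return not is_overcommitted(tasks)
```

With a long climate model that is already projected to miss its deadline, `should_fetch_work` returns False in this sketch, which matches the behaviour described above.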
ID: 20196 · Rating: 2
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 20205 - Posted: 14 Jul 2006, 20:25:39 UTC

If you are looking for information to show your friends to help get them to crunch Rosetta too... the newsletter that was sent in May might be a good resource.
ID: 20205 · Rating: 1
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 20206 - Posted: 14 Jul 2006, 20:26:51 UTC
Last modified: 14 Jul 2006, 20:27:49 UTC

Might I suggest copying Dr. Baker's post into the project homepage news area? These all get copied and shown on Boincstats where other folks might see it and be interested.
ID: 20206 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org