Message boards : Number crunching : ROSETTA MUST ENABLE GPU UNITS
Author | Message |
---|---|
CZ Send message Joined: 4 Apr 14 Posts: 17 Credit: 78,584 RAC: 0 |
I know there are many arguments about how difficult it is to do this, but Rosetta@home is slowly losing support as people who are here just to get the highest RAC will simply run other (much less important) projects. It is a shame that such an important project, perhaps one of the three most "worthy" of our computing power, can't be bothered to make this difficult transition and attract the much more efficient GPU crunchers. It is absolutely beyond me why such a well known, well established, well regarded, and long running project hasn't made this move - do you not need our help, or do you just not care enough to do anything about it? Being unwilling or unable to enable GPU processing of Rosetta work units makes me question whether the scientific work being done by this project is viewed by its peers in the IT area as worth the effort of making GPU processing possible. Are we just wasting time here? |
CZ Send message Joined: 4 Apr 14 Posts: 17 Credit: 78,584 RAC: 0 |
I would just like to add that enabling GPU processing is especially important considering how long Rosetta work units take compared to other projects - about 24 hours on my 3.2 GHz Intel Core i5, while I can crunch similarly sized units for other projects on my GPU in 15 to 60 minutes. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,615,189 RAC: 9,134 |
I know there are many arguments about how difficult it is to do this, but Rosetta@home is slowly losing support as people who are here just to get the highest RAC will simply run other (much less important) projects. No, you are not wasting time here. Rosetta is one of the most prolific BOINC projects, with over 100 publications. But: 1) Not all of the code/algorithms can be moved to a GPU. 2) Even where the code can be ported, you have to consider the performance. 3) About a year ago the Rosetta team published two PDFs of GPU tests/benchmarks, and it seems current GPUs cannot deliver what the developers want. Here: https://boinc.bakerlab.org/forum_thread.php?id=6185&nowrap=true#75756 |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,615,189 RAC: 9,134 |
Rosetta work units take so long compared to other projects - about 24 hours (on my 3.2 GHz Intel Core i5) In your profile, you can edit the CPU run time... |
CZ Send message Joined: 4 Apr 14 Posts: 17 Credit: 78,584 RAC: 0 |
No, you are not wasting time here. Rosetta is one of the most prolific BOINC projects, with over 100 publications. Thanks for your reply - it seems you raised this question yourself in the thread you linked. However, the answers you got there about why they won't enable GPUs boil down to "we are not doing it because it's too hard", and the last reply from the project scientists says they are not even trying to port it to GPUs anymore (that was in 2013). I have stopped my client from accepting any more tasks from them - if they can't be bothered, then neither can I. My CPU time can be better used in the "Mapping Cancer Markers" project for the World Community Grid (their tasks take less than two hours) and the project is just as worthy. SIMAP also worked on proteins and their work units were only about an hour long. I feel that Rosetta is not even trying to make the best use of my donated PC time by enabling GPU processing, so why should I bother? Other people will no doubt continue to support this project, but I have been using BOINC on and off for years and I can see Rosetta is MUCH less popular than it used to be - there are other projects (much less worthy) who have GPU crunchers churning out tons of GFLOPS for them because they get more "credit" for their time. This is purely egotistical credit-chasing, where GPU crunchers compete for the highest RAC rather than doing work for a project based on its scientific merit. It is this competitive drive of younger users with powerful GPUs that really benefits other "worthless" projects, just because they grant the highest RAC. If Rosetta wants to survive as a distributed computing project it will need to keep up, or be left behind. In your profile, you can edit the CPU run time... I'm well aware of that, but the fact remains it takes about a day to process one Rosetta work unit per core - if I set it to use 25% of the CPU it will just take four times longer (and still use up about a day of run time). Thanks for your feedback VENETO, and keep up the good work, but I'm outta here ... |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
That isn't what they meant. There is a Rosetta setting for preferred WU runtime. The default is 3 hours and the maximum is 24 hours. If a shorter runtime per WU would be preferable, you have the option of running fewer models per WU and reporting them back on a shorter schedule. Rosetta Moderator: Mod.Sense |
CZ Send message Joined: 4 Apr 14 Posts: 17 Credit: 78,584 RAC: 0 |
Oh OK, I'll give that a go. My Rosetta WUs have always run at 24 hours (and I installed BOINC and all projects with default settings). I thought Rosetta was unable or unwilling to make their WUs more efficient (and that this was why they took so long - hence my argument for GPU processing). What I do find curious, though, is that a forum moderator and project administrator actually has no RAC and no "Credit" for Rosetta@home at all - isn't that like saying "do as I say, not as I do"? ;-) |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,615,189 RAC: 9,134 |
"It's too hard"...compared to the results. In the future, maybe, new gpus, new software generation (cuda or opencl), new protocols will give to admin what they want. Now i'm waiting to see how rosetta runs on Android. my CPU time can be better used in the "Mapping Cancer Markers" project for the World Community Grid That's a great project. But event they don't have a gpu client!!! there are other projects (much less worthy) who have GPU crunchers churning out tons of GFLOPS for them because they get more "credit" for their time. Are we crunching for credits or for help science? If Rosetta wants to survive as a distributed computing project it will need to keep up, or be left behind. Rest assure that if there will be the possibility for gpu, Rosetta developers there will be!! Thanks for your feedback VENETO, and keep up the good work, but I'm outta here Uh, my nickname is boboviz. Veneto is my region of Italy (http://en.wikipedia.org/wiki/Veneto). This is tipical use in Boinc Italy Team to create regional group. |
CZ Send message Joined: 4 Apr 14 Posts: 17 Credit: 78,584 RAC: 0 |
Uh, my nickname is boboviz. Veneto is my region of Italy (http://en.wikipedia.org/wiki/Veneto). It is typical in the BOINC Italy team to create regional groups. Thanks for the feedback boboviz and Mod.Sense. I have decided to stay with Rosetta - I am here for the cause, not the glory. I just set the Rosetta WU runtime to 4 hours in my preferences (it defaulted to 24 hours on my PC for some reason - that was what was annoying me and making me think Rosetta was inefficient). While I slowly crunch Rosetta WUs on my CPU, I still hope they can enable GPU processing some time in the future though ... |
CZ Send message Joined: 4 Apr 14 Posts: 17 Credit: 78,584 RAC: 0 |
I just set the Rosetta WU to 4 hours in preferences It is still sending me 24-hour units - I have had to abort those units and go back to my preferences to make sure the target CPU time was still set to 4 hours (I did this twice in the last 15 minutes). Every time I "Update" the Rosetta project in BOINC I keep getting 24-hour units, so I have aborted all of them and stopped my client from accepting new work units from Rosetta. I will check tomorrow that my Rosetta preferences are still set to 4 hours and then allow new work units - if they still send me 24-hour units I will have to reconsider my participation, because I will have lost confidence in the competence of the project leaders (I can understand that porting code over to a GPU is difficult, but making sure my preferences are correctly applied can't be that hard). |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
I just set the Rosetta WU to 4 hours in preferences Judging from the timing of your posts, I suspect you are judging the length of a task from the estimated end time your client shows. That isn't a good way to measure it, as the client bases its estimates on recent history: because your recent tasks have been taking 24 hours, the client estimates any new ones will take 24 hours too. If you let a new Rosetta task run with your 4-hour preference you should see it generate as many decoys as possible within 4 hours and then finish the last decoy it was working on at the 4-hour mark. This may mean the task lasts 4.5 or 5 hours if the last decoy takes a long time to generate. If the final decoy takes an extremely long time, the watchdog process will step in, terminate it and report the results of the completed decoys to the project. In these exceptional cases a couple more hours may be added to the run time, so perhaps 6 or 7 hours (the watchdog works on a percentage of your preferred run time, though I can't remember the exact figure). If you are still having trouble, try reporting again here. These things usually turn out to have a simple explanation and are often within the user's control (such as forgetting to tick a particular box). It has happened to me and many other contributors to this forum at one time or another. |
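For illustration, a minimal sketch of the runtime behaviour Murasaki describes, with hypothetical names and a made-up watchdog factor (the real watchdog percentage is not stated in the thread):

```python
import time

def run_work_unit(generate_decoy, preferred_runtime_h=4.0, watchdog_factor=1.5):
    """Illustrative only: generate decoys until the preferred runtime elapses.

    watchdog_factor is a made-up stand-in for the unspecified percentage the
    real watchdog uses; checking only after each decoy completes is a
    simplification (the real watchdog can interrupt a decoy mid-calculation).
    """
    start = time.time()
    completed = []
    while (time.time() - start) / 3600.0 < preferred_runtime_h:
        completed.append(generate_decoy())   # the last decoy may overrun the 4 h mark
        if (time.time() - start) / 3600.0 > preferred_runtime_h * watchdog_factor:
            break                             # give up and report what has finished
    return completed                          # only finished decoys are reported
```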
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
I have aborted all 24 hour long units and stopped my client from accepting new work units from Rosetta. Once you allow Rosetta to accept new tasks you may find there is a delay before the project sends you any. When you terminate a large batch of work the BOINC system thinks you are having a problem with your computer and delays sending new work to give you a chance to fix the problem. The more work that is aborted the longer it will be until a new task is issued. If this happens, don't worry, the work will come through after a while. |
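As a purely illustrative model of that behaviour (not BOINC's actual scheduler code or numbers), think of a per-host backoff that grows with each aborted result and clears once healthy results come back:

```python
# Toy model only: each aborted result lengthens the delay before this host is
# sent new work; a valid result clears it. The real BOINC policy differs.
class WorkFetchBackoff:
    def __init__(self, base_minutes=1, cap_minutes=24 * 60):
        self.base = base_minutes
        self.cap = cap_minutes
        self.delay = 0

    def on_aborted_result(self):
        # more aborted work -> progressively longer wait for the next task
        self.delay = min(self.cap, max(self.base, self.delay * 2))

    def on_valid_result(self):
        self.delay = 0          # healthy results restore normal work fetch

b = WorkFetchBackoff()
for _ in range(5):              # e.g. aborting a batch of five tasks
    b.on_aborted_result()
print(f"next work request delayed ~{b.delay} minutes")   # grows with each abort
```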
CZ Send message Joined: 4 Apr 14 Posts: 17 Credit: 78,584 RAC: 0 |
I just set the Rosetta WU to 4 hours in preferences Thanks, that makes sense, I will allow new WUs and see what happens ... |
CZ Send message Joined: 4 Apr 14 Posts: 17 Credit: 78,584 RAC: 0 |
There is a Rosetta setting for preferred WU runtime. The default is 3 hours and the maximum is 24 hours. If a shorter runtime per WU would be preferable, you have the option of running fewer models per WU and reporting them back on a shorter schedule. Thanks, it seems to be working for me (I set my target CPU run time to 4 hours), but the WUs I now receive appear to be "fragmented" (or divided). I assume that when these work units are validated (by another user) they would be compared with results returned by users who have completed the entire WU, or by those who have returned the same WU "fragment" as me. So I imagine running either the complete (24-hour) or the default-length WUs would be preferable? If so, is the default length still 3 hours? Because when I attached Rosetta (without specifying WU length) to my default BOINC installation, I was receiving the 24-hour WUs, not the 3-hour WUs (which you say are the default length). Long story short, does the length of WUs set in my Rosetta preferences affect the efficiency with which the WUs are processed or validated, or does it negatively impact the project or its users if "non-default" or shorter work units are processed (for example in the amount of credit granted, or the length of time between WU report and validation)? Thanks. |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
To answer your questions it is probably best to start with an explanation of how the Rosetta project works. At a basic level Rosetta takes the chemical structure of a protein (or a segment of a protein) and tries to calculate the shape it would form in nature. The shape of a protein has a significant impact on how it reacts to other molecules, so knowing the shape is very useful for disease and drug research. In nature all molecules tend to structure themselves in the form that requires the least energy to create (molecules are lazy, as my old Chemistry teacher used to say). Rosetta sends protein structures to your computer and asks you to calculate the shape with the least energy. There are billions of possible shapes for every protein, which would take a single computer centuries to work through. To shorten this time Rosetta gives you a guesstimate of what parts of the structure already look like and asks you to adjust the other parts. After a period of time (e.g. half an hour) your computer records the lowest point it found and notes that point as the result for the first decoy. Rosetta then gives you a slightly different guess of the structure to start from and you repeat the process to find the lowest energy point - this is decoy two. Your computer repeats this process until your preferred run time expires and you return all decoys (all the lowest energy points you found) to the project. Meanwhile many other contributors are doing similar calculations and return their own low energy points for the same protein. The scientists review all the low energy points and select several structural areas that look promising. They then prepare a new batch of tasks to focus on these specific points and send them out for participants to calculate in more detail. This is repeated several times until the scientists have two or three potential shapes that they are fairly confident could match the real structure. At this point the results are taken to the laboratory for real world experiments to check which of the candidate structures is correct (doing these experiments with billions of possible answers would be expensive, but limiting it to two or three results cuts the cost dramatically). I assume that when these work units are validated (by another user)... To maximise the search of the billions of possible results there is no cross-validation of tasks in Rosetta. Instead, if several results return low energy points with a similar structure, the scientists conduct the more detailed calculation in that area. The second set of results then provides more detailed answers to either validate or discount the original results. Long story short, does the length of WUs set in my Rosetta preferences affect the efficiency with which the WUs are processed or validated, or does it negatively impact the project or its users if "non-default" or shorter work units are processed (for example in the amount of credit granted, or the length of time between WU report and validation)? The difference between a short and a long run time is the number of decoys (low energy points) you produce. Whether you complete 48 decoys in one 24-hour run or 8 decoys at a time in 4-hour runs (again 48 results in 24 hours) makes no real difference to the science. It is just a matter of personal preference for how you want to run your machine.
The only major effect is on server load: asking for one big task every 24 hours versus six small ones. The project prefers longer run times to reduce the burden of communication, but has opted for 3 hours as a reasonable default. --- If my answer above was too technical, a common analogy for this type of science is searching for the deepest valley in a mountain range. A hundred people are asked to find the lowest point in the range, each given a different start position. Each person walks around for a while, climbing peaks and descending into valleys, notes the lowest points they find on their map and returns to camp. Back at camp they compare notes and spot several parts of the range that appear to be deep - they then return to those locations the next day for a more detailed look (all 100 now looking at five or six locations instead of the whole range). Returning to camp each night, they compare notes and repeat the process until they eventually discover the lowest point. (A toy code sketch of this sample-and-compare loop follows below.) |
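To make the decoy idea concrete, here is a toy sketch of the "start from a guess, walk downhill, keep the lowest-energy point, repeat from a new guess" loop described above. The energy function, move set and sizes are stand-ins, not Rosetta's actual scoring or sampling code:

```python
import random

def toy_energy(conformation):
    # Placeholder "energy landscape"; Rosetta's real energy terms are far richer.
    return sum(x * x for x in conformation)

def generate_decoy(start, n_moves=5000, step=0.1):
    """Randomly perturb the starting guess, keeping moves that lower the energy."""
    current = list(start)
    current_e = toy_energy(current)
    for _ in range(n_moves):
        trial = list(current)
        i = random.randrange(len(trial))
        trial[i] += random.uniform(-step, step)
        trial_e = toy_energy(trial)
        if trial_e <= current_e:               # greedy accept: walk downhill
            current, current_e = trial, trial_e
    return current, current_e                  # the lowest point found = one decoy

# Each run starts from a different guess; the project pools the returned decoys
# and re-samples around whichever regions look lowest.
starts = [[random.uniform(-5, 5) for _ in range(20)] for _ in range(8)]
decoys = [generate_decoy(s) for s in starts]
print("best decoy energy:", min(e for _, e in decoys))
```

In this picture, lengthening the preferred run time simply means calling generate_decoy more times before reporting; it does not change what any individual decoy looks like, which is why the choice is mostly a matter of personal preference and server load.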
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
If so, is the default length still 3 hours? Because when I attached Rosetta (without specifying WU length) to my default BOINC installation, I was receiving the 24-hour WUs, not the 3-hour WUs (which you say are the default length). As far as I am aware the default length is still 3 hours. Your starting preference could have been caused by a glitch in the BOINC system (either client or server), but the most common explanation is that the user misunderstood what was being asked by "target CPU runtime" and thought it was asking how much of the day Rosetta was allowed to use the CPU. Setting it to 24 hours and then having your client update its preferences before the first task reached 3 hours could have caused your first and all subsequent tasks to run with the new run time. |
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
Just my two cents on this topic. There are various floating-point compute benchmarks; it seems BOINC / Rosetta uses the simple Whetstone benchmark http://boinc.berkeley.edu/wiki/computation_credit http://www.netlib.org/benchmark/ to estimate floating-point compute power. On my PC that gives about 4 GFLOPS per 'CPU' x 8 (an i7 4771: 4 cores hyperthreaded into 8), so roughly 32 GFLOPS. However, if a different benchmark (LINPACK) is run together with a highly processor-optimised BLAS (OpenBLAS), it can reach a staggering ~150-180 GFLOPS. http://www.pugetsystems.com/blog/2013/08/26/Haswell-Floating-Point-Performance-493/ https://github.com/Garyfallidis/blasy/tree/master/OpenBLAS A sample run of the LINPACK benchmark on my PC shows:
~/src/OpenBLAS-develop/benchmark> ./dlinpack.goto 10000 10000 1
From : 10000  To : 10000  Step = 1
SIZE      Residual        Decompose          Solve             Total
10000 :   1.591717e-10    148216.66 MFlops   3441.51 MFlops    146369.99 MFlops
~/src/OpenBLAS-develop/benchmark> ./dlinpack.goto 15000 15000 1
From : 15000  To : 15000  Step = 1
SIZE      Residual        Decompose          Solve             Total
15000 :   5.632548e-10    159243.41 MFlops   3609.93 MFlops    157882.34 MFlops
That is 157.88 GFLOPS! If this over-simple GFLOPS figure is taken as the measure of computing prowess, then a single Haswell CPU (i7 4770, 4770K, 4771, 4790, etc.) in a higher-end home PC today does what supercomputers a generation or two ago used to do: http://en.wikipedia.org/wiki/History_of_supercomputing#Historical_TOP500_table http://blog.sfgate.com/techchron/2008/09/11/fujitsu-supercomputer-goes-to-computer-history-museum/
----- The difference here is probably that the Whetstone benchmark is unoptimized for the CPU, whereas OpenBLAS uses CPU features designed for vectorised processing, e.g. SSE and AVX2. This benefits a specific class of problems, namely *matrix multiplications*. Because the application targets one specific purpose (matrix multiplication, i.e. the LINPACK benchmark), that in a way explains the huge performance gap in GFLOPS. In the same way, GPUs deliver TFLOPS for *very specific problems* whose workloads match the GPU's design http://www.cinemablend.com/games/AMD-Radeon-HD-7990-Malta-Sports-6GB-GDDR5-8-2-TFlops-54822.html While a lot of people are excited about the PlayStation 4's 2 teraflops of computational power, both Nvidia and AMD have completely superseded those numbers with their new line of GPUs, with Nvidia's Titan bolstering up to 4 teraflops of computational power and AMD's newly announced Radeon HD 7990 Malta curb-stomping the competition with 8.2 freaking teraflops of beastly computational dominance. However, it is important to note that some quoted GPU TFLOPS figures refer to single-precision floating point (a world of difference from double-precision TFLOPS) http://en.wikipedia.org/wiki/Radeon_HD_7000_Series#Radeon_HD_7900 For that matter, the Radeon HD 7990, which sports 8.2 single-precision TFLOPS, manages a comparatively paltry 1.894 double-precision TFLOPS, and achieving those TFLOPS may only benefit a specific class of problems where the processing stays within the GPU's registers. For memory-intensive applications with thousands of competing synchronisation locks, performance may degrade severely even if the app is re-coded for a GPU, i.e. it may not benefit from a GPU at all. -------------- What can be said is that not all computational payloads can be optimized for, or even benefit from, CPU- or GPU-specific features; real-world problems are often 'different' from the 'idealised' benchmarks. For example, the LINPACK benchmark benefits from huge matrices, say 10000x10000 in dimension, which may not reflect real-world scenarios where the matrices are *much smaller* and there is plenty of other processing that does not benefit from vectorised code. However, the Rosetta development team can certainly look into possible use of CPU or GPU vectorised features for the specific areas that would benefit from them (a rough scalar-versus-BLAS comparison follows below). |
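The benchmark gap sgaboinc describes is easy to reproduce on a desktop: the same matrix multiply timed once as a plain scalar triple loop and once through whatever optimised BLAS NumPy links against (the vectorised SSE/AVX-style paths mentioned above). A rough sketch; the matrix size and timings are illustrative only:

```python
import time
import numpy as np

n = 128                                   # small so the naive loop finishes quickly
a = np.random.rand(n, n)
b = np.random.rand(n, n)
flops = 2.0 * n ** 3                      # multiply-adds in an n x n matmul

# Naive triple loop: no vectorisation, no cache blocking.
t0 = time.time()
c = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        s = 0.0
        for k in range(n):
            s += a[i, k] * b[k, j]
        c[i, j] = s
naive_gflops = flops / (time.time() - t0) / 1e9

# Same arithmetic through the optimised BLAS NumPy links against.
t0 = time.time()
c_blas = a @ b
blas_gflops = flops / (time.time() - t0) / 1e9

print(f"naive loop: {naive_gflops:.4f} GFLOPS, BLAS: {blas_gflops:.2f} GFLOPS")
print("results agree:", np.allclose(c, c_blas))
```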
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
You have hit the nail on the head right there. Rosetta tasks can take anything from 100MB to 1,000MB+ per core to process. There is no way the current generation of GPUs can provide that much dedicated memory per core. |
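A back-of-the-envelope version of that memory argument, using illustrative figures (a 3 GB card with a couple of thousand shader cores; real cards and work units vary):

```python
# Rough arithmetic only: memory available per GPU core versus Rosetta task sizes.
gpu_memory_mb = 3 * 1024      # e.g. a 3 GB card (illustrative figure)
gpu_cores = 2048              # shader cores on a high-end card of that era

print(f"memory per GPU core: ~{gpu_memory_mb / gpu_cores:.1f} MB")

for task_mb in (100, 500, 1000):          # the per-core task sizes quoted above
    resident = gpu_memory_mb // task_mb
    print(f"a {task_mb} MB task: at most {resident} copies fit on the whole card")
```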
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
You have hit the nail on the head right there. Rosetta tasks can take anything from 100MB to 1,000MB+ per core to process. There is no way the current generation of GPUs can provide that much dedicated memory per core. Even though this may be true, there are projects such as this: http://ambermd.org/ http://ambermd.org/gpus/benchmarks.htm which suggest there may be different techniques that make GPU-accelerated / vectorized processing possible. These could, however, be very different from the techniques/methods used in Rosetta; Rosetta's may be simpler and perhaps more flexible in some respects. The vectorized methods may be targeted at solving a specific class of problems rather than all problems, so I guess there are trade-offs after all. However, I'm no expert in this arena and can only contribute small findings like these from a Google search. I personally would not invest in a GPU, but there are people who own sophisticated high-end graphics cards - I'd think some could be defectors from bitcoin/litecoin mining. If some of these high-end GPUs could be used in the effort, we might see a petaflop-scale increase in the distributed computing power. http://www.forbes.com/sites/reuvencohen/2013/11/28/global-bitcoin-computing-power-now-256-times-faster-than-top-500-supercomputers-combined/ |
CZ Send message Joined: 4 Apr 14 Posts: 17 Credit: 78,584 RAC: 0 |
All the dedicated/discrete GPUs these days (as opposed to the ones integrated on the motherboard) come with at least 1GB of dedicated graphics memory - even the cheapest ones, at $30 or so. |