GPU WU's

Message boards : Number crunching : GPU WU's



Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 92520 - Posted: 29 Mar 2020, 11:22:06 UTC - in response to Message 92511.  
Last modified: 29 Mar 2020, 12:06:52 UTC

Porting that one without breaking code will not be easy. I'm totally with you regarding runtime and it being worth the effort. But this one calls for full time developers with formal training, not scientists doing development on the side. I'm not sure how much man-power Rosetta actually has to do this and I'm also not sure if the commercial side of Rosetta has an interest in doing this.

Have you seen the posts by rjs5? He is an expert on parallelism (AVX, etc.) and has been trying to help them along for years, but it is slow progress.
Falconet

Joined: 9 Mar 09
Posts: 353
Credit: 1,227,479
RAC: 1,836
Message 92521 - Posted: 29 Mar 2020, 11:25:38 UTC

Read this regarding GPU work: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13533&postid=92291
Laurent

Joined: 15 Mar 20
Posts: 14
Credit: 88,800
RAC: 0
Message 92531 - Posted: 29 Mar 2020, 14:42:33 UTC - in response to Message 92520.  

Have you seen the posts by rjs5? He is an expert on parallelism (AVX, etc.) and has been trying to help them along for years, but it is slow progress.


Yes, I have seen them, and the history (the "no SSE" thread in 2015) makes me shiver. They could push through a lot more data just by using a competent compiler. Who knows how fast this thing would be on a good platform.

I'm an OpenCL developer and have already done such ports of scientific code. I offered to help; they declined politely. That's life.
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 92535 - Posted: 29 Mar 2020, 15:42:50 UTC - in response to Message 92531.  
Last modified: 29 Mar 2020, 16:00:36 UTC

I'm an OpenCL developer and have already done such ports of scientific code. I offered to help; they declined politely. That's life.

Thanks for your efforts. But don't feel singled out. They refuse everyone. Maybe they have reasons; I am not a programmer.

PS - Good luck on TN-Grid. I do that one too. They were making progress for a while, and then stopped.

PPS: Do you know about QuChemPedIA? It is a new project. I don't know if it is suitable for your efforts, but they have said that the open-source package they are using is ten times slower than the proprietary one they use in-house, so there could be potential.
https://quchempedia.univ-angers.fr/athome/
Mad_Max

Joined: 31 Dec 09
Posts: 209
Credit: 25,885,934
RAC: 10,551
Message 93157 - Posted: 3 Apr 2020, 5:17:26 UTC - in response to Message 92497.  
Last modified: 3 Apr 2020, 5:52:26 UTC

As a result, modern GPUs are only a few times faster than modern CPUs, and only on tasks well suited to highly parallel SIMD computation. On tasks not well suited to that kind of computation they can even be slower than CPUs.
Actually, the facts say otherwise.
Seti@home data is broken up into WUs that are processed serially, perfect for CPUs. Over time they made use of volunteer developers, and applications that made use of SSEx.x and eventually AVX were developed, the AVX application being around 40% faster than the early stock application (depending on the type of Task being done).
Then they started making use of GPUs, and guess what? It is possible to break up some types of work that are normally processed serially, process the pieces in parallel, then re-combine the results to give a final result that matches the one produced by the CPU application, and it does so in much less time.

For example: a particular Task on a high-end CPU takes around 1 hr 30 min (all cores, all threads, using the AVX application, which is 40% faster than the original stock application). The same Task on a high-end NVIDIA GPU, using the final Special Application for Linux (which uses CUDA), is done in less than 50 seconds. And the GPU result matches that of the CPU - a Valid result.

Personally, I think going from 90 minutes to 50 seconds to process a Task is a significant improvement.
Think of how much more work could be done each hour, day, week.

That is rubbish rather than facts. Either you misunderstood something or counted something wrong. For example, you probably took the runtime of a WU on one thread/core of the CPU (and not the whole processor) and compared it with the runtime of a job using an entire GPU (plus some CPU, as all GPU apps do).

Or the SETI programmers are simply unable to use modern CPUs properly. Only a 40% boost from AVX compared with a plain app without any SIMD is puny: on code/tasks suitable for vectorization it should gain 3x (+200%) speed or more, and if the code/task is NOT suitable for vectorization the gain can be low, but such tasks cannot work effectively on a GPU at all. GPU programming and CPU SIMD programming need the same things (all current GPU cores are wide SIMD engines inside), but SIMD on the CPU is simpler to implement.
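To make that concrete, here is a minimal hypothetical sketch (not SETI or Rosetta code) of the kind of loop a compiler can vectorize. Built with something like g++ -O3 -mavx2, the loop body maps onto 256-bit AVX registers and processes 8 floats per instruction instead of 1, which is where gains of the size mentioned above come from on suitable code:

```cpp
// Hypothetical example, not SETI or Rosetta code: a SIMD-friendly loop.
// No branches, no loop-carried dependency, contiguous memory access -
// exactly the shape that vectorizes well on a CPU and, for the same
// reason, would also map well onto a GPU kernel.
#include <cstddef>
#include <cstdio>
#include <vector>

// y[i] = a * x[i] + y[i]
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

int main() {
    std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 2.0f);
    saxpy(3.0f, x, y);
    std::printf("y[0] = %f\n", y[0]);  // 3*1 + 2 = 5
    return 0;
}
```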

Current high-end CPUs:
Intel Core i9-9900K: capable of ~450 GFLOPS at double precision (DP) or ~900 GFLOPS at single precision (SP)
AMD Ryzen 9 3950X: ~900 GFLOPS DP and ~1,800 SP
AMD Threadripper 3990X: ~2,900 GFLOPS DP and ~5,800 SP

Peak speeds of a few current high-end GPUs:
AMD Vega 64: 638 GFLOPS DP and 10,215 SP
AMD RX 5700 XT: 513 GFLOPS DP and 8,218 SP
NVIDIA RTX 2080: 278 GFLOPS DP and 8,920 SP
NVIDIA RTX 2080 Ti: 367 GFLOPS DP and 11,750 SP
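For reference, peak figures like those above are just execution units x clock x FLOPs per cycle. A rough sanity check of two of the quoted numbers (the core counts and base clocks below are the published figures for the i9-9900K and RTX 2080, used purely as an illustration):

```cpp
// Back-of-the-envelope check of the peak figures quoted above.
// peak GFLOPS = execution units * clock (GHz) * FLOPs per unit per cycle
#include <cstdio>

double peak_gflops(double units, double clock_ghz, double flops_per_cycle) {
    return units * clock_ghz * flops_per_cycle;
}

int main() {
    // i9-9900K: 8 cores at 3.6 GHz base; AVX2 = 2 FMA ports * 8 SP lanes * 2 ops = 32 SP FLOPs/cycle/core
    std::printf("i9-9900K SP: ~%.0f GFLOPS\n", peak_gflops(8, 3.6, 32));            // ~920
    std::printf("i9-9900K DP: ~%.0f GFLOPS\n", peak_gflops(8, 3.6, 16));            // ~460
    // RTX 2080: 2944 CUDA cores at 1.515 GHz base; 1 FMA = 2 SP FLOPs/cycle; DP runs at 1/32 the SP rate
    std::printf("RTX 2080 SP: ~%.0f GFLOPS\n", peak_gflops(2944, 1.515, 2));        // ~8,920
    std::printf("RTX 2080 DP: ~%.0f GFLOPS\n", peak_gflops(2944, 1.515, 2.0/32.0)); // ~279
    return 0;
}
```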

And it's much easier to get real application speed close to the theoretical maximum on a CPU than on a GPU. All GPU computation also needs additional support/resources from the CPU to run. Both facts reduce the speed gap even further as we move from theoretical potential (shown above) to practical computing.

As I said: a modern GPU is only a few times faster than a modern CPU, not ~100x (if you properly use all cores and SIMD extensions), and only for single-precision calculations. At double precision, GPUs are usually even slower than CPUs, at least for all "consumer grade" GPUs (there are special versions of GPUs for data centers and supercomputers with high DP speeds, like NVIDIA Tesla or AMD Instinct, but they are priced several times higher than their consumer/gamer counterparts and usually not sold to retail customers at all).
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1677
Credit: 17,747,962
RAC: 22,865
Message 93162 - Posted: 3 Apr 2020, 6:06:31 UTC - in response to Message 93157.  
Last modified: 3 Apr 2020, 6:07:59 UTC

For example, you probably took the runtime of a WU on one thread/core of the CPU (and not the whole processor) and compared it with the runtime of a job using an entire GPU.
Of course, that is how you compare things. You compare things that are comparable...
And it takes a lot of CPU threads to match the output of 1 GPU.
A lot of extremely slow processing units can match the output of a single high-performance processing unit if you have enough of them, and when it comes to CPUs v GPUs it takes up to 100 CPU processing units (cores/threads) to match the output of an equivalent GPU (i.e. low end v low end, high end v high end).


And it's much easier to get real application speed close to the theoretical maximum on a CPU than on a GPU.
Yep, and still a GPU is capable of significantly greater processing rates than a CPU. Having lots of cores in a CPU helps offset its poorer capabilities, but then adding GPUs helps improve a system's output as well (check out the hardware used in the current & future crop of supercomputers. Real-life facts, not from the world you live in, but actual facts. Reality).


All GPU computation also needs additional support/resources from the CPU to run.
Yep, and in every case the loss of the CPU output is more than offset by the increase in output the GPU provides.


As I said: a modern GPU is only a few times faster than a modern CPU (if you properly use all cores and SIMD extensions).
The processing time for a Task on a CPU core is much greater than the processing time on a single GPU, with both applications optimised for maximum output. Adding more cores to a CPU improves its output, but then adding more GPUs to a system improves its output as well.


And only for single-precision calculations. At double precision, GPUs are usually even slower than CPUs, at least for all "consumer grade" GPUs.
AMD consumer GPUs have much higher DP (Double Precision) capabilities than NVIDIA's.


(there are special versions of GPUs for data centers and supercomputers with high DP speeds, like NVIDIA Tesla or AMD Instinct, but they are priced several times higher than their consumer/gamer counterparts and usually not sold to retail customers at all).
So what? The fact is they still well exceed a CPU's capabilities.



Of course you could process data that cannot in any way be parallelised, in which case, yes, a CPU (low, mid, high end) can outperform a GPU (low, mid, high end). But for work that can be done in parallel, GPUs win every time (with a well-developed application, of course; comparing an extremely optimised application with an extremely poorly written one isn't a valid comparison).
Grant
Darwin NT
Profile dgnuff

Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 93365 - Posted: 4 Apr 2020, 11:28:18 UTC
Last modified: 4 Apr 2020, 11:30:12 UTC

Here's a very different argument to try to explain why Rosetta doesn't have GPU WUs.

To quote Spock from the original Star Trek show: "You're proceeding from a false assumption."

That assumption is that any program can be converted to run on a GPU and will go faster if that happens.

OK, let's assume that's correct. If it were, Intel and AMD would go out of business tomorrow, because we wouldn't need them any more. We'd just stop using conventional CPUs and run everything on the GPU instead: OS, browser, the whole lot.

But we don't do that, do we? Why not?

Because there are some things that GPUs just don't do well at.

Read this Q & A from the Computer Science Stack Exchange website: https://cs.stackexchange.com/questions/121080/what-are-gpus-bad-at

It does a really, really good job of explaining what GPUs are good at, and what they are bad at. And it just so happens that while SETI, Folding and others can be made efficient on a GPU, Rosetta can't.
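To illustrate the kind of pattern that Q&A says GPUs handle poorly, here is a minimal hypothetical sketch (not Rosetta's actual code) of a Metropolis-style accept/reject search: each iteration depends on the state produced by the previous one and on a data-dependent branch, so the loop cannot simply be spread across thousands of GPU threads the way independent per-element work can.

```cpp
// Hypothetical illustration, not Rosetta code: a Metropolis-style random walk.
// The loop-carried dependency (x, energy) and the data-dependent branch are
// what make this kind of serial search a poor fit for GPU-style parallelism.
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::normal_distribution<double> step(0.0, 0.5);

    double x = 0.0;            // current state
    double energy = x * x;     // toy "energy" function E(x) = x^2
    const double kT = 1.0;

    for (int i = 0; i < 1000000; ++i) {
        const double trial = x + step(rng);        // proposal depends on current state
        const double trial_energy = trial * trial;
        // accept or reject: the outcome depends on the data just computed
        if (trial_energy < energy ||
            uni(rng) < std::exp((energy - trial_energy) / kT)) {
            x = trial;
            energy = trial_energy;
        }
    }
    std::printf("final state x = %f\n", x);
    return 0;
}
```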

So next time you're asking why Rosetta isn't on your GPU, ask yourself why your browser doesn't run on your GPU. The answer to both those questions is about the same.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93389 - Posted: 4 Apr 2020, 15:27:33 UTC - in response to Message 93365.  

Rosetta can't.


Not sure "can't" is the perfect word there, but certainly not trivial to get there.
Rosetta Moderator: Mod.Sense
gerardgn

Joined: 5 Apr 20
Posts: 1
Credit: 168,552
RAC: 0
Message 93647 - Posted: 6 Apr 2020, 16:17:15 UTC

I also have an NVIDIA Jetson Nano.
The CPU is low end (ARM A57), but the GPU is relatively big (~128 Maxwell cores) => ~450 GFLOPS.
Is it possible to enhance the client to take advantage of NVIDIA GPUs, like F@H does on the PC?
It would unleash more power, though I see there are only a few of us with these devices.
Millenium

Joined: 20 Sep 05
Posts: 68
Credit: 184,283
RAC: 0
Message 93668 - Posted: 6 Apr 2020, 19:31:41 UTC

No, Rosetta@Home is a CPU-only project. Not every problem is well suited to a GPU. So use Folding for your GPU and Rosetta for your CPU.
Profile [VENETO] boboviz

Joined: 1 Dec 05
Posts: 1994
Credit: 9,551,716
RAC: 6,403
Message 93678 - Posted: 6 Apr 2020, 20:54:25 UTC - in response to Message 93668.  

No, Rosetta@Home is a CPU-only project.

Rosetta@Home is CPU only.
But trRosetta also runs on GPUs.
markhl

Joined: 18 Feb 22
Posts: 1
Credit: 2,508
RAC: 0
Message 105115 - Posted: 22 Feb 2022, 3:59:20 UTC - in response to Message 93678.  

Hello! I am new to R@h. I started running R@h because WCG ran out of units due to their migration. Not running any other BOINC program. I'd welcome your thoughts.

This thread states that R@h only uses the CPU. But my BOINC event log shows GPU usage:

2/21/2022 7:30:24 PM | | Resuming GPU computation
2/21/2022 7:32:28 PM | | Suspending GPU computation - computer is in use


So, it looks like R@h does use GPU now?
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1677
Credit: 17,747,962
RAC: 22,865
Message 105116 - Posted: 22 Feb 2022, 7:07:55 UTC - in response to Message 105115.  

So, it looks like R@h does use GPU now?
Nope.
It looks like if you have a GPU, and you have it set to suspend BOINC processing under certain conditions, then when those conditions occur BOINC makes a note of that in the log, even though the GPU isn't actually being used.
Grant
Darwin NT
Bryn Mawr

Joined: 26 Dec 18
Posts: 390
Credit: 12,073,013
RAC: 4,827
Message 105138 - Posted: 22 Feb 2022, 19:28:26 UTC - in response to Message 91505.  

I am just saying that it is not safe to presume that since project X has done some sort of protein structure prediction on a GPU, that R@h, and the algorithms it uses for the various sorts of predictions, would see a similar performance boost from GPU.

Years ago they tried a GPU app of R@H (so I think it is possible to do, even if limited to some protocols), with little benefit.
But over the years a lot of things have changed, like HW and SW, so I don't know if the benefits are bigger now.


If the algorithm has not changed, then no amount of change to the HW and SW will make any difference to the benefits.
Profile Greg_BE

Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105254 - Posted: 27 Feb 2022, 0:27:18 UTC - in response to Message 105138.  

I am just saying that it is not safe to presume that since project X has done some sort of protein structure prediction on a GPU, that R@h, and the algorithms it uses for the various sorts of predictions, would see a similar performance boost from GPU.

Years ago they tried a GPU app of R@H (so I think it is possible to do, even if limited to some protocols), with little benefit.
But over the years a lot of things have changed, like HW and SW, so I don't know if the benefits are bigger now.


If the algorithm has not changed, then no amount of change to the HW and SW will make any difference to the benefits.



The short version, from memory, is that their code is not designed for GPU use and that they are constantly changing the parameters and other things in each of the proteins they send out in 4.2. This project does not like change, so they don't bother writing code for GPUs. Plus, for their neural network work they already have all the GPU power they need for deep machine learning.

What the story is these days regarding resistance to GPU usage, I don't know. But they have a hard enough time keeping CPU work straight sometimes, so a GPU app would be really unreliable if they released it here.

To sum it up, R@H has never used GPUs and WILL not for at least the next 5 years, if not longer.
Profile Greg_BE

Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105255 - Posted: 27 Feb 2022, 0:28:04 UTC - in response to Message 93678.  

No, Rosetta@Home is a CPU-only project.

Rosetta@Home is CPU only.
But trRosetta also runs on GPUs.


That does not appear to be BOINC-related?
It appears to be its own stand-alone app.
Profile [VENETO] boboviz

Joined: 1 Dec 05
Posts: 1994
Credit: 9,551,716
RAC: 6,403
Message 105278 - Posted: 28 Feb 2022, 8:24:26 UTC - in response to Message 105255.  

But trRosetta also runs on GPUs.


That does not appear to be BOINC-related?
It appears to be its own stand-alone app.


Yes, it's stand-alone, but it's open and part of the IPD/RosettaCommons/Rosetta software ecosystem.
Another piece of IPD software is RoseTTAFold, which is well suited to GPUs.
Profile [VENETO] boboviz

Joined: 1 Dec 05
Posts: 1994
Credit: 9,551,716
RAC: 6,403
Message 105282 - Posted: 28 Feb 2022, 10:04:51 UTC - in response to Message 105255.  

That does not appear to be BOINC-related?
It appears to be its own stand-alone app.


This tweet from the official R@H account says that trRosetta is inside our VM WUs.
Profile [VENETO] boboviz

Joined: 1 Dec 05
Posts: 1994
Credit: 9,551,716
RAC: 6,403
Message 105283 - Posted: 28 Feb 2022, 10:18:06 UTC - in response to Message 105282.  

trRosetta also runs with PyTorch, and PyTorch runs well on NVIDIA and AMD GPUs...
Profile [VENETO] boboviz

Joined: 1 Dec 05
Posts: 1994
Credit: 9,551,716
RAC: 6,403
Message 105285 - Posted: 28 Feb 2022, 10:19:41 UTC - in response to Message 93678.  

But trRosetta also runs on GPUs.


Here is the new GitHub repository for trRosetta, inside RosettaCommons.



©2024 University of Washington
https://www.bakerlab.org