Message boards : Number crunching : GPU computing
David E K (Volunteer moderator, Project administrator, Project developer, Project scientist) · Joined: 1 Jul 05 · Posts: 1018 · Credit: 4,334,829 · RAC: 0

Plenty of scientific progress!
https://www.bakerlab.org/wp-content/uploads/2016/09/HuangBoyken_DeNovoDesign_Nature2016.pdf
https://www.bakerlab.org/wp-content/uploads/2016/09/Bhardwaj_Nature_2016.pdf
Sid Celery · Joined: 11 Feb 08 · Posts: 2125 · Credit: 41,228,659 · RAC: 8,784

I think the question was about re-coding to take advantage of newer protocols. But with regard to these papers from a few weeks ago: these are the sort of things that should be posted in the Science forum when they become available.
David E K (Volunteer moderator, Project administrator, Project developer, Project scientist) · Joined: 1 Jul 05 · Posts: 1018 · Credit: 4,334,829 · RAC: 0

They have been posted and tweeted. Lots of cool science happening recently.
Dr. Merkwürdigliebe · Joined: 5 Dec 10 · Posts: 81 · Credit: 2,657,273 · RAC: 0

Because I want to help. That's why.

One of the problems is the heterogeneous architecture of Rosetta@home: there are PCs, Macs and tablets/smartphones (seriously?). Why not an internet-connected dual-core toaster? These devices bring a lot of issues, and that is, IMHO, a waste of developer resources. A homogeneous architecture based on AVXx would alleviate all of those problems while yielding higher performance.

The distributed nature of Rosetta also introduces latencies: preparing work, zipping it, sending it and collecting the results back over a WAN. Being forced to deal with ultra-lame ancient CPUs and the like is another problem.
David E K (Volunteer moderator, Project administrator, Project developer, Project scientist) · Joined: 1 Jul 05 · Posts: 1018 · Credit: 4,334,829 · RAC: 0

We do use local and UW-hosted clusters: https://itconnect.uw.edu/service/shared-scalable-compute-cluster-for-research-hyak/ We have also been given time on cloud computing resources, and we have been awarded many, many compute years on supercomputing resources such as Blue Gene.

Specific questions and concerns about code development and optimizations are more of a Rosetta Commons issue; they have hired developers to tackle such issues. Keep in mind, we are a research lab whose main priority is research. One overlooked benefit of distributed computing is getting people familiar with science and allowing them to be directly a part of it.
robertmiles · Joined: 16 Jun 08 · Posts: 1232 · Credit: 14,281,662 · RAC: 1,402

"I've just been looking at the performance of the new GTX1080 and for DOUBLE precision calculations it does 4 Tflops!!!! For comparison a relatively high performance chip like an overclocked 5820K will do maybe 350GFlops. So we are talking an order of magnitude difference. In addition the Tesla HPC version will probably be double that at 8 TFlops. (Edit: Looks like it is actually 5.3TFlops) The Volta version of the gtx1080 (next gen on, due in about 18 months time) is rumoured to be 7TFlops FP64 in the consumer version."

More computing performance is not a good answer if the limit comes from available memory rather than from compute. Rosetta@home has already looked into GPU versions, and found that they would require about 6 GB of graphics memory per GPU to get the expected 10 times the performance of the CPU version. The GPU version would run each workunit at about the same speed as the CPU version, and would therefore need to run 10 workunits at the same time, using 10 times as much memory, to get 10 times as much performance. Rather few of the high-end graphics boards have that much memory.
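A back-of-the-envelope sketch of that memory argument, as plain C++. The per-workunit footprint below is a hypothetical round number implied by the 6 GB / 10x estimate above, not a measured value:

```cpp
// Sketch: why a 10x GPU speedup implies roughly 10x the memory.
#include <cstdio>

int main() {
    const double gb_per_workunit = 0.6;  // assumed footprint of one model (illustrative)
    const int    target_speedup  = 10;   // want 10x the throughput of one CPU core

    // Each GPU "copy" of the work runs about as fast as one CPU core, so the
    // only way to reach the target is to keep that many workunits resident at once.
    const double gpu_memory_needed = gb_per_workunit * target_speedup;
    std::printf("GPU memory needed: %.1f GB\n", gpu_memory_needed);  // prints 6.0 GB
    return 0;
}
```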
robertmiles · Joined: 16 Jun 08 · Posts: 1232 · Credit: 14,281,662 · RAC: 1,402

"I can't fathom the computing knowledge you need for something like Rosetta. Or anything useful for that matter... I just got into learning Python (I figured an EE should know a good bit of programming) and I'm struggling like mad. MATLAB is the only language I'm proficient at, but it's so user friendly it doesn't count IMO."

I've used Fortran for several years, and have taken classes in C++ and CUDA since then. Is any help needed for translating any remaining Fortran code to C++? I would not be able to travel for this. I'm still looking for an online OpenCL class aimed at GPUs rather than FPGAs.

A CUDA version would work on most Nvidia GPUs, but not on other brands. An OpenCL version should work on other brands of GPUs. A GPU version REQUIRES that most of the application allows many threads to run in any order, or even at the same time, since they don't use anything produced by the other threads. If this is not satisfied, the GPU version may be as slow as a quarter of the speed of the CPU version.
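A minimal C++17 sketch of that "threads in any order" requirement: each element is scored independently, so the runtime (or a GPU) is free to run the work in any order or all at once. The score() function is a made-up stand-in for real per-model work, not anything from Rosetta:

```cpp
// Order-independent, data-parallel work: the structure a GPU port needs.
// C++17 parallel algorithms; with GCC, link against TBB (-ltbb).
#include <algorithm>
#include <execution>
#include <vector>
#include <cmath>
#include <cstdio>

static float score(float x) {
    return std::sin(x) * std::cos(x);  // depends only on its own input
}

int main() {
    std::vector<float> models(1 << 20);
    for (std::size_t i = 0; i < models.size(); ++i)
        models[i] = static_cast<float>(i);

    std::vector<float> scores(models.size());
    // par_unseq tells the runtime the iterations may run in any order,
    // interleaved or simultaneously, because none depends on another.
    std::transform(std::execution::par_unseq,
                   models.begin(), models.end(), scores.begin(), score);

    // A trajectory where step i needs the result of step i-1 cannot be
    // written this way, and that is what makes a fast GPU port hard.
    std::printf("scores[42] = %f\n", scores[42]);
    return 0;
}
```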
robertmiles · Joined: 16 Jun 08 · Posts: 1232 · Credit: 14,281,662 · RAC: 1,402

So you want far fewer processors to be used? None of my computers use a CPU that even has AVXx available, and not enough money is available to replace all the computers available through BOINC with equivalents that have AVXx available.

It would be possible, though, to produce separate compiles of the application for computers with AVXx and computers without, and add a shell program that tests what the CPU has available, then starts only the version of the program best for the current CPU.
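A minimal POSIX sketch of such a launcher. The binary names (rosetta_generic, rosetta_avx, rosetta_avx2) are hypothetical; __builtin_cpu_supports is a real GCC/Clang built-in for x86 feature detection:

```cpp
// launcher.cpp -- start the Rosetta build best suited to this CPU (sketch).
// Assumes hypothetical builds "rosetta_generic", "rosetta_avx" and
// "rosetta_avx2" are installed alongside this launcher.
#include <unistd.h>   // execv (POSIX)
#include <cstdio>

int main(int argc, char* argv[]) {
    __builtin_cpu_init();                      // populate CPU feature flags (GCC/Clang)
    const char* binary = "./rosetta_generic";  // safe fallback for CPUs without AVX
    if (__builtin_cpu_supports("avx2"))
        binary = "./rosetta_avx2";             // Haswell or newer
    else if (__builtin_cpu_supports("avx"))
        binary = "./rosetta_avx";

    std::printf("launching %s\n", binary);
    execv(binary, argv);                       // replace the launcher with the chosen build
    std::perror("execv failed");               // only reached if exec fails
    return 1;
}
```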
Dr. Merkwürdigliebe · Joined: 5 Dec 10 · Posts: 81 · Credit: 2,657,273 · RAC: 0

Your computer does support AVX. AVX2, however, was introduced with the Haswell CPU generation, and AVX-512 will be featured on Skylake-EP CPUs.

Yes, I know. We've already had that discussion here on this board. We're just waiting for results.
sgaboinc · Joined: 2 Apr 14 · Posts: 282 · Credit: 208,966 · RAC: 0

IMHO, GPUs can be thought of as very simplified ALUs with thousands of 'registers', on which the ALUs perform SIMD (single instruction, multiple data) execution. Typical GPUs have hundreds to thousands of 'gpu' (e.g. CUDA) 'cores', and they benefit a specific class of problem: the whole array or matrix is loaded into the GPU as 'registers', and SIMD instructions run the algorithm in a highly *vectorized* fashion. This means, among other things, that the problem needs to be *vectorizable*, *large*, and able to *run entirely in the 'registers' without accessing memory*. It is useless if we are trying to solve 2x2 matrices over and over again where the next iteration depends on the previous one; the whole of the rest of the GPU simply sits *unused* except for a few transistors.

In addition, adapting algorithms to GPUs is often a significantly *difficult* software task. It isn't as simple as 'compiling' a program to optimise for the GPU. Quite often the algorithms at hand *cannot make use of the GPU's vectorized* infrastructure, and this at times requires a *complete redoing* of the entire design, and even completely different algorithms and approaches.

While I'd not want to discourage users who have invested in GPUs, the above are true software challenges in really 'making it work'. As I personally don't use software that exploits these aspects of a GPU, I've actually refrained from getting one and have made do with a fairly recent Intel i7 CPU. I would think that similar challenges confront the Rosetta research team, and I tend to agree that functional needs are the higher priority versus redoing all the algorithms just to make them use GPUs: the functional needs are themselves complex, and spending overwhelming effort on 'gpu' algorithms could compromise the original research objectives.
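A tiny C++ illustration of that distinction (a sketch, not Rosetta code): the first loop is a large, independent element-wise operation that a compiler or GPU can vectorize, while the second is a small recurrence where every step needs the previous one, so no amount of GPU hardware helps:

```cpp
// "Vectorizable and large" versus "small and dependent" (illustrative sketch).
#include <vector>
#include <cstdio>

int main() {
    // 1) Large, independent element-wise work: every iteration stands alone,
    //    so it maps well onto SIMD lanes or thousands of GPU threads.
    std::vector<float> a(1 << 20, 1.0f), b(1 << 20, 2.0f), c(1 << 20);
    for (std::size_t i = 0; i < c.size(); ++i)
        c[i] = a[i] * b[i] + 1.0f;

    // 2) Small, serial recurrence: iteration n needs the result of n-1,
    //    so it cannot be spread across lanes or threads at all.
    float x = 0.5f;
    for (int n = 0; n < 1000; ++n)
        x = 3.9f * x * (1.0f - x);   // each step depends on the previous one

    std::printf("c[0] = %.1f, x = %f\n", c[0], x);
    return 0;
}
```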
Darrell · Joined: 28 Sep 06 · Posts: 25 · Credit: 51,934,631 · RAC: 0

As someone with 14 discrete GPU cards, I support those projects that have applications that run primarily on the GPU (Einstein, SETI). My five computers have fairly modern CPUs, so I also give their cycles to projects that DON'T have GPU applications (Rosetta, LHC). This works for me; it keeps both GPUs and CPUs busy.
[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 1994 · Credit: 9,623,704 · RAC: 7,594

Earlier in this thread I posted the two PDFs about GPUs in the Rosetta@home project. In those papers they say they created a GPU app (so it is possible) for specific simulations, but they were not satisfied with its performance. That was over 3 years ago. Now, I don't know whether they have retried the app on recent, more powerful GPUs, recompiled it with updated compilers/libraries/etc., or abandoned it for good...
AMDave · Joined: 16 Dec 05 · Posts: 35 · Credit: 12,576,896 · RAC: 0

For curiosity's sake, what about incorporating open source, specifically this (second paragraph)?
[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 1994 · Credit: 9,623,704 · RAC: 7,594

"For curiosity's sake, what about incorporating open source, specifically this (second paragraph)?"

First, it's great that POEM's admins will release the code. Second, I don't think that Rosetta can use this code. POEM, if I'm not wrong, runs homogeneous simulations, not heterogeneous ones like Rosetta (ab initio, docking, etc.).
ToyMachine · Joined: 31 Oct 16 · Posts: 1 · Credit: 621,562 · RAC: 0

Could this thread be made a "sticky"? Right up front, separated from all the other newb questions. It might also be appropriate to add this topic to the FAQ section, and maybe a bit on the main page: "We don't utilize GPUs, and here's why." I think that would make it quicker and easier for new contributors to determine which project to add to which computer, or where to direct upgrade funds, even if they (I) are too lazy to dig into the forum. ;)
[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 1994 · Credit: 9,623,704 · RAC: 7,594

"More computing performance is not a good answer if the limit comes from available memory rather than from compute. Rosetta@home has already looked into GPU versions, and found that they would require about 6 GB of graphics memory per GPU to get the expected 10 times the performance of the CPU version."

Until yesterday I thought that GPU memory was only a question of "amount", not of the "kind" of memory... Matrix-vector case study
mmonnin · Joined: 2 Jun 16 · Posts: 59 · Credit: 24,222,307 · RAC: 66,706

That test basically hits the memory wall, where data can't be moved fast enough to keep the processing cores fully utilized. In this case it's the GPU, and HBM improves the bandwidth between processor and memory. HMC is a similar technology for the CPU and main memory.
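A roofline-style sketch of why a matrix-vector test hits that wall. The peak numbers are round, illustrative figures, not the specs of any particular card:

```cpp
// Roofline-style estimate (sketch): is a kernel compute-bound or memory-bound?
#include <algorithm>
#include <cstdio>

int main() {
    const double peak_flops = 9.0e12;   // ~9 TFLOP/s FP32 (illustrative round figure)
    const double peak_bw    = 320.0e9;  // ~320 GB/s memory bandwidth (illustrative)

    // Matrix-vector multiply in FP32: each 4-byte element of the matrix is read
    // once and used for one multiply-add, i.e. roughly 2 FLOPs per 4 bytes moved.
    const double arithmetic_intensity = 2.0 / 4.0;  // FLOPs per byte

    const double attainable = std::min(peak_flops, arithmetic_intensity * peak_bw);
    std::printf("attainable: %.2f TFLOP/s (peak %.2f TFLOP/s)\n",
                attainable / 1e12, peak_flops / 1e12);
    // Here attainable = 0.5 FLOP/byte * 320 GB/s = 160 GFLOP/s, far below peak:
    // the kernel is limited by memory bandwidth, which is why HBM helps.
    return 0;
}
```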
Greg Tippitt · Joined: 4 May 07 · Posts: 5 · Credit: 8,086,891 · RAC: 4,476

There is a reason that PCs must have a CPU rather than simply a big video card that tries to run the entire operating system on a GPU: GPUs are specialized processors for applications that can be designed for parallel computing. Huge speed improvements for some analyses when using GPUs does not mean that everything can run on a GPU more quickly.

One metaphor for understanding this is to think about a Walmart store. If they open all of the checkout lanes, it is faster for you to check out without having to wait in line. This is like a GPU doing parallel computing. Having lots of available checkout lanes will not make it faster for you to do your shopping if, for instance, you need milk, antifreeze, shampoo, a pair of sweatpants, and a bag of kitty litter. These items are normally in departments scattered all over the store, so it takes you a lot of time to visit each one, and having lots of empty checkout lanes doesn't help. If you've taken your family with you to Walmart, you can send each person to get different items and rendezvous at the checkout; that might be thought of as analogous to having 4 CPU cores.

The repeated posts asking "Why don't they compile the code for GPUs so it will run faster?" are somewhat like asking "Why doesn't the highway department attach a snowblower to the front of a Dodge Challenger SRT Hellcat, so that they can clear all the streets really quickly instead of using those slow trucks that take forever to get about town?"

The easy solution is to run Rosetta on your CPU cores, and then run GPUGRID, or your other favorite BOINC apps, on your GPUs.
[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 1994 · Credit: 9,623,704 · RAC: 7,594

"There is a reason that PCs must have a CPU rather than simply a big video card that tries to run the entire operating system on a GPU: GPUs are specialized processors for applications that can be designed for parallel computing. Huge speed improvements for some analyses when using GPUs does not mean that everything can run on a GPU more quickly."

I completely agree with you. In fact, the Top500 supercomputers use CPUs and GPUs TOGETHER.

"The easy solution is to run Rosetta on your CPU cores, and then run GPUGRID, or your other favorite BOINC apps, on your GPUs."

A BETTER solution is to make deeper use of our CPUs, for example with SSEx or AVX.
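A minimal sketch of what "deeper use of the CPU" can mean: summing two float arrays with AVX intrinsics, eight elements per instruction. The arrays and values are illustrative; compile with -mavx on GCC/Clang:

```cpp
// AVX vector addition sketch: eight float additions in one instruction.
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(32) float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    alignas(32) float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    alignas(32) float c[8];

    __m256 va = _mm256_load_ps(a);      // load 8 aligned floats
    __m256 vb = _mm256_load_ps(b);
    __m256 vc = _mm256_add_ps(va, vb);  // 8 additions at once
    _mm256_store_ps(c, vc);

    for (int i = 0; i < 8; ++i)
        std::printf("%.0f ", c[i]);     // prints "9 9 9 9 9 9 9 9"
    std::printf("\n");
    return 0;
}
```

A scalar loop does the same work one element at a time; this is the kind of per-core speedup an AVX build could offer without touching GPUs at all.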
[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 1994 · Credit: 9,623,704 · RAC: 7,594

Some news on the OpenCL side (I posted these on the Ralph@home forum as well). New CodeXL 2.5. ROCm is now at version 1.6. Codeplay released ComputeCpp for developing SYCL apps in Visual Studio. VC4CL brings OpenCL to the Raspberry Pi. The Khronos Group released SYCL 1.2.1, which lets "code for heterogeneous processors be written in a 'single-source' style using completely standard modern C++" (and supports TensorFlow).
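For illustration, a minimal sketch of that "single-source" style in SYCL 1.2.1: host and device code live in one C++ file, and it assumes a SYCL implementation such as ComputeCpp is installed. The kernel name vadd and the buffer sizes are arbitrary:

```cpp
// SYCL 1.2.1 vector addition sketch: one independent work-item per element.
#include <CL/sycl.hpp>
#include <vector>
#include <cstdio>

int main() {
    std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024, 0.0f);
    {
        cl::sycl::queue q;  // default device selector picks a GPU if one is available
        cl::sycl::buffer<float, 1> ba(a.data(), cl::sycl::range<1>(a.size()));
        cl::sycl::buffer<float, 1> bb(b.data(), cl::sycl::range<1>(b.size()));
        cl::sycl::buffer<float, 1> bc(c.data(), cl::sycl::range<1>(c.size()));

        q.submit([&](cl::sycl::handler& h) {
            auto ra = ba.get_access<cl::sycl::access::mode::read>(h);
            auto rb = bb.get_access<cl::sycl::access::mode::read>(h);
            auto wc = bc.get_access<cl::sycl::access::mode::write>(h);
            h.parallel_for<class vadd>(cl::sycl::range<1>(a.size()),
                                       [=](cl::sycl::id<1> i) {
                wc[i] = ra[i] + rb[i];  // each work-item handles one element
            });
        });
    }   // buffers go out of scope here: results are copied back to the host vectors
    std::printf("c[0] = %.1f\n", c[0]);  // prints 3.0
    return 0;
}
```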