Discussion of the merits and challenges of using GPUs

Jim1348 · Joined: 19 Jan 06 · Posts: 881 · Credit: 52,257,545 · RAC: 0
Message 96728 - Posted: 22 May 2020, 15:56:52 UTC - in response to Message 96669. Last modified: 22 May 2020, 15:58:04 UTC
For those interested only in projects related to medical research, the only choice now appears to be Folding@home, which wasn't set up to be compatible with BOINC projects. It's possible, but difficult, to run it on a computer that has BOINC running at the same time. Their forums currently aren't working.
I run Folding on the GPU on all my machines, with BOINC handling the CPU work units. It is no more difficult than the usual annoyances with Folding. That is, you have to set it up and then delete the "CPU" slot, or it will run by default (and check it again - you usually have to do it twice). And you of course have to reserve a CPU core to support the GPU, as with most setups. But they released a new version of their app recently, which may ease the setup. It won't take long to get the hang of it.
And their forums are up, and have been for some time. Maybe you were not trying the SSL version? https://foldingforum.org/index.php
If you are interested in other types of GPU projects, note that Asteroids@home currently has disk space problems interfering with uploads. I am about to post a comparison of how awful their GPU version's efficiency is compared to the CPU version's. It will be something like 40 watt-hours per work unit for the GPU (i.e., GTX 1060 or 1070), and about 14 watt-hours for the CPU. They should ban the GPU version to save the planet. (It has been stated by others before, but should be emphasized again.)
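The energy figures quoted in the post above can be turned into a quick comparison. The following is only a sketch in Python: the 40 Wh and 14 Wh values come from the post, while the function names and the batch of 1000 work units are hypothetical and chosen purely for illustration.

```python
# Energy per work unit as estimated in the post (watt-hours).
GPU_WH_PER_WU = 40.0   # GPU app on a GTX 1060 or 1070
CPU_WH_PER_WU = 14.0   # CPU app


def energy_ratio(gpu_wh: float, cpu_wh: float) -> float:
    """Return how many times more energy the GPU app uses per work unit."""
    return gpu_wh / cpu_wh


def extra_energy_kwh(gpu_wh: float, cpu_wh: float, work_units: int) -> float:
    """Return the extra energy (kWh) used by running work_units on the GPU."""
    return (gpu_wh - cpu_wh) * work_units / 1000.0


if __name__ == "__main__":
    ratio = energy_ratio(GPU_WH_PER_WU, CPU_WH_PER_WU)
    print(f"GPU uses about {ratio:.2f}x the energy per work unit")
    # 1000 work units is a hypothetical batch size, not a figure from the post.
    extra = extra_energy_kwh(GPU_WH_PER_WU, CPU_WH_PER_WU, 1000)
    print(f"Extra energy for 1000 work units: {extra:.1f} kWh")
```

With the numbers from the post, the GPU version uses a little under three times the energy of the CPU version for the same work unit.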

sgaboinc · Joined: 2 Apr 14 · Posts: 282 · Credit: 208,966 · RAC: 0
Message 96729 - Posted: 22 May 2020, 17:05:36 UTC Last modified: 22 May 2020, 17:07:04 UTC
I don't like crunching on GPUs. I do play with some Python TensorFlow stuff on the side and watch how it works. In the simple cases, such as running a pre-trained convolutional neural network (CNN), it runs for a fraction of a second and one would not notice any difference. But the other way round, say when you are training a complex CNN with lots of data (say images), the GPU can run at full speed (loud fans) and maximum load for hours, consuming more than a hundred watts (the top-tier ones probably consume many hundreds of watts). If electricity isn't cheap after all, doing such computation can be expensive in electricity bills.
GPUs are used where their use is relevant and appropriate, e.g. that CNN stuff. A lot of those CNN models are rather huge, and the training / update process is so data intensive that it would generate terabytes of network data if training were distributed across the network, even for a rather modest / small CNN model. So for those it is more appropriate to just run it on the GPU rather than spill terabytes of data onto conventional inter-networks in minutes, flooding and choking the whole network.
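To give a feel for the "terabytes of network data" claim above, here is a rough back-of-the-envelope sketch in Python. Every number in it is an assumption made for illustration (a 25-million-parameter model, 4-byte floats, 10,000 training steps, 10 volunteer machines), and it assumes a naive scheme in which each machine uploads a full gradient and downloads full weights on every step. Real distributed training setups may compress or batch this traffic.

```python
def training_traffic_bytes(num_params: int, bytes_per_param: int,
                           steps: int, workers: int) -> int:
    """Estimate total bytes moved by naive distributed training.

    Assumes every worker uploads one full gradient and downloads one
    full set of weights per training step (hence the factor of 2).
    """
    per_exchange = num_params * bytes_per_param
    return 2 * per_exchange * steps * workers


if __name__ == "__main__":
    # All four inputs are hypothetical values, not figures from the post.
    total = training_traffic_bytes(
        num_params=25_000_000,  # a modest CNN
        bytes_per_param=4,      # 32-bit floats
        steps=10_000,
        workers=10,
    )
    print(f"Estimated traffic: {total / 1e12:.0f} TB")
```

Under these assumptions even a modest model moves tens of terabytes, which is consistent with the point that such training is better kept on a local GPU.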

robertmiles · Joined: 16 Jun 08 · Posts: 1265 · Credit: 14,424,358 · RAC: 0
Message 96731 - Posted: 22 May 2020, 19:06:59 UTC - in response to Message 96728.
[snip] I run Folding on the GPU on all my machines with BOINC on the CPU work units. It is no more difficult than the usual annoyances with Folding. That is, you have to set it up and then delete the "CPU" slot, or it will run by default (and check it again - you usually have to do it twice). And you of course have to reserve a CPU core to support the GPU, as with most setups. But they have a new version of their app recently, which may ease the setup. It won't take long to get the hang of it. And their forums are up, and have been for some time. Maybe you were not trying the SSL version? https://foldingforum.org/index.php
I'm not sure if I was or not. However, that link allows me to read the forums, but I still can't log in to post anything there.

[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 2204 · Credit: 13,720,774 · RAC: 8
Message 96734 - Posted: 22 May 2020, 21:10:48 UTC - in response to Message 96669.
The Open Pandemics subproject at World Community Grid currently does COVID-19 work, using CPUs only, but is thinking of creating a GPU version of their software.
Because the project is based on Autodock, and Autodock has a GPU version.
Also, the Quarantine@Home project (it's not a BOINC project) is using GPUs.

robertmiles · Joined: 16 Jun 08 · Posts: 1265 · Credit: 14,424,358 · RAC: 0
Message 96735 - Posted: 22 May 2020, 21:51:07 UTC - in response to Message 96734.
The Open Pandemics subproject at World Community Grid currently does COVID-19 work, using CPUs only, but is thinking of creating a GPU version of their software.
'cause the project is based on Autodock. And Autodock has a gpu version Also project Quarantine@Home (it's not a boinc project) is using gpu.
I've read that Autodock development has gone in two different directions, producing one version that can use a GPU and another version with the changes needed for COVID-19 work. IF they can find someone who can merge the two sets of changes, THEN Open Pandemics should have a GPU version they can use.
A Google search did not find Quarantine@Home. Can you give me a link to that project? Is it able to share a GPU with Folding@Home?

Jim1348 · Joined: 19 Jan 06 · Posts: 881 · Credit: 52,257,545 · RAC: 0
Message 96736 - Posted: 22 May 2020, 22:01:05 UTC - in response to Message 96735.
A Google search did not find Quarantine@Home. Can you give me a link to that project? Is it able to share a GPU with Folding@Home?
https://quarantine.infino.me/
But the GPU version is only for Linux. The Windows version is only on the CPU at the moment.

Falconet · Joined: 9 Mar 09 · Posts: 355 · Credit: 1,669,337 · RAC: 0
Message 96737 - Posted: 22 May 2020, 22:28:55 UTC - in response to Message 96735. Last modified: 22 May 2020, 22:32:33 UTC
The Open Pandemics subproject at World Community Grid currently does COVID-19 work, using CPUs only, but is thinking of creating a GPU version of their software.
'cause the project is based on Autodock. And Autodock has a gpu version Also project Quarantine@Home (it's not a boinc project) is using gpu.
I've read that Autodock development has gone in two different directions, producing one version that can use a GPU and another version with the changes needed for COVID-19 work. IF they can find someone who can merge the two sets of changes, THEN Open Pandemics should have a GPU version they can use. A Google search did not find Quarantine@Home. Can you give me a link to that project? Is it able to share a GPU with Folding@Home?
They are working on the GPU version: https://twitter.com/ForliLab/status/1261194223811887109
Edit: One of the people on the OPN research team is a CUDA/OpenCL developer.

[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 2204 · Credit: 13,720,774 · RAC: 8
Message 97456 - Posted: 19 Jun 2020, 8:59:17 UTC
Interesting article about C++/SYCL/OpenCL

[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 2204 · Credit: 13,720,774 · RAC: 8
Message 97839 - Posted: 30 Jun 2020, 21:42:37 UTC
SYCL 2020 provisional specification
SYCL is a standard C++ based heterogeneous parallel programming framework for accelerating High Performance Computing (HPC), machine learning, embedded computing, and compute-intensive desktop applications on a wide range of processor architectures, including CPUs, GPUs, FPGAs, and AI processors. SYCL 2020 is based on C++17 and includes new programming abstractions, such as unified shared memory, reductions, group algorithms, and sub-groups to enable high-performance applications across diverse hardware architectures.

robertmiles · Joined: 16 Jun 08 · Posts: 1265 · Credit: 14,424,358 · RAC: 0
Message 97841 - Posted: 1 Jul 2020, 0:23:08 UTC Last modified: 1 Jul 2020, 0:42:01 UTC
What we really need for GPU use is a compiler that can automatically identify groups of program sections that CAN safely run at the same time because no section in the group writes to any memory location used by any other section in the group. This would make it possible to just recompile the program with that compiler, with no programmer effort to modify the source code first. Such groups are often, but not always, running the same operations on multiple sets of data.
A few problems with this:
GPU clock speeds are typically about a quarter of the clock speeds of CPUs produced at about the same time. This means that, on average, four threads on the GPU must be running at the same time just to make the GPU do the work as fast as a CPU-only program.
At least for NVIDIA-based GPUs, the GPU cores come in groups (warps for NVIDIA). Within each group, if one core is doing an operation, all of the others must either be doing that same operation (probably on different data), or be doing nothing. That means that if there is an if-then-else in the GPU part of the program, the then part and the else part can only be doing different operations simultaneously if they are in different GPU core groups. I have not checked if this is also true for other brands of GPUs, but I suspect that it is.
BOINC projects normally offer GPU versions of their programs only if those versions will produce the outputs in no more than a tenth of the time required for the CPU versions to do it. The last time the Rosetta@home project tried to produce a GPU version, it gave outputs slightly faster than the CPU version for some users, and slightly slower for others. I've seen nothing since then about whether it has been tried again with the more recent versions of their program.
Combined with the clock-speed gap above, the tenth-of-the-time threshold means using an average of at least 40 GPU cores at a time, which is impossible for GPUs that have fewer than 40 GPU cores.
BOINC has a section to allow GPU work written in CUDA, and a section to allow GPU work written in OpenCL. Adding the capability to run GPU work written in any other computer language requires either a compiler that first transforms the source code to CUDA or OpenCL and then compiles that, or major modifications to BOINC to add yet another section to support GPU work written in that computer language. Such major modifications to BOINC have, in the past, taken a few years each. Unless you can hold your breath for a few years at a time, don't hold your breath waiting for such a major modification.
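The figure of 40 GPU cores in the post above follows from two numbers it gives: a CPU clock about four times faster than the GPU clock, and a required tenfold speedup. A minimal sketch of that arithmetic in Python is below. The function names are hypothetical, and the model assumes, as the post implicitly does, that one GPU core does the same work per clock cycle as one CPU thread.

```python
import math


def min_gpu_cores(cpu_to_gpu_clock_ratio: float, required_speedup: float) -> int:
    """Return the minimum average number of busy GPU cores.

    Simplified model: each GPU core does as much per clock as a CPU
    thread, so matching the CPU takes cpu_to_gpu_clock_ratio cores and
    beating it by required_speedup takes that many times more.
    """
    return math.ceil(cpu_to_gpu_clock_ratio * required_speedup)


def can_meet_target(gpu_cores: int, cpu_to_gpu_clock_ratio: float,
                    required_speedup: float) -> bool:
    """Check whether a GPU has enough cores to reach the target at all."""
    return gpu_cores >= min_gpu_cores(cpu_to_gpu_clock_ratio, required_speedup)


if __name__ == "__main__":
    # 4 and 10 are the figures from the post (quarter clock, tenth of the time).
    print(min_gpu_cores(4, 10))
```

This is a lower bound under an idealized model. It ignores memory transfers and idle cores, both of which would raise the number in practice.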

[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 2204 · Credit: 13,720,774 · RAC: 8
Message 97843 - Posted: 1 Jul 2020, 8:08:03 UTC - in response to Message 97841.
BOINC has a section to allow GPU work written in CUDA, and a section to allow GPU work written in OpenCL. Adding the capability to run GPU work written in any other computer language requires either a compiler that first transforms the source code to CUDA or OpenCL and then compiles that, or major modifications to BOINC to add yet another section to support GPU work written in that computer language. Such major modifications to BOINC have, in the past, taken a few years each. Unless you can hold your breath for a few years at a time, don't hold your breath waiting for such a major modification.
Indeed, the idea of SYCL is to write the app in C++ (which does not need any change to the BOINC infrastructure) and run it on heterogeneous hardware (using CUDA/OpenCL like a sort of "dialect" of C++).
In the meantime, I hold my breath :-P

[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 2204 · Credit: 13,720,774 · RAC: 8
Message 98264 - Posted: 22 Jul 2020, 9:27:23 UTC - in response to Message 97843. Last modified: 22 Jul 2020, 9:33:11 UTC
Indeed, the idea of SYCL is to write the app in C++ (which does not need any change to the BOINC infrastructure) and run it on heterogeneous hardware (using CUDA/OpenCL like a sort of "dialect" of C++). In the meantime, I hold my breath :-P
And SYCL is often faster than CUDA!!
SYCL and CUDA

[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 2204 · Credit: 13,720,774 · RAC: 8
Message 98968 - Posted: 11 Sep 2020, 16:48:18 UTC - in response to Message 98264.
oneAPI with support for SYCL 2020

mikey · Joined: 5 Jan 06 · Posts: 1900 · Credit: 12,902,147 · RAC: 0
Message 98985 - Posted: 13 Sep 2020, 3:13:54 UTC - in response to Message 97841.
What we really need for GPU use is a compiler that can automatically identify groups of program sections that CAN safely run at the same time because no section in the group writes to any memory location used by any other section in the group. This would make it possible to just recompile the program with that compiler, with no programmer effort to modify the source code first. Such groups are often, but not always, running the same operations on multiple sets of data.
A few problems with this:
GPU clock speeds are typically about a quarter of the clock speeds of CPUs produced at about the same time. This means that, on average, four threads on the GPU must be running at the same time just to make the GPU do the work as fast as a CPU-only program.
At least for NVIDIA-based GPUs, the GPU cores come in groups (warps for NVIDIA). Within each group, if one core is doing an operation, all of the others must either be doing that same operation (probably on different data), or be doing nothing. That means that if there is an if-then-else in the GPU part of the program, the then part and the else part can only be doing different operations simultaneously if they are in different GPU core groups. I have not checked if this is also true for other brands of GPUs, but I suspect that it is.
BOINC projects normally offer GPU versions of their programs only if those versions will produce the outputs in no more than a tenth of the time required for the CPU versions to do it. The last time the Rosetta@home project tried to produce a GPU version, it gave outputs slightly faster than the CPU version for some users, and slightly slower for others. I've seen nothing since then about whether it has been tried again with the more recent versions of their program.
Combined with the clock-speed gap above, the tenth-of-the-time threshold means using an average of at least 40 GPU cores at a time, which is impossible for GPUs that have fewer than 40 GPU cores.
BOINC has a section to allow GPU work written in CUDA, and a section to allow GPU work written in OpenCL. Adding the capability to run GPU work written in any other computer language requires either a compiler that first transforms the source code to CUDA or OpenCL and then compiles that, or major modifications to BOINC to add yet another section to support GPU work written in that computer language. Such major modifications to BOINC have, in the past, taken a few years each. Unless you can hold your breath for a few years at a time, don't hold your breath waiting for such a major modification.
So are you saying they need to break it down into small chunks of work that can then run independently of each other and report back to the core program that can combine them into the next batch of small chunks of data until the task is complete? Much like committees at workplaces do things? If that could happen, several BOINC projects might be able to benefit from it.
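The "small chunks that run independently and report back" idea in the question above can be sketched as a split, process, combine pattern. The Python below is only an illustration under stated assumptions: the sum-of-squares workload, the function names, and the chunk size are all hypothetical stand-ins, and a thread pool stands in for whatever would actually run the chunks.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def process_chunk(chunk):
    """Stand-in workload: each chunk is processed without touching the others."""
    return sum(x * x for x in chunk)


def run_in_chunks(data, chunk_size, max_workers=4):
    """Process all chunks independently, then combine the partial results."""
    chunks = split_into_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(run_in_chunks(list(range(10)), chunk_size=3))
```

The pattern only works because no chunk reads or writes another chunk's data, which is the same independence condition described in the quoted post.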

robertmiles · Joined: 16 Jun 08 · Posts: 1265 · Credit: 14,424,358 · RAC: 0
Message 98986 - Posted: 13 Sep 2020, 3:34:29 UTC - in response to Message 98985. Last modified: 13 Sep 2020, 3:37:12 UTC
[snip] So are you saying they need to break it down into small chunks of work that can then run independently of each other and report back to the core program that can combine them into the next batch of small chunks of data until the task is complete? Much like committees at workplaces do things? If that could happen, several BOINC projects might be able to benefit from it.
Not fully independently. The warps in Nvidia GPUs require an even smaller breakdown within each workunit, where the cores within each warp must USUALLY be doing the same operations on separate sets of data, or expect a major slowdown due to limits on how many GPU cores can be active at once.
What you described is more like the main principle of how BOINC works, whether for CPUs or for GPUs.
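The slowdown described above, where cores in a warp must do the same operation or sit idle, can be illustrated with a toy cost model. This Python sketch is not how a GPU is programmed, and it does not measure real hardware. It only counts steps under the simplifying assumption that a warp runs the "then" path and the "else" path one after the other whenever its threads disagree. The warp width of 32 is the NVIDIA value. The step costs are hypothetical.

```python
WARP_SIZE = 32  # NVIDIA warps are 32 threads wide


def warp_steps(branch_taken, then_cost, else_cost):
    """Count steps for one warp executing an if-then-else.

    branch_taken holds one boolean per thread. If every thread agrees,
    only one path runs. If they disagree, the warp runs both paths in
    turn while the threads on the other path sit idle.
    """
    steps = 0
    if any(branch_taken):        # at least one thread needs the "then" path
        steps += then_cost
    if not all(branch_taken):    # at least one thread needs the "else" path
        steps += else_cost
    return steps


if __name__ == "__main__":
    uniform = [True] * WARP_SIZE
    divergent = [True] * (WARP_SIZE // 2) + [False] * (WARP_SIZE // 2)
    # then_cost and else_cost of 10 steps each are made-up values.
    print("uniform warp:  ", warp_steps(uniform, 10, 10), "steps")
    print("divergent warp:", warp_steps(divergent, 10, 10), "steps")
```

In this model a single thread taking the other branch is enough to make the whole warp pay for both paths, which is why the work inside each workunit has to be arranged so that neighbouring threads usually follow the same path.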

mikey · Joined: 5 Jan 06 · Posts: 1900 · Credit: 12,902,147 · RAC: 0
Message 98991 - Posted: 13 Sep 2020, 11:18:57 UTC - in response to Message 98986.
[snip] So are you saying they need to break it down into small chunks of work that can then run independently of each other and report back to the core program that can combine them into the next batch of small chunks of data until the task is complete? Much like committees at workplaces do things? If that could happen, several BOINC projects might be able to benefit from it.
Not fully independently. The warps in Nvidia GPUs require an even smaller breakdown within each workunit, where the cores within each warp must USUALLY be doing the same operations on separate sets of data, or expect a major slowdown due to limits on how many GPU cores can be active at once. What you described is more like the main principle of how BOINC works, whether for CPUs or for GPUs.
Ok thanks.

[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 2204 · Credit: 13,720,774 · RAC: 8
Message 99003 - Posted: 14 Sep 2020, 6:32:41 UTC - in response to Message 95280.
The previous attempt at a GPU version gave one that ran at about the SAME speed as the CPU version - a little slower on some computers, and a little faster on others. This was not considered fast enough to make further development worthwhile.
The only previous attempt that I know of was over 5 years ago, and a lot of things have changed (hw, sw, etc)
I was wrong. The previous attempt was over 7 years ago.
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6475&postid=76916#76916

[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 2204 · Credit: 13,720,774 · RAC: 8
Message 99058 - Posted: 20 Sep 2020, 9:56:39 UTC - in response to Message 97843.
Indeed, the idea of SYCL is to write the app in C++ (which does not need any change to the BOINC infrastructure) and run it on heterogeneous hardware (using CUDA/OpenCL like a sort of "dialect" of C++).
As I said, Nvidia released its CUDA C++ standard library as open source. It works not only with NVIDIA CUDA-enabled configurations but also with CPUs.

[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 2204 · Credit: 13,720,774 · RAC: 8
Message 99203 - Posted: 30 Sep 2020, 10:36:00 UTC - in response to Message 97843.
Indeed, the idea of SYCL is to write the app in C++ (which does not need any change to the BOINC infrastructure) and run it on heterogeneous hardware (using CUDA/OpenCL like a sort of "dialect" of C++).
As I said in Ralph's forum:
Intel, with Heidelberg University, is working on porting oneAPI/DPC++ to AMD GPUs thanks to hipSYCL.
Codeplay is working on porting oneAPI/DPC++ to Nvidia GPUs thanks to SYCL.

[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 2204 · Credit: 13,720,774 · RAC: 8
Message 99205 - Posted: 30 Sep 2020, 13:35:46 UTC - in response to Message 95471.
Because it is a simple rebrand of OpenCL 1.2. They abandoned OpenCL 2.x to its fate. Simply put: OpenCL 3.0 would have been great... if it had been released 5 years ago. The only sunbeam (a little sunbeam) is C++ for OpenCL.
And, today, OpenCL 3.0 is finalised, with an initial SDK and C++ kernels.