Message boards : Number crunching : Ryzen 2700 performance on full cores (Ubuntu 18.04.1)
Author | Message |
---|---|
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I just built a new Ryzen 2700 machine (Ubuntu 18.04.1) and tried it out on 8 full cores for the first batch of 8 work units (24 hours). It did nicely, averaging around 1500 points without any errors, and more importantly the output was consistent. https://boinc.bakerlab.org/rosetta/results.php?hostid=3493061&offset=0&show_names=0&state=4&appid= Next, I enabled SMT in the BIOS, but limited BOINC to using 75% of the CPU cores, so it will be running on 12 cores for the next 46 work units. Often the work units start out well, but the points become erratic thereafter, presumably due to the BOINC/Rosetta scoring system, so we will see. This machine is not overclocked, though I did set the memory in the BIOS to its rated speed of 2800 MHz (15-15-15-24). This is the lscpu output at the moment, though the CPU speed varies down to about 3400 MHz. Model: 8; Model name: AMD Ryzen 7 2700 Eight-Core Processor; Stepping: 2; CPU MHz: 3536.151; CPU max MHz: 3200.0000; CPU min MHz: 1550.0000; BogoMIPS: 6387.62; Virtualization: AMD-V; L1d cache: 32K; L1i cache: 64K; L2 cache: 512K; L3 cache: 8192K; NUMA node0 CPU(s): 0-15 |
Paul Joined: 29 Oct 05 Posts: 193 Credit: 66,366,511 RAC: 6,764 |
Did you set BOINC to keep work units in RAM when inactive? I found that if I don’t keep the work units in RAM they often fail when restarted. I am not sure why. Thx! Paul |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Did you set BOINC to keep work units in RAM when inactive? I found that if I don’t keep the work units in RAM they often fail when restarted. I am not sure why." Yes, but it should not matter. This machine runs 24/7 and Rosetta is the only project running. (And normally, I do not have to restart it.) Also, I have set "Switch between applications" to 1600 minutes, so no work units should be placed on "pause"; they should just run straight through to the end. But it has picked up a couple of errors already since switching to virtual cores (SMT enabled). My previous experience with a Ryzen 1700 a year ago was similar. It had no errors on full cores, but picked up a few (about one in ten as I recall) with virtual cores. Of course, these might be just bad work units, but I expect they will be completed successfully by others. The real question is: do you gain more than you lose with virtual cores? It will take a few days to see. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I now see the credits varying from 900 points (good) to 180 points (bad) at random. That is not at all unusual; I see it all the time on my Intel boards (Ivy Bridge, Haswell, Coffee Lake). So I will reduce the number of cores on the Ryzen 2700 from 12 to 8, to see if they will behave like the full cores while still leaving SMT enabled. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"So I will reduce the number of cores on the Ryzen 2700 from 12 to 8, to see if they will behave like the full cores while still leaving SMT enabled." No luck. The credits are still below 200 points, so I am going back to full cores (SMT disabled), but only seven cores this time, as I am reserving one to support a GPU on Folding. But it may not work. The credits are rather random; I am just curious about the Ryzen 2700 performance. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"No luck. The credits are still below 200 points, so I am going back to full cores (SMT disabled), but only seven cores this time, as I am reserving one to support a GPU on Folding." By the way, things aren't much better on my i7-8700. I am also operating it on only 6 out of 12 cores (with hyper-threading enabled), and got only one good result today, with a credit of 1409 points. The others were all under 200 points. https://boinc.bakerlab.org/rosetta/results.php?hostid=3493841&offset=0&show_names=0&state=4&appid= So the question is not so much the CPU type as why the points are so seemingly random. Just saying that the BOINC credit is calculated against other cards does not explain why it is random. |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,054,272 RAC: 6,536 |
"No luck. The credits are still below 200 points, so I am going back to full cores (SMT disabled), but only seven cores this time, as I am reserving one to support a GPU on Folding." The Rosetta team went to a lot of trouble to make sure their credit system was "fair" and did not award "undeserved credits". Many (most?) Rosetta WUs run longer than the 24-hour maximum selection, so multiple machines bite off chunks of work and the results are glued together upon completion. It might be interesting to have a Rosetta option that just "ran until completion" instead of X hours. The Linux "perf top" command is rather interesting and shows you what is happening. Rosetta strips its binary, so perf cannot annotate with symbol names, but it will still disassemble the machine code to let you see what is happening. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"The Rosetta team went to a lot of trouble to make sure their credit system was 'fair' and did not award 'undeserved credits'. Many (most?) Rosetta WUs run longer than the 24-hour maximum selection, so multiple machines bite off chunks of work and the results are glued together upon completion. It might be interesting to have a Rosetta option that just 'ran until completion' instead of X hours." I will have to take your word on the "perf top", and I am glad that undeserved credit is not awarded. But that appears to imply that there are real performance differences between the work units, not just a difference in the credits. I have not been able to figure out (yet) any way to make all the work units run equally well; it seems to be inherent in their structure. Maybe it is a case of many experimenters using the same software, but implementing different experiments? In the final analysis, I don't care about the credits themselves, as long as my hardware is being used efficiently. The numbers are meaningless otherwise. |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,054,272 RAC: 6,536 |
"The Rosetta team went to a lot of trouble to make sure their credit system was 'fair' and did not award 'undeserved credits'. Many (most?) Rosetta WUs run longer than the 24-hour maximum selection, so multiple machines bite off chunks of work and the results are glued together upon completion. It might be interesting to have a Rosetta option that just 'ran until completion' instead of X hours." Credits are the metric you are using to determine correctness of the results. If the "credits" are wrong, then you certainly care about "credits". You are trying to solve a problem with an unstable metric. Rosetta job credits are incorrect. EXAMPLE: Task 1037824898: jelva19_mut_5_5tcssm_61_K_0251_0001_0007_fragments_fold_SAVE_ALL_OUT_700570_871_1; Task 1037824900: jelva24_mut_5_7tcssm_61_K_0251_0001_0005_fragments_fold_SAVE_ALL_OUT_700576_871_1. BOTH of the TASKS (1037824898 and 1037824900) are Rosetta Mini v3.78 jobs. Both jobs ran 24 hours on your machine, with about 400 seconds' difference. One processed 176 decoys and the other processed 170 decoys. One got 181 credits and the other got 903 credits. Result rows: 1037824898 933522118 30 Oct 2018, 13:28:36 UTC 1 Nov 2018, 13:21:50 UTC Completed and validated 86,045.52 85,927.81 181.98 Rosetta Mini v3.78 x86_64-pc-linux-gnu; 1037824900 933522146 30 Oct 2018, 13:28:36 UTC 1 Nov 2018, 13:46:06 UTC Completed and validated 86,465.53 86,329.85 903.40 Rosetta Mini v3.78 x86_64-pc-linux-gnu. Task 1037824898 log: ====================================================== DONE :: 1 starting structures 85927.1 cpu seconds This process generated 176 decoys from 176 attempts ====================================================== BOINC :: WS_max 0. Task 1037824900 log: ====================================================== DONE :: 1 starting structures 86328.8 cpu seconds This process generated 170 decoys from 170 attempts ====================================================== BOINC :: WS_max 2.89782e-70 |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Thanks for the input. It was my feeling that something was wrong, but I don't know how to reach it. My Ryzen 2700 on full cores has done about the same as on virtual cores; that is, some of the work units give high credits, and others low. Just using full cores unfortunately does not guarantee consistent results, as I had hoped. It seems to do about as well as my i7-8700 overall. I had a spate of errors a couple of days ago. That was due to memory timing problems; I had four 2800 MHz DDR4 modules, and set the timing manually in the BIOS. But, as is often the case, four modules were not stable at their rated setting. So I replaced them with two 2666 MHz DDR4 modules, and all appears to be well now. I will be putting this machine on WCG, where it was intended to go before this test, and my i7-8700 on Rosetta. My i7-3770 on Win7 64-bit does almost as well (per core) as any of them. Thanks again for all the input. I would not hesitate to use a Ryzen or Threadripper on Rosetta; they will work as well as anything. |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,054,272 RAC: 6,536 |
"Thanks for the input. It was my feeling that something was wrong, but I don't know how to reach it." Several of the WCG projects are based on Rosetta code. 8-) If you look at them closely, you can see which ones. WCG just does not invest the extra project overhead to compute random wrong credits though (sarcasm). When I see that someone has changed one of the manufacturer default settings, I think about all the times I had to teach young engineers that ... "Just because it is not FAILING ... does not mean that it is WORKING". Floating point operations toggle many, many transistors. Changes could make the CPU sensitive to a floating point operation on certain numbers. It generates the wrong answer, but who knows how wrong. One of the first CPU bugs I found was when it was running the old Microsoft EDIT program. EDIT had a spin loop where it was polling the keyboard. The IN and OUT instructions in the loop caused a software-induced momentary drop in the power, which caused the system to hang. Microsoft and Intel both implemented fixes. I run WCG too. I got out of the system building business and buying systems is now my vice. I configure my own bare-bones PC from portatech. Pretty good prices and it works when it arrives. 8-) |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Several of the WCG projects are based on Rosetta code. 8-) If you look at them closely, you can see which ones. WCG just does not invest the extra project overhead to compute random wrong credits though (sarcasm)." I know about MIP, and have recently discussed my problems there too (in the Ebola discussion). I now limit MIP to two at a time with an app_config. I have not found any others yet, but you needn't tell me. I will find them the hard way. |
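
Editor's note on the two tasks rjs5 compares (1037824898 and 1037824900): the gap is easier to see once the credit is normalized by the work done. The short Python sketch below only redoes arithmetic on the figures quoted in the thread (CPU seconds, decoys, granted credit); the dictionary layout and the helper name `per_unit` are illustrative choices, not anything from BOINC or Rosetta.

```python
# Figures copied from the two result rows quoted in the thread.
tasks = {
    1037824898: {"cpu_seconds": 85927.81, "decoys": 176, "credit": 181.98},
    1037824900: {"cpu_seconds": 86329.85, "decoys": 170, "credit": 903.40},
}


def per_unit(task):
    """Normalize granted credit by decoys produced and by CPU time used."""
    cpu_hours = task["cpu_seconds"] / 3600.0
    return {
        "credit_per_decoy": task["credit"] / task["decoys"],
        "credit_per_cpu_hour": task["credit"] / cpu_hours,
        "cpu_seconds_per_decoy": task["cpu_seconds"] / task["decoys"],
    }


summary = {task_id: per_unit(task) for task_id, task in tasks.items()}

for task_id, stats in summary.items():
    print(
        task_id,
        f"credit/decoy={stats['credit_per_decoy']:.2f}",
        f"credit/CPU-hour={stats['credit_per_cpu_hour']:.2f}",
        f"CPU-s/decoy={stats['cpu_seconds_per_decoy']:.1f}",
    )

# How many times more credit per decoy the second task received.
ratio = (
    summary[1037824900]["credit_per_decoy"]
    / summary[1037824898]["credit_per_decoy"]
)
print(f"credit-per-decoy ratio: {ratio:.2f}")
```

The CPU time per decoy differs by only a few percent between the two tasks, while the credit per decoy differs by roughly a factor of five, which is the inconsistency being discussed.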
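
Editor's note on "limit MIP to two at a time with an app_config": BOINC reads an `app_config.xml` file from the project's folder inside the BOINC data directory, and a `max_concurrent` element inside an `app` entry caps how many tasks of that application run at once. The Python sketch below just generates such a file's contents. The application name `mip1` is an assumption for illustration (check the name your client reports for the project's application), and `build_app_config` is a hypothetical helper, not part of BOINC.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text limiting one application to N concurrent tasks."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    # ET.indent exists from Python 3.9; older versions emit a single line,
    # which is still valid XML.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# "mip1" is an assumed application name, used here only as an example.
xml_text = build_app_config("mip1", 2)
print(xml_text)
```

The printed text would be saved as `app_config.xml` in the project's directory; the client picks it up after a restart or after re-reading its config files from the BOINC Manager.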
©2024 University of Washington
https://www.bakerlab.org