Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
dcs1955 Send message Joined: 2 Dec 22 Posts: 13 Credit: 5,886,106 RAC: 12,319 |
Thanks. Do you know if it is significantly more memory? Currently, 50% of my tasks are VS. Two VS are running (the others are 8a-e__hal); one of the 4 processes is using 1.8-2.2G, the others are using 100-300M. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,775,208 RAC: 22,781 |
dcs1955 wrote: "Thanks.. Do you know if it is significantly more memory? Currently, 50% of my tasks are VS." You just answered your own question. Generally they need between 500MB & 2.5GB, depending on the Task. 1-1.5GB tends to be more common. Grant Darwin NT |
dcs1955 Send message Joined: 2 Dec 22 Posts: 13 Credit: 5,886,106 RAC: 12,319 |
Thanks. I tweaked the computer preferences to up the memory use percentage, something I have not needed to do before. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,775,208 RAC: 22,781 |
I've had mine set to "When computer is in use, use at most 95 %" and "When computer is not in use, use at most 95 %" without issues (with a fair bit more RAM per core/thread than your 8GB RAM systems). Grant Darwin NT |
dcs1955 Send message Joined: 2 Dec 22 Posts: 13 Credit: 5,886,106 RAC: 12,319 |
I wimped out and stopped at 90%. :) |
äxl Send message Joined: 30 Dec 08 Posts: 11 Credit: 497,080 RAC: 0 |
Rosetta Beta 6.05 I've had to put RAM usage to 25% for now since it would crash my PC. (Could be faulty modules.) I even aborted 3 of 4 WUs since they would stay in RAM and I don't think I could have finished them anyway. The one I kept is still at 24% and it says Elapsed Time ~5h, Remaining Time ~6h, Deadline is in ~6h. It's running through ScienceUnited so here are the WUs if someone cares: RosettaVS_SAVE_ALL_OUT_NOJRAN_UBA5_3H8V_fulldb_IGNORE_THE_REST_WwaHIZ_1_3192_2978231_3 The ones I stopped: RosettaVS_SAVE_ALL_OUT_NOJRAN_UBA5_3H8V_fulldb_IGNORE_THE_REST_WwaHIZ_3_1857_2978234_3_0 RosettaVS_SAVE_ALL_OUT_NOJRAN_UBA5_3H8V_fulldb_IGNORE_THE_REST_WwaHIZ_6_7045_2978237_3_0 RosettaVS_SAVE_ALL_OUT_NOJRAN_UBA5_3H8V_fulldb_IGNORE_THE_REST_WwaHIZ_4_6655_2978235_3_0 It's an old computer: https://scienceunited.org/su_hosts.php?action=detail&host_id=87101 Running BOINC because: 1) I'm using 100% green energy (no certificates or other non-sense) 2) My computer runs mostly anyway (due to BT and other non-sense) 3) To help |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,775,208 RAC: 22,781 |
The fact that it's through Science United makes it impossible to see what's going on (we can't view your computer without being logged in to your account there), and will probably affect what you're able to do about it. äxl wrote: "I've had to put RAM usage to 25% for now since it would crash my PC. (Could be faulty modules.)" Under Preferences, Computing preferences, make sure Memory, "Leave non-GPU tasks in memory while suspended" is not selected. When running more than one project, no cache is best. Less chance of deadline issues. Under Preferences, Computing Preferences, Other, set "Store at least 0.1 days of work" and "Store up to an additional 0.01 days of work". Run Memtest on the system to see if there is an issue with the memory, most likely it's a lack of memory on the system as most of the RosettaVS_ and Rosetta 4.20 Tasks need plenty of RAM- 500MB to 2.5GB (1-1.5GB tends to be most common). And reducing the amount of memory that BOINC can use will just make things worse. Luckily, there have been very few of those Tasks released in the last 24hrs or so. Also check your completed Valid Tasks and compare the Run time to the CPU time- if there's more than a few minutes difference, it means you're using your system a bit. If there's 30min or so then you're using it a lot. Hours+, and you or something else on the computer is making huge use of your CPU's time. Grant Darwin NT |
Jean-David Beyer Send message Joined: 2 Nov 05 Posts: 188 Credit: 6,410,103 RAC: 5,477 |
Grant (SSSF) wrote: "Run Memtest on the system to see if there is an issue with the memory, most likely it's a lack of memory on the system as most of the RosettaVS_ and Rosetta 4.20 Tasks need plenty of RAM- 500MB to 2.5GB (1-1.5GB tends to be most common). And reducing the amount of memory that BOINC can use, will just make things worse." I have not run memtest in years. Back when I had 8 GBytes of RAM and dual Intel Xeon processors, it took almost a day to run memtest. Now that this machine has 128 GBytes of RAM, it would probably take over a week to run it. This machine has 8 memory modules, and when I raised it from 64 GBytes to 128 GBytes it was a little flaky, but it was pretty easy to find which module it was and the RAM supplier replaced it free of charge. As far as RosettaVS tasks are concerned, I have only two of them waiting to start out of 22 tasks on the machine. At times, half of the tasks on my machine have been RosettaVS, and sometimes two of them have run at the same time. Right now, I have one Rosetta 4.20 Task waiting to run. The biggest tasks I have run have been CPDN like this one: Task: 22317868; Name: oifs_43r3_bl_a4ck_2016092300_15_991_12212423_2; Workunit: 12212423; Created: 15 Apr 2023, 5:23:15 UTC; Sent: 15 Apr 2023, 5:24:02 UTC; Report deadline: 14 Jun 2023, 5:24:02 UTC; Received: 15 Apr 2023, 12:23:18 UTC; Server state: Over; Outcome: Success; Client state: Done; Exit status: 0 (0x00000000); Computer ID: 1511241; Run time: 6 hours 18 min 49 sec; CPU time: 6 hours 13 min 2 sec; Validate state: Valid; Credit: 1,813.14; Device peak FLOPS: 6.06 GFLOPS; Application version: OpenIFS 43r3 Baroclinic Lifecycle v1.11 x86_64-pc-linux-gnu; Peak working set size: 5,592.19 MB; Peak swap size: 5,930.79 MB; Peak disk usage: 1,277.90 MB |
äxl Send message Joined: 30 Dec 08 Posts: 11 Credit: 497,080 RAC: 0 |
Grant (SSSF) wrote: "The fact that it's through Science United makes it impossible to see what's going on (we can't view your computer without being logged in to your account there), and will probably affect what you're able to do about it." Even I can't see much. I can't see done WUs, for example. "Under Preferences, Computing preferences, make sure Memory, "Leave non-GPU tasks in memory while suspended" is not selected." Yes, that helped. "When running more than one project, no cache is best. Less chance of deadline issues." Yes, this is the default, isn't it? "Run Memtest on the system to see if there is an issue with the memory," I've been running memtester on 1GB since yesterday. I don't think it covers much but it's a start, I guess. "most likely it's a lack of memory on the system as most of the RosettaVS_ and Rosetta 4.20 Tasks need plenty of RAM- 500MB to 2.5GB (1-1.5GB tends to be most common). And reducing the amount of memory that BOINC can use, will just make things worse." I've finished the 1 WU ~3h before deadline. (I think the only thing that got me into trouble was that the system froze and then I didn't have time over the weekend.) But you're saying because I didn't do parts 2 to 4 it's bad for the project? "Also check your completed Valid Tasks and compare the Run time to the CPU time- if there's more than a few minutes difference, it means you're using your system a bit. If there's 30min or so then you're using it a lot." I can at least check the running WUs. Are you saying that if the difference is too big I shouldn't crunch at all? Jean-David Beyer wrote: "it was pretty easy to find which module it was" You mean by turning the computer off, pulling a module, turning the computer on, turning it off again etc.? Running BOINC because: 1) I'm using 100% green energy (no certificates or other non-sense) 2) My computer runs mostly anyway (due to BT and other non-sense) 3) To help |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,775,208 RAC: 22,781 |
"When running more than one project, no cache is best. Less chance of deadline issues." Nope. No idea what the defaults are, but it certainly isn't those values- if it were, people wouldn't have nearly as many problems as they do. äxl wrote: "Run Memtest on the system to see if there is an issue with the memory, I'm running memtester on 1GB since yesterday. I don't think it covers much but it's a start, I guess." If you are running Windows 7 or later, you can just use its Memory Diagnostic tool. It doesn't work the memory as hard, so it doesn't take as long, but it will still show up dodgy memory. If the memory is only borderline it may not pick up a problem; in that case use the F1 key to change the default test options, which will take longer. äxl wrote: "But you're saying because I didn't do parts 2 to 4 it's bad for the project?" Sorry, but I've got no idea what it is you're asking about there. "Also check your completed Valid Tasks and compare the Run time to the CPU time- if there's more than a few minutes difference, it means you're using your system a bit. If there's 30min or so then you're using it a lot." No, just that you should find out what else is chewing up your CPU time. Taking 3 hrs 10 min to do 3hrs work isn't an issue. But if you're taking 9hrs+ to do only 3 hrs worth of work, it's really something you should look into. äxl wrote: "Jean-David Beyer wrote: it was pretty easy to find which module it was. You mean by turning the computer off, pulling a module, turning the computer on, turning it off again etc.?" Nope. Turn the computer off, remove all but one module, then power back up & test that module. If it's faulty- job done. If not, power down, pull that module, fit another one. Test again. Etc, etc. On systems with huge amounts of RAM & multiple modules, testing one at a time lets you do other things in between testing modules if you have to- otherwise all you can do is start the test & then wait for it to finish, hours (or days) later. Grant Darwin NT |
Jean-David Beyer Send message Joined: 2 Nov 05 Posts: 188 Credit: 6,410,103 RAC: 5,477 |
Jean-David Beyer wrote: I could do it a little more efficiently than that. It ran with 4 modules for many months. The problem occurred when I added 4 new modules. So I took out all 4 new modules and the problem went away. I put in two of the new modules and still no problems. I moved those two new modules to the other two memory slots (was it a slot problem or a module problem?) and it still worked. So I put another new module in and it still worked, so it was probably the last new module. And so it proved. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,570,739 RAC: 7,184 |
Some daemons are down...so, no validation |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2122 Credit: 41,183,435 RAC: 10,025 |
Hello. I'm back :) Now, where were we... "I have to tell you, I'm absolutely amazed that you think Boinc scheduling being wrong by 50% one way or 2-300% the other way for the bulk of the time a task is processing - and 100% of the time it's sitting waiting in the cache - is no kind of problem," "And I am absolutely amazed & astounded you would think something that at no stage have I ever said or I suggested." Well, it started when I talked about reducing target runtime from 12 to 8hrs, thereby reducing wallclock time of running tasks by 7-11hrs each and reducing wallclock time of all cached tasks by 7-11hrs each as well, and you said 'all it would do is reduce tasks tripping into panic mode', as if that wasn't the pragmatic solution. Taking 14-22hrs out of runtime goes a long way - in all likelihood all the way - to prevent deadlines being missed and panic mode being tripped without changing anything else, while running all the projects the user wanted to run. By saying "all it would do..." you're saying it's not a solution, when it might be the entire solution to missed deadlines and panic mode. It is a problem for Scheduling. The entire problem is one of scheduling and the failure to meet deadlines. Everything else you talk about is purely academic and entirely irrelevant to the user if deadlines are met and all CPU time is maximised for projects important to the user. Which they are. It's an issue for the computer to solve, which it's perfectly capable of and it will always do for itself better than any attempt to micromanage it with other settings. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2122 Credit: 41,183,435 RAC: 10,025 |
"But for anyone else that's been reading these posts..." Well, that's obviously not true. If you limit the number of cores available to Boinc, your Boinc processing will be limited to the maximum # of cores you've allocated, while your unallocated cores won't be used for Boinc and may or may not be fully utilised, depending what else is going on. Use all your cores all the time. Your computer will decide millions of times per second what it should do with its capability better than any human ever will. Do not listen to the man behind the curtain. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2122 Credit: 41,183,435 RAC: 10,025 |
"Some daemons are down... so, no validation" They were, probably for about 16hrs today. I think it's now fixed. 175k backlog when I looked earlier, now below 100k. Edit: Server status page says there's still a 96k backlog, but it's not a live figure. Checking all my hosts, all tasks have been validated for each of them. |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 391 Credit: 12,083,231 RAC: 4,831 |
"But for anyone else that's been reading these posts..." One thing I’ve noticed is that my Ryzens appear to be power limited: the TDP is 65W and the PPT comes out at 88W, so with all 24 cores running each core is getting about 3.67W, but with only 20 cores running each core gets about 4.4W and the power draw is still 88W, with the cores running at a higher frequency. Also, running 23 cores allows the OS to have its two pennies' worth without having to swap out the data for a running WU and then swap it back in again, so the WUs can run closer to 100% than the 97/98% they get when running 24 cores. That being said, I always run at 24 cores and let the computer sort itself out. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,775,208 RAC: 22,781 |
"Taking 14-22hrs out of runtime goes a long way - in all likelihood all the way - to prevent deadlines being missed and panic mode being tripped without changing anything else, while running all the projects the user wanted to run." Which is exactly what my advice does- it reduces the Runtime for each and every Task, for all projects that the person does. It doesn't just do it for one Project, but for all of them. As I said before & I will say again- your suggestion addresses the symptom, mine fixes what is actually causing the problem. Most people prefer to fix the problem. If you're ok with just fixing the symptom, then so be it. "But for anyone else that's been reading these posts... Well, that's obviously not true." It's obviously true if you actually understand what is going on. If you don't understand, then it's not going to be obvious. One final attempt to point out the obvious- people doing GPU processing have known this for over a decade. If a GPU application requires a CPU core/thread to keep it fed, then by giving up the output of the CPU Task that would have run on that core/thread in order to support each Task running on the GPU, you can get 10-20 times more work done from the GPU, and you don't reduce your CPU output by a large amount, because the Tasks aren't all fighting for CPU processing time that just isn't available. If you don't reserve that core/thread, your GPU output is way less than it could be, and your CPU output takes a massive dive as well. For the CPU- needing only 3 hours to do 3 hours worth of work means you will do way more work than if it takes you 12 hours to do 3 hours worth of work- it's that simple. By losing the output of 1 thread, you end up doing 4 times the amount of work on each of the remaining cores/threads. So the amount of work done each day by not using all the cores/threads is many, many times greater than the amount of work done if you try to use all cores/threads for BOINC work on a system that is also doing large amounts of other CPU intensive work. So the statement you claim is not true is true & factual, as evidenced by the output of thousands of computers over many years of crunching. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,775,208 RAC: 22,781 |
"Also, running 23 cores allows the os to have its two pennies worth without having to swap out the data for a running WU and then swap it back in again, the WUs can run closer to 100% than the 97/98% they get when running 24 cores." Unfortunately this will just muddy the waters for those that don't understand the issue of an overcommitted system. What I've been talking about is systems that are doing a lot of non-BOINC work while BOINC is running. Hence the massive difference between Run time & CPU time- for the person that started all this with Denis it was 4 times as long. 4 times... For a system that is lightly used, or just a dedicated cruncher, using all cores & threads all the time will result in the greatest amount of work being done each day. But if it's heavily used for other things, reserving a thread or 2 will result in a massive increase in output of BOINC work, even with the loss of BOINC output from that thread/those threads. Grant Darwin NT |
Bill F Send message Joined: 29 Jan 08 Posts: 44 Credit: 1,563,926 RAC: 799 |
You two guys are having too much fun to be doing this by yourselves. Since we have lots of users with different configurations, I figured that I would help muddy the waters a little more. If you are running Windows 10 or newer, some GPUs can do more of their own scheduling without as much CPU involvement... please see below; this is mostly old news. ----------------------------------- In Windows 10 version 2004, Hardware accelerated GPU scheduling was added, available if your video card supported it and you had NVIDIA version 451.48 or AMD version 20.5.1 Beta (or newer) installed. The theory is that it relieves the CPU of the GPU scheduling that the GPU can do for itself. You can Google "Hardware accelerated GPU scheduling" https://www.howtogeek.com/756935/how-to-enable-hardware-accelerated-gpu-scheduling-in-windows-11/ Have Fun Bill F |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,775,208 RAC: 22,781 |
"You two guys are having too much fun to be doing this by your selves, Since we have lots or users with different configuration's I figured that I would help muddy the waters a little more." Yep, bringing up something that is only tangentially relevant to the discussion at best certainly doesn't help in the slightest. Here we are talking about scheduling work between different BOINC applications & sharing time with non-BOINC applications. The link you posted is about Operating System scheduling. The first line of that article: "Windows 10 and Windows 11 come with an advanced setting, called Hardware-Accelerated GPU Scheduling, which can boost gaming and video performance using your PC's GPU." It's there to boost your video card's performance. If you're playing a CPU limited game while running BOINC work in the background, it might provide some very slight benefit, if any. Limiting the number of cores/threads BOINC can use so it doesn't compete with the game would provide much, much more benefit. Using the BOINC settings to suspend BOINC while gaming would be better still. The whole idea behind BOINC was to make use of unused CPU resources, not to try to use them even while they're being used heavily by other applications. As I have been pointing out over & over again in this discussion, trying to do so results in BOINC not actually getting much work done. And as any gamer would tell you, you don't want anything else running in the background while gaming as it will impact on your gaming experience (no matter how many cores/threads you have, although it probably wouldn't be that much of an issue with Threadrippers & greater, but when people get worked up over the difference between 200 frames per second and 203 frames per second...). Grant Darwin NT |
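
The Run time versus CPU time check discussed in this thread, and the claim that a heavily used system gets more BOINC work done with a thread held back, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the minute thresholds (5, 30 and 120) and the 8-thread host are hypothetical, chosen to mirror the rough guidance in the posts ("a few minutes", "30min or so", "Hours+"), and are not values that BOINC reports or enforces. The sketch also takes as given the thread's premise that reserving a thread lets the remaining tasks get close to a full core each.

```python
def gap_minutes(run_time_s: int, cpu_time_s: int) -> float:
    """Minutes of wall-clock time during which the task was not on the CPU."""
    return (run_time_s - cpu_time_s) / 60


def classify_gap(run_time_s: int, cpu_time_s: int) -> str:
    """Label a finished task by its Run time minus CPU time gap.

    The thresholds are assumptions for illustration only.
    """
    gap = gap_minutes(run_time_s, cpu_time_s)
    if gap < 5:
        return "dedicated"
    if gap < 30:
        return "lightly used"
    if gap < 120:
        return "heavily used"
    return "overcommitted"


def cpu_hours_per_day(threads: int, run_time_s: int, cpu_time_s: int) -> float:
    """CPU-hours of BOINC work per day, given a typical task's timings.

    Each thread delivers (cpu_time / run_time) of a core on average.
    """
    efficiency = cpu_time_s / run_time_s
    return threads * 24 * efficiency


if __name__ == "__main__":
    HOUR = 3600

    # The CPDN task quoted in the thread: 6 h 18 min 49 s run, 6 h 13 min 2 s CPU.
    print(classify_gap(6 * HOUR + 18 * 60 + 49, 6 * HOUR + 13 * 60 + 2))

    # Hypothetical 8-thread host, all threads given to BOINC, 12 h to do 3 h of work.
    all_threads = cpu_hours_per_day(8, 12 * HOUR, 3 * HOUR)   # 48.0
    # Same host with one thread reserved, assuming tasks then take 3 h for 3 h of work.
    one_reserved = cpu_hours_per_day(7, 3 * HOUR, 3 * HOUR)   # 168.0
    print(all_threads, one_reserved, one_reserved / all_threads)
```

With these made-up numbers each remaining thread does four times as much work, and the host as a whole does 3.5 times as much. On a lightly used or dedicated machine the efficiency is already close to 1, so giving up a thread would only lose output, which matches the distinction drawn in the posts.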
©2024 University of Washington
https://www.bakerlab.org