Message boards : Number crunching : What to throttle it to
| Author | Message |
|---|---|
|
just1vet Joined: 13 Nov 05 Posts: 7 Credit: 7,424,392 RAC: 61,675 |
I have 32 gig ram, on a AMD 3950x running Linux Mint. Computer locks up when running 32 threads of Rosetta and only a reboot unlocks it. What would be a safe number of Rosetta to run to prevent this from happening and still use the other threads for different projects? I already know how to create the app-config file. Thanks! |
|
Sid Celery Joined: 11 Feb 08 Posts: 2519 Credit: 46,790,522 RAC: 17,012 |
"I have 32 gig ram, on a AMD 3950x running Linux Mint. Computer locks up when running 32 threads of Rosetta and only a reboot unlocks it. What would be a safe number of Rosetta to run to prevent this from happening and still use the other threads for different projects?" One thing we noticed with the more recent Rosetta batches is that they're extremely RAM-hungry, using 1-1.3GB per task. Not only that, but they checkpoint very infrequently too - maybe once every 3hrs at best. On my 5700U 8C/16T W11 laptop with 16GB RAM it's running with just 8 threads and sometimes reports it's waiting on memory for 1 or 2 tasks. My 5800X 8C/16T W10 desktop with 32GB RAM manages 16 tasks without memory issues. I'd look to restrict Rosetta on your 3950X 16C/32T to half the threads and see how it goes. Hopefully your other projects can fit into whatever remaining RAM you have.
|
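As a rough way to turn the RAM figures above into a task limit, here is a minimal Python sketch. The 1.3GB per-task figure comes from the post above; the 4GB held back for the OS and other projects, and the helper name `max_tasks_for_ram`, are assumptions made for illustration and not anything BOINC or Rosetta provides.

```python
def max_tasks_for_ram(total_ram_gb, per_task_gb, reserve_gb, threads):
    """Estimate how many tasks fit in RAM, capped at the thread count.

    reserve_gb is an assumed allowance for the OS and other projects.
    """
    usable_gb = total_ram_gb - reserve_gb
    if usable_gb <= 0:
        return 0
    # Floor division: a task that only partly fits does not count.
    return min(threads, int(usable_gb // per_task_gb))


if __name__ == "__main__":
    # 3950X from the original question: 32GB RAM, 32 threads.
    print(max_tasks_for_ram(32, 1.3, 4, 32))
```

This only bounds the count by memory. Starting at half the threads, as suggested above, is more conservative and leaves headroom if a batch turns out hungrier than 1.3GB per task.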
|
Bill Swisher Joined: 10 Jun 13 Posts: 90 Credit: 63,761,181 RAC: 65,010 |
Dunno nuthin about mint...but (thanks to some wise people here) on openSUSE (Leap 15.7) I had to go in and modify a file. As root: cd /var/lib/boinc/projects/boinc.bakerlab.org_rosetta, then vi app_config.xml, and put in: <app_config> <!-- <app> <name>rosetta_beta</name> <max_concurrent>12</max_concurrent> </app> --> <app> <name>rosetta</name> <max_concurrent>16</max_concurrent> </app> </app_config> then :wq. Don't forget to chown boinc:boinc app_config.xml. As you can see, the rosetta beta stuff is commented out, and since this may be an openSUSE-only solution, your mileage may vary. Another fun thing...I've done this for Einstein and put in the wrong name for the name. The boincmgr will tell you that the name is no good and provide you with a list of names that work (or at least it did). Oops...forgot to say that the config file needs to be reread; that's under the Options tab in the boincmgr. If things are working as normal I just go in and mv app_config.xml app_config.xlm, which is where it's at now.
|
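For anyone who would rather generate the file than type it in vi, here is a hedged Python sketch that builds the same `app_config.xml` structure described above. The app name `rosetta` and the limit of 16 are taken from the post; the function name `build_app_config` is made up for this example. It only prints the XML rather than writing into the BOINC projects directory, since that path and its ownership vary by distro.

```python
import xml.etree.ElementTree as ET


def build_app_config(limits):
    """Return app_config.xml text for a {app_name: max_concurrent} mapping."""
    root = ET.Element("app_config")
    for app_name, max_concurrent in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    # ET.indent needs Python 3.9 or newer; it only affects whitespace.
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Values from the post above: cap the "rosetta" app at 16 tasks.
    print(build_app_config({"rosetta": 16}))
```

The output would still need to be saved as app_config.xml in the project directory, given the right owner, and picked up by rereading the config files, as the post describes. Entries you want disabled are easiest to leave out of the mapping; in a hand-edited file, XML comments take the form `<!-- ... -->`.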
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2147 Credit: 12,707,842 RAC: 12,422 |
"Not only that, but they checkpoint very infrequently too - maybe once every 3hrs at best" They have no checkpoints at all... |
|
Sid Celery Joined: 11 Feb 08 Posts: 2519 Credit: 46,790,522 RAC: 17,012 |
"Not only that, but they checkpoint very infrequently too - maybe once every 3hrs at best" Is that right? Maybe I'm getting mixed up with when each decoy is completed. I've got one task that's been running for 11h 52m and it's showing a checkpoint 1h 27m ago, but within that 11h 52m it will probably end up reporting 7 or 8 decoys completed, so maybe it's that and not an intermediate point within each decoy, as it should be. The point being a lot of processing can be lost if there's a crash, although I have to say these tasks seem very stable. No errors while computing as long as I've got enough RAM to run them in (which I have). Edit: Now finished and 8 decoys completed - very likely checkpointing after each decoy. Not the worst, but not the best either.
|
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2147 Credit: 12,707,842 RAC: 12,422 |
"Is that right? Maybe I'm getting mixed up with when each decoy is completed." I've tried a few reboots and, every time, the WUs restarted from 0%. Maybe the cause is that my default runtime is 4 hrs, so I don't know if the WUs create their checkpoints correctly. "Edit: Now finished and 8 decoys completed - very likely checkpointing after each decoy. Not the worst, but not the best either." Maybe if I increase the runtime... |
|
Sid Celery Joined: 11 Feb 08 Posts: 2519 Credit: 46,790,522 RAC: 17,012 |
"Is that right? Maybe I'm getting mixed up with when each decoy is completed." You know I believe the runtime of Rosetta Beta 6.06 tasks has been incorrectly set to 4hrs when the project's default has been 8hrs for some years. My main objection is that the few tasks we rarely receive are used up far too early because everyone's runtime comes up short. I notice your reported runtimes are between 3 and 4hrs and you usually report 2 decoys completed. While my objection could be described as administrative, the problem for users comes when we have tasks like we have this month, with such poor checkpointing that half or all their work gets discarded if the PC gets shut down at the end of the day or needs to be rebooted - or crashes due to the high RAM demand of current tasks, as reported by the OP. I've chosen to set my tasks to 12hr runtimes, but the ideal is to set it to Rosetta's 8hr intended default, which also matches Boinc's assumed runtime for scheduling purposes. When Edit Preferences is selected at Rosetta@home preferences it says "Target CPU run time (not selected defaults to 8 hours)", which isn't true for Rosetta Beta tasks. It needs to be explicitly set to 8hrs. There are no advantages to having a shorter runtime. Not in how long each batch of tasks lasts, not in Boinc scheduling, not in runtime, not in credit received and not in the amount of processing lost when the PC is shutdown or rebooted or crashes. I'm not saying a longer runtime has advantages in all those factors - they're neutral at worst - but whatever differences there may be are advantages, either to you personally or to all other users of Rosetta. Everyone should make this change to their default runtimes imo - no ifs or buts.
|
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2147 Credit: 12,707,842 RAC: 12,422 |
"There are no advantages to having a shorter runtime." I configured 4hrs years ago, when the code was not so stable, and have stayed with it for convenience (above all on Ralph@Home), as my hardware is not always on. Maybe I can increase the runtime gradually, to 6 hrs and then to 8 hrs. I'll think about it. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2147 Credit: 12,707,842 RAC: 12,422 |
"Not in how long each batch of tasks lasts, not in Boinc scheduling, not in runtime, not in credit received and not in the amount of processing lost when the PC is shutdown or rebooted or crashes." If you know me, I'm not interested in credit, but in helping the scientists. If you think that 8hrs will be better, I will consider it. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1911 Credit: 18,534,891 RAC: 0 |
"If you know me, i'm not interested in credit, but in help the scientists" I just set mine to the default. If 8, 6, or 12 hours is what they reckon they need, then that's what my system does. Grant Darwin NT |
|
Sid Celery Joined: 11 Feb 08 Posts: 2519 Credit: 46,790,522 RAC: 17,012 |
"There are no advantages to having a shorter runtime." You're certainly right that we've had some problematic batches earlier this year, but it does vary from batch to batch. The current batch seems very reliable, as long as you have sufficient RAM for the demands it makes. Increasing to 6hrs first is reasonable to give you confidence you can continue to be successful, but as a generalisation I don't think limiting runtime is the solution to issues that arise from time to time.
|
|
Sid Celery Joined: 11 Feb 08 Posts: 2519 Credit: 46,790,522 RAC: 17,012 |
"Not in how long each batch of tasks lasts, not in Boinc scheduling, not in runtime, not in credit received and not in the amount of processing lost when the PC is shutdown or rebooted or crashes." As I understand things, when a task is running, after each decoy is completed, there's an exercise that averages the time taken to run each decoy, and if the remaining target runtime allows another one to be completed, then it continues, otherwise it ends. So, with the current batch, it seems a 3rd decoy can't be completed with a 4hr target runtime, so it ends after 2. With my 12hr runtimes, it completes 8 decoys. And I'd estimate that it completes 5 with an 8hr runtime. Over 24hrs, 4hr tasks would complete 12 decoys, 8hr tasks 15 decoys and 12hr tasks 16 decoys. And if a batch of a million tasks were issued, that would be 2m decoys in 4m hours of 4hr tasks, 5m decoys in 8m hours of 8hr tasks and 8m decoys in 12m hours of 12hr tasks. From a scientific results point of view, the longer runtimes produce significantly more representative output. From a user point of view, more credits, but more importantly imo, the batches last a lot longer when supply has been so irregular over recent months and years. To me, it's a no-brainer. Everyone seems to win. That's why I keep going on about it.
|
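The stopping rule described above can be sketched in a few lines of Python. This is a simplification based only on the description in this thread, not Rosetta's actual code: it assumes every decoy takes the same 89 minutes (11h 52m divided by 8 decoys, from the task reported earlier), that a task always runs at least one decoy, and the function name `decoys_completed` is hypothetical.

```python
DECOY_MINUTES = 89  # assumption: 11h 52m / 8 decoys, treated as constant


def decoys_completed(target_minutes, decoy_minutes=DECOY_MINUTES):
    """Count decoys finished before the remaining time is too short for another."""
    elapsed = 0
    count = 0
    while True:
        elapsed += decoy_minutes
        count += 1
        average = elapsed / count  # average time per decoy so far
        if target_minutes - elapsed < average:
            return count


if __name__ == "__main__":
    for hours in (4, 8, 12):
        per_task = decoys_completed(hours * 60)
        # Same arithmetic as the post: each task is charged its full target runtime.
        per_day = per_task * (24 // hours)
        print(f"{hours}hr target: {per_task} decoys per task, {per_day} per 24 nominal hours")
```

Under those assumptions the sketch gives 2, 5 and 8 decoys per task, matching the figures in the post. The 24-hour totals follow the post in counting each task as using its whole target runtime, even when it stops early.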
|
Sid Celery Joined: 11 Feb 08 Posts: 2519 Credit: 46,790,522 RAC: 17,012 |
"If you know me, i'm not interested in credit, but in help the scientists" "I just set mine to the default." Under normal conditions I'd agree. In my time here Rosetta tasks used to be 3hrs long, 4hrs, 6hrs, 8hrs and, for a day or two, 16hrs before reverting back to 8hrs. Rosetta preferences explicitly state that "Not selected" defaults to 8hrs for all tasks. Boinc scheduling is fixed by Rosetta at a default 8hrs. All past discussion from the Admins here indicated 8hrs was the default for all tasks. The fact Rosetta Beta tasks suddenly and inexplicably changed to 4hrs, while Rosetta tasks remained at 8hrs, to me is plainly a mistake, not an intentional change. I know this happened at a time Admins haven't been talking to us, but there's no reason why it needed to be changed, especially while everything everywhere else still points to an 8hr default. Even aside from any reasons the Project might have, from a user perspective, 8hrs matches Boinc's default scheduling time for unstarted tasks and means batches of work will last us longer when we have so much downtime between batches. Absolutely everything points to 8hr runtimes for all tasks and nothing anywhere points to anything else. Frankly, I'm dumbfounded why anyone would waste a second of their time arguing against the most obvious, simple and advantageous one-off tweak that will only benefit themselves, everyone else and the project we're all signed up to. There's no accounting for people...
|
Grant (SSSF) Joined: 28 Mar 20 Posts: 1911 Credit: 18,534,891 RAC: 0 |
"Absolutely everything points to 8hr runtimes for all tasks and nothing anywhere points to anything else." The fact that the Beta Tasks run for 4 hours when you select the default indicates that's what the person that sends them out considers long enough. If people want to run Tasks for longer in order to try to keep their systems busy with Rosetta Tasks for longer, OK. But that's the whole point of the default run time - it's a setting that whoever sends out a batch of work can make use of. If they need more time, they can make it longer. If they don't, then they can make it shorter. And that's what they have chosen to do with the Beta Tasks - most run for 4 hours. Some batches have run for 8. They determine what gives them the results they need. If people want to play around with that, the option is there, but I'm more than happy to go along with what the person that releases the work decides is right for that batch. To argue that the changed times are a mistake, and not an intentional change, is arguing against the evidence. Given the complete and utter lack of attention the project pays to Rosetta@home, the fact that they haven't told us things have changed doesn't mean they haven't. If 4 hours wasn't producing useful results, after over 2 years they would have increased it. As they now appear to be running for 6 hours, it looks like after all that time they've decided 6 hours is enough (take a look at the top computer stats - systems now can do 3-4 times more work per core/thread than systems from 8 years ago, i.e. a single core on a modern system can do the same amount of work (or more) in 2.5 hours as an older system could do in 8 hours).
"Frankly, I'm dumbfounded why anyone would waste a second of their time arguing against the most obvious, simple and advantageous one-off tweak that will only benefit themselves, everyone else and the project we're all signed up to." It doesn't benefit others that don't prioritise Rosetta, as it blocks them from doing work for other projects, and those that do prioritise Rosetta take any available work with their increased Runtimes, which decreases the chance of getting work for those that don't prioritise Rosetta when they have done their work for other projects. And any benefit to the project is minimal, as if they actually wanted the extra work done on each Task, they would have increased the default runtime. The one and only benefit of longer than default Runtimes is for those that want to process Rosetta as their first priority - it increases their chance of getting work when there's bugger all to be had. No more, no less. Grant Darwin NT |
|
Bryn Mawr Joined: 26 Dec 18 Posts: 435 Credit: 15,056,833 RAC: 9,731 |
"Not in how long each batch of tasks lasts, not in Boinc scheduling, not in runtime, not in credit received and not in the amount of processing lost when the PC is shutdown or rebooted or crashes." <Spock>Does not compute.</Spock> What you’re missing is that when the 4 hour task ends after 2 decoys it gives back the extra time and another 4 hour task starts early. |
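To put a number on that point, here is a small Python sketch that charges each task only the time it actually runs rather than its full target runtime. It rests on the same assumptions as the figures quoted in this thread: a constant 89 minutes per decoy (11h 52m for 8 decoys) and 2, 5 and 8 decoys for 4hr, 8hr and 12hr targets. The name `decoys_per_day` is invented for the example, and task start-up and download overheads are ignored.

```python
from fractions import Fraction

DECOY_MINUTES = 89  # assumption: 11h 52m / 8 decoys, treated as constant
DECOYS_PER_TASK = {4: 2, 8: 5, 12: 8}  # target hours -> decoys, figures from this thread


def decoys_per_day(decoys_per_task, decoy_minutes=DECOY_MINUTES):
    """Decoys one thread finishes in 24h if the next task starts as soon as one ends."""
    actual_task_minutes = decoys_per_task * decoy_minutes
    tasks_per_day = Fraction(24 * 60, actual_task_minutes)
    return tasks_per_day * decoys_per_task


if __name__ == "__main__":
    for target_hours, decoys in DECOYS_PER_TASK.items():
        actual = decoys * DECOY_MINUTES
        rate = float(decoys_per_day(decoys))
        print(f"{target_hours}hr target: task runs {actual} min, {rate:.1f} decoys per thread per day")
```

With early finishes counted this way, the decoys per thread per day come out the same for all three targets, so under these assumptions the target runtime changes how many tasks a batch consumes rather than how much science each thread produces.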
©2025 University of Washington
https://www.bakerlab.org