Rosetta uses only 1 core

Garpusha
Message 108826 - Posted: 1 Feb 2024, 21:08:03 UTC

Dear Team,
I have an 8-core AMD FX-8350 and Rosetta is using only one core - I checked the VirtualBox settings. I tried to give the VM 4 or even 8 cores, but after a restart Rosetta reverts these changes and still uses only 1 core. Is there any way to fix this?

Thanks, and sorry if this was discussed earlier - I did not find it.
Grant (SSSF)
Message 108830 - Posted: 2 Feb 2024, 7:51:52 UTC

Exit BOINC, then restart it.
Then check the Event log (View, Advanced View, then Tools, Event log) and see what messages are there. Rosetta VirtualBox Tasks require roughly 8 GB of disk space per Task.
Grant
Darwin NT
Garpusha
Message 108831 - Posted: 2 Feb 2024, 8:22:33 UTC - in response to Message 108830.  

Well, I now see Max CPUs used: 8, but the overall CPU load is still only about 15-20%, although the cores seem to be working more or less. VirtualBox is still using 1 CPU. I have no idea.

https://drive.google.com/file/d/15s0M2VK0vFQf2aw6izJOOgdEkO8K2KCd/view?usp=sharing

02.02.2024 11:09:31 |  | cc_config.xml not found - using defaults
02.02.2024 11:09:31 |  | Starting BOINC client version 7.24.1 for windows_x86_64
02.02.2024 11:09:31 |  | log flags: file_xfer, sched_ops, task
02.02.2024 11:09:31 |  | Libraries: libcurl/8.2.1-DEV Schannel zlib/1.2.13
02.02.2024 11:09:31 |  | Data directory: C:\ProgramData\BOINC
02.02.2024 11:09:31 |  | Running under account alexa
02.02.2024 11:09:31 |  | OpenCL: AMD/ATI GPU 0: Radeon RX550/550 Series (driver version 3444.0 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (3444.0), 4096MB, 4096MB available, 1232 GFLOPS peak)
02.02.2024 11:09:31 |  | Windows processor group 0: 8 processors
02.02.2024 11:09:31 |  | Host name: HOMEPC
02.02.2024 11:09:31 |  | Processor: 8 AuthenticAMD AMD FX(tm)-8350 Eight-Core Processor [Family 21 Model 2 Stepping 0]
02.02.2024 11:09:31 |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx svm sse4a osvw ibs xop skinit wdt fma4 tce tbm topx page1gb rdtscp bmi1
02.02.2024 11:09:31 |  | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.19045.00)
02.02.2024 11:09:31 |  | Memory: 15.90 GB physical, 18.28 GB virtual
02.02.2024 11:09:31 |  | Disk: 231.71 GB total, 132.35 GB free
02.02.2024 11:09:31 |  | Local time is UTC +3 hours
02.02.2024 11:09:31 |  | No WSL found.
02.02.2024 11:09:31 |  | VirtualBox version: 7.0.12
02.02.2024 11:09:31 | Rosetta@home | General prefs: from Rosetta@home (last modified 02-Feb-2024 00:21:19)
02.02.2024 11:09:31 | Rosetta@home | Host location: none
02.02.2024 11:09:31 | Rosetta@home | General prefs: using your defaults
02.02.2024 11:09:31 |  | Reading preferences override file
02.02.2024 11:09:32 |  | Preferences:
02.02.2024 11:09:32 |  | -  When computer is in use
02.02.2024 11:09:32 |  | -     'In use' means mouse/keyboard input in last 3.00 minutes
02.02.2024 11:09:32 |  | -     don't use GPU
02.02.2024 11:09:32 |  | -     max CPUs used: 8
02.02.2024 11:09:32 |  | -     Use at most 100% of the CPU time
02.02.2024 11:09:32 |  | -     suspend if non-BOINC CPU load exceeds 25%
02.02.2024 11:09:32 |  | -     max memory usage: 7.95 GB
02.02.2024 11:09:32 |  | -  When computer is not in use
02.02.2024 11:09:32 |  | -     max CPUs used: 8
02.02.2024 11:09:32 |  | -     Use at most 100% of the CPU time
02.02.2024 11:09:32 |  | -     suspend if non-BOINC CPU load exceeds 25%
02.02.2024 11:09:32 |  | -     max memory usage: 14.31 GB
02.02.2024 11:09:32 |  | -  Store at least 0.10 days of work
02.02.2024 11:09:32 |  | -  Store up to an additional 0.50 days of work
02.02.2024 11:09:32 |  | -  max disk usage: 158.26 GB
02.02.2024 11:09:32 |  | -  (to change preferences, visit a project web site or select Preferences in the Manager)
02.02.2024 11:09:32 |  | Setting up project and slot directories
02.02.2024 11:09:32 |  | Checking active tasks
02.02.2024 11:09:32 | Rosetta@home | Task RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_1_13_3568_2970582_2_1 is 0.02 days overdue; you may not get credit for it.  Consider aborting it.
02.02.2024 11:09:32 | Rosetta@home | URL https://boinc.bakerlab.org/rosetta/; Computer ID 6279383; resource share 100
02.02.2024 11:09:32 |  | Setting up GUI RPC socket
02.02.2024 11:09:32 |  | Checking presence of 18 project files
02.02.2024 11:09:32 |  | Suspending GPU computation - computer is in use
02.02.2024 11:09:32 | Rosetta@home | Sending scheduler request: To fetch work.
02.02.2024 11:09:32 | Rosetta@home | Requesting new tasks for CPU
02.02.2024 11:09:33 | Rosetta@home | Scheduler request completed: got 0 new tasks
02.02.2024 11:09:33 | Rosetta@home | No tasks sent
02.02.2024 11:09:33 | Rosetta@home | Project requested delay of 31 seconds
Grant (SSSF)
Message 108832 - Posted: 2 Feb 2024, 21:23:11 UTC - in response to Message 108831.  

02.02.2024 11:09:31 | | Reading preferences override file
This means you have set your preferences on the system itself, so all web-based preferences will be ignored.



02.02.2024 11:09:32 | | - suspend if non-BOINC CPU load exceeds 25%
Unless you're doing video encoding or similar work, there's really no need to suspend BOINC at all, even when the computer is in use for other things. If you really feel the need, then set it to something more reasonable, such as 75% or 85% non-BOINC load, before suspending.
When the computer is not in use, there's no need to suspend BOINC at all.



02.02.2024 11:09:32 | | - Store at least 0.10 days of work
02.02.2024 11:09:32 | | - Store up to an additional 0.50 days of work
Setting that to "Store at least 0.50 days of work" and "Store up to an additional 0.01 days of work" will result in it actually carrying that much work.
The way you have it at present means BOINC will let the work drop down to around 0.10 days' worth before getting another 0.50 days' worth.

If running more than one project, having no cache is best for getting your Resource share setting honoured quickly.
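
For reference: when preferences are set locally (the "Reading preferences override file" line above), they live in global_prefs_override.xml in the BOINC data directory. A minimal sketch with the values suggested above, assuming the standard BOINC element names:

<global_preferences>
    <!-- Store at least this many days of work -->
    <work_buf_min_days>0.5</work_buf_min_days>
    <!-- Store up to this many additional days of work -->
    <work_buf_additional_days>0.01</work_buf_additional_days>
    <!-- suspend only if non-BOINC CPU load exceeds this percentage -->
    <suspend_cpu_usage>75</suspend_cpu_usage>
</global_preferences>

The Manager can re-read it without a restart (Options, Read local prefs file), or just exit & restart BOINC.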



02.02.2024 11:09:32 | Rosetta@home | Task RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_1_13_3568_2970582_2_1 is 0.02 days overdue; you may not get credit for it. Consider aborting it.
02.02.2024 11:09:32 | Rosetta@home | URL https://boinc.bakerlab.org/rosetta/; Computer ID 6279383; resource share 100
At the time you posted this log, you only had 1 Task in progress.
As of this reply, you no longer have any Tasks - they've all timed out, been cancelled by the server, or been returned, reported & validated.

It'll just be the luck of the draw if you get work the next time some is sent out.
Grant
Darwin NT
hadron
Message 109281 - Posted: 24 May 2024, 18:46:53 UTC - in response to Message 108832.  

02.02.2024 11:09:32 | | - Store at least 0.10 days of work
02.02.2024 11:09:32 | | - Store up to an additional 0.50 days of work
Setting that to "Store at least 0.50 days of work" and "Store up to an additional 0.01 days of work" will result in it actually carrying that much work.
The way you have it at present means BOINC will let the work drop down to around 0.10 days' worth before getting another 0.50 days' worth.

If running more than one project, having no cache is best for getting your Resource share setting honoured quickly.

I'm running 3 projects from LHC, one from Einstein, and one here. I frequently see Einstein or Rosetta tasks filling the queue while the other projects disappear from the queue because it's constantly full.
Are you suggesting that I could resolve this issue by setting both the "days of work" settings to 0?
Grant (SSSF)
Message 109282 - Posted: 24 May 2024, 23:29:32 UTC - in response to Message 109281.  
Last modified: 24 May 2024, 23:32:21 UTC

I'm running 3 projects from LHC, one from Einstein, and one here. I frequently see Einstein or Rosetta tasks filling the queue while the other projects disappear from the queue because it's constantly full.
Are you suggesting that I could resolve this issue by setting both the "days of work" settings to 0?
Pretty much.
The whole reason for having a cache is to keep your system busy if the Project runs out of work for a while, or goes down. When you've got multiple Projects, the odds of all of them going down at the same time are pretty remote.
If one goes down, BOINC will do extra work for the others. When that one comes back, it will do extra work for that Project, to bring it back up on par with the other Projects & your Resource Share settings. Having a cache just makes that take longer.


The thing to keep in mind is your Resource Share settings - if all projects have the same Resource Share values (keep in mind they are ratios, not percentages), then they will be organised to do the same amount of work as each other. Notice I said work, not time. A Project with an extremely efficient application may only have to run 1-2 Tasks to do the same amount of work as another Project that has to run 10-20 Tasks, or 1 or 2 Tasks that take much, much longer to process.
The Credit you get for each Task is meant to be the indicator of work done - but some projects underpay, and many projects overpay (some massively). So when it comes to scheduling work, REC (Recent Estimated Credit) is used, not the Credit awarded by the Projects.
The more Projects you have, and the larger the cache, the longer it will take for your Resource Share settings to be honoured - in extreme cases it can take months. And that's if you don't tweak things (suspending & unsuspending Projects or individual Tasks); then it will take even longer still.

So for your Resource Share to be honoured as soon as possible (i.e. days instead of weeks or months), the smallest possible cache is best. The Scheduler doesn't have to do nearly as much juggling of different deadlines (days, weeks, or months, depending on the Project & application), and you don't end up with Tasks often going into High priority mode to avoid missing deadlines, with the Scheduler then having to juggle the Tasks that were postponed to get those other Tasks done.



You say you are running 3 Projects from LHC- keep in mind that to BOINC, LHC is just 1 Project. So as far as BOINC is concerned you are running 4 Projects.
Different applications within a Project are just that- applications under that Project.

The number and type of applications being run for LHC will depend on what work is available at the time BOINC requests more work from LHC to meet your Resource Share settings.
If you want to do more LHC work, then bump up its Resource Share value. Keep in mind - the larger your cache, the longer it will take for things to settle down.



I'd suggest
Store at least 0.10 days of work
Store up to an additional 0.01 days of work
That way your Resource Share settings will be met within a few days instead of several weeks (or longer), but there will still be a few Tasks in the cache, so when one Task finishes another will be ready to go straight away (you won't have to wait for a download before the next one can start).
I'd give that a shot for a week or so and see if it produces the result you're after. If not, then increase the Resource Share for LHC and give that a few days to see the effect it has (keep in mind - the values are a ratio, not a percentage).
Grant
Darwin NT
hadron
Message 109283 - Posted: 25 May 2024, 2:34:43 UTC - in response to Message 109282.  

I'm running 3 projects from LHC, one from Einstein, and one here. I frequently see Einstein or Rosetta tasks filling the queue while the other projects disappear from the queue because it's constantly full.
Are you suggesting that I could resolve this issue by setting both the "days of work" settings to 0?
Pretty much.
The whole reason for having a cache is to keep your system busy if the Project runs out of work for a while, or goes down. When you've got multiple Projects, the odds of all of them going down at the same time are pretty remote.
If one goes down, BOINC will do extra work for the others. When that one comes back, it will do extra work for that Project, to bring it back up on par with the other Projects & your Resource Share settings. Having a cache just makes that take longer.

<snip>

You say you are running 3 Projects from LHC- keep in mind that to BOINC, LHC is just 1 Project. So as far as BOINC is concerned you are running 4 Projects.
Different applications within a Project are just that- applications under that Project.

The number and type of applications being run for LHC will depend on what work is available at the time BOINC requests more work from LHC to meet your Resource Share settings.
If you want to do more LHC work, then bump up its Resource Share value. Keep in mind - the larger your cache, the longer it will take for things to settle down.



I'd suggest
Store at least 0.10 days of work
Store up to an additional 0.01 days of work
That way your Resource Share settings will be met within a few days instead of several weeks (or longer), but there will still be a few Tasks in the cache, so when one Task finishes another will be ready to go straight away (you won't have to wait for a download before the next one can start).
I'd give that a shot for a week or so and see if it produces the result you're after. If not, then increase the Resource Share for LHC and give that a few days to see the effect it has (keep in mind - the values are a ratio, not a percentage).


Excellent explanation; thanks.
The "days of work" settings are already at 0.1/.01, so I'm definitely now thinking about changing those resource share settings; since I have 3 LHC projects and 1 each for Einstein and Rosetta, I'm thinking that something like a 60/30/30 ratio might be appropriate. Your thoughts?
Grant (SSSF)
Message 109284 - Posted: 25 May 2024, 3:22:25 UTC - in response to Message 109283.  
Last modified: 25 May 2024, 3:43:28 UTC

The "days of work" settings are already at 0.1/.01,
For a week or so?
If they've been changed recently, depending on what they were, I'd expect it to take 2-3 days for the changes to show up in completion times.


so I'm definitely now thinking about changing those resource share settings; since I have 3 LHC projects and 1 each for Einstein and Rosetta, I'm thinking that something like a 60/30/30 ratio might be appropriate. Your thoughts?
If that's the way you want to do it, then go for it.

I'd suggest 200/100/100 (the default value is 100, so instead of having to change the Resource Share value at each Project, you only need to change it once at LHC). Keep in mind it's still no guarantee of getting work for all 3 applications. If the Project doesn't have any Tasks for an application, then you won't be able to get any for it (e.g. at present LHC is showing fewer than 200 Tasks for CMS, while there are thousands for the other applications - which means the odds of getting any CMS work are very, very, very slight compared to the other two).
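
To make the ratio arithmetic concrete (a worked example of mine, not from the thread): with shares of 200/100/100 the long-run split of CPU time is LHC = 200/(200+100+100) = 50%, and Einstein = Rosetta = 100/400 = 25% each; with 100/100/100, each Project gets a third. Doubling one Project's share therefore shifts every Project's effective percentage.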
Just make sure to select LHC in the BOINC Manager and click on Update to make the changes take effect straight away, instead of waiting for the next time a Task is done & the Manager gets around to reporting it.

As I said, what you call 3 LHC projects aren't actually Projects - they're just different science applications for that Project. Just like here at Rosetta - there are 3 different applications: Rosetta 4.20, Rosetta Beta, and Python (which we haven't seen for ages, thankfully - it requires VBox, and is a pig for memory & disk space).



One thing I didn't check on - are you using all your cores/threads for BOINC work? The more cores/threads in use, the faster your Resource Share is honoured. And of course, the more Tasks of any given Project that can be done at any given time.

EDIT - this appears to be especially so for Atlas & CMS Tasks, as they are multithreaded and, depending on the settings in your LHC account, can require 4 or 8 cores/threads for each running Task.
Grant
Darwin NT
hadron
Message 109287 - Posted: 25 May 2024, 4:50:52 UTC - in response to Message 109284.  

I'm using 22 threads out of 24 for Boinc. That seems to still leave me plenty of processing power for the rest of the system.

The current days of work settings have been in place only for a few days; I'd been toying with stupidly large values trying to even things out, but I've given up trying -- now Boinc can do what it will do, and I will try to control that as best I can.

There aren't that many CMS tasks right now because they're still trying to work out all the kinks with the new multi-core tasks. However, compared with Atlas and Theory, CMS has a rather small cadre of crunchers, so I always seem to be able to get something.

"Multi-thread tasks" is somewhat of misnomer. My experience shows that such tasks are capable of using up to a set maximum number of threads, and ultimately, the number of threads actually used by the VM is passed as a command-line parameter, with the number of threads being sent by the project site, or by being fixed in a setting in the appropriate app_config.xml file.
For example, I have LHC set to send "single-thread tasks". That merely means the default setting is to run them in a single thread, unless a local over-ride is given. I do so in my app_config file as follows (this is the setting for CMS tasks):
<app_version>
    <app_name>CMS</app_name>
    <avg_ncpus>4</avg_ncpus>
    <plan_class>vbox64_mt_mcore_cms</plan_class>
    <cmdline>--nthreads 4</cmdline>
</app_version>
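
For context, a minimal sketch of where that fragment sits (it matches the full file posted later in this thread): the <app_version> element goes inside the <app_config> root, alongside the <app> element that limits concurrency, and BOINC picks up changes via Options, Read config files or a restart:

<app_config>
    <!-- run at most 2 CMS tasks at once -->
    <app>
        <name>CMS</name>
        <max_concurrent>2</max_concurrent>
    </app>
    <!-- run each CMS task on 4 threads -->
    <app_version>
        <app_name>CMS</app_name>
        <avg_ncpus>4</avg_ncpus>
        <plan_class>vbox64_mt_mcore_cms</plan_class>
        <cmdline>--nthreads 4</cmdline>
    </app_version>
</app_config>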
Grant (SSSF)
Message 109288 - Posted: 25 May 2024, 5:19:57 UTC - in response to Message 109287.  

I'm using 22 threads out of 24 for Boinc. That seems to still leave me plenty of processing power for the rest of the system.
It appears to be plenty- the difference between CPU time & Run time for your Tasks is around 10 min for the 3hr Tasks, which would indicate a moderately busy system if all cores/threads were available to BOINC.


The current days of work settings have been in place only for a few days; I'd been toying with stupidly large values trying to even things out, but I've given up trying -- now Boinc can do what it will do, and I will try to control that as best i can.
If it had some large values, then it will take longer for all of that to clear, and for the Scheduler to sort things out.
Give it a few more days with the 0.1 & 0.01 values & things should settle down nicely.

If it still doesn't pick up as much LHC work as you would like, then bump up its Resource Share, and then give it another few days to see what the effect is.
Grant
Darwin NT
hadron
Message 109302 - Posted: 28 May 2024, 0:57:45 UTC - in response to Message 109288.  

I'm using 22 threads out of 24 for Boinc. That seems to still leave me plenty of processing power for the rest of the system.
It appears to be plenty- the difference between CPU time & Run time for your Tasks is around 10 min for the 3hr Tasks, which would indicate a moderately busy system if all cores/threads were available to BOINC.


The current days of work settings have been in place only for a few days; I'd been toying with stupidly large values trying to even things out, but I've given up trying -- now Boinc can do what it will do, and I will try to control that as best i can.
If it had some large values, then it will take longer for all of that to clear, and for the Scheduler to sort things out.
Give it a few more days with the 0.1 & 0.01 values & things should settle down nicely.

If it still doesn't pick up as much LHC work as you would like, then bump up its Resource Share, and then give it another few days to see what the effect is.



Well, it's been 3 days now, and things look like they've settled down. I am not satisfied with the results.
Current settings are days of work 0.1/0.01; resource shares LHC 400, Rosetta 200, Einstein 100.
Currently running tasks are: Rosetta and Einstein 8 each (controlled by app_config files); LHC - 1 Atlas (2 cores) and 1 Theory.
Job queue: 12 Einstein and 1 Rosetta.
Boinc will only pull in a new task from LHC when it reports a finished task; all other times, it won't request new tasks because it "doesn't need" them.
I don't know if task run times are at all relevant, but in case they are:
Theory can take anywhere from half an hour to several days. The one that's running now has been running for 5-1/2 days, and still has over 4 days to complete. It may be one of those all-too-frequent Theory tasks that will run to just under its drop-dead time, then fail with a computation error. However, it's too early right now to be able to tell. Maybe I'll just assume it's gonna die, and kill it off so I can get a new one.
Atlas tasks take 10 or 11 hours to complete, Rosetta 3 hours, and Einstein 1-1/2 to 2 hours. I have no way of knowing how much "real work" each of those represents.
My RAC on LHC is dropping like a rock.

I'll leave things as they are for now if you think it best, but I have to say, I am sorely tempted to blow that LHC resource share through the roof, say to something like 5000, to see what happens.
Grant (SSSF)
Message 109303 - Posted: 28 May 2024, 6:20:01 UTC - in response to Message 109302.  

Currently running tasks are: Rosetta and Einstein 8 each (controlled by app_config files)
In what way are they being controlled by app_config files?
Whatever settings you've got there will impact work Scheduling, as the Scheduler will have to work around those app_config settings.


I don't know if task run times are at all relevant
Yep, they are.
In theory, a Task on one Project that runs for a month should result in the same Credit as Tasks on another Project that run for only a couple of minutes each over that month.
But if the long-running Task doesn't make use of trickles (where Credit is granted as it progresses through the Task), then the very short-running Tasks will make up the majority on the system, being regularly returned and granted Credit, while the one long-running Task won't get any Credit until it is completed & reported. So you get the effect you're describing at present - one Task running for one Project, multiple Tasks running for the other Projects, and their RAC going up while the single-Task Project's RAC goes down.
When that Task is completed and it gets Credit, the RAC will spike (or, if the other Projects get ahead of the work debt owed to the Project with the one long-running Task, it will pick up another Task (or more) for that Project).

Having lots of cores and no cache helps Resource Share to be honoured sooner rather than later. Unfortunately, a big difference in processing time between Projects means it takes longer than it otherwise would, and a big difference in Task processing time between different applications for the same Project just adds to the complications.



I'll leave things as they are for now if you think it best, but I have to say, I am sorely tempted to blow that LHC resource share through the roof, say to something like 5000, to see what happens.
Give it until that exceptionally long-running Task is done, and hopefully you'll pick up a Task with a more usual runtime, then see how it behaves.
Otherwise, increase the present LHC Resource Share value by no more than double; otherwise LHC will spike up and the others will fall, so you'll bump theirs up (or LHC's down again, which would be better), then they'll pick up & LHC will fall, so you bump its share up again, it'll spike, the others will fall - rinse and repeat for months until eventually you get tired of it or things get close to what you want.
Grant
Darwin NT
hadron
Message 109304 - Posted: 28 May 2024, 7:12:49 UTC - in response to Message 109303.  

Currently running tasks are: Rosetta and Einstein 8 each (controlled by app_config files)
In what way are they being controlled by app_config files?
Whatever settings you've got there will impact work Scheduling, as the Scheduler will have to work around those app_config settings.
LHC:
<app_config>
    <app>
        <name>CMS</name>
        <max_concurrent>2</max_concurrent>
    </app>
    <app_version>
        <app_name>CMS</app_name>
        <avg_ncpus>4</avg_ncpus>
        <plan_class>vbox64_mt_mcore_cms</plan_class>
        <cmdline>--nthreads 4</cmdline>
    </app_version>
    <app>
        <name>Theory</name>
        <max_concurrent>8</max_concurrent>
    </app>
    <app>
        <name>ATLAS</name>
        <max_concurrent>2</max_concurrent>
    </app>
    <app_version>
        <app_name>ATLAS</app_name>
        <avg_ncpus>2</avg_ncpus>
        <plan_class>vbox64_mt_mcore_atlas</plan_class>
        <cmdline>--nthreads 2</cmdline>
    </app_version>
</app_config>
Rosetta:
<app_config>
    <project_max_concurrent>8</project_max_concurrent>
</app_config>
Einstein:
<app_config>
    <app>
        <name>hsgamma_FGRP5</name>
        <max_concurrent>8</max_concurrent>
    </app>
    <app>
        <name>einstein_O3MD1</name>
        <max_concurrent>2</max_concurrent>
    </app>
    <app>
        <name>einsteinbinary_BRP4G</name>
        <max_concurrent>4</max_concurrent>
    </app>
</app_config>

Only FGRP5 is relevant; O3MD1 (running data gathered by LIGO) was replaced by a GPU-only project, and Boinc just turns up its nose at my GPU (RX560 with 2 GB). BRP4G is a binary pulsar project, which doesn't really interest me very much. For that matter, FGRP5 (crunching gamma ray pulsar data) doesn't really interest me either, but one must do something while waiting to afford another GPU, yes? ;)
I really should clear out all the Einstein tasks, remove those two from the config file, and do a reset, but I'm rather too lazy to do that :D



I don't know if task run times are at all relevant
Yep, they are.
In theory, a Task on one Project that runs for a month should result in the same Credit as Tasks on another Project that run for only a couple of minutes each over that month.
But if the long-running Task doesn't make use of trickles
<snip>

Not quite what I had in mind, but it makes a great deal of sense. Make use of trickles? Yeah, we'll see that the day after Bezos and Musk donate all their money to the poor. So Boinc really has no way of knowing what credit that lone Theory task will yield, so it basically has to guess what task loads it should grab. For that matter, the same can pretty much be said of the Atlas tasks too, which until recently were running on one thread, now on two. But even with 2 threads, they still take 4 times as long as one of those Einstein tasks.


I'll leave things as they are for now if you think it best, but I have to say, I am sorely tempted to blow that LHC resource share through the roof, say to something like 5000, to see what happens.
Give it until that exceptionally long-running Task is done, and hopefully you'll pick up a Task with a more usual runtime, then see how it behaves.
Otherwise, increase the present LHC Resource Share value by no more than double; otherwise LHC will spike up and the others will fall, so you'll bump theirs up (or LHC's down again, which would be better), then they'll pick up & LHC will fall, so you bump its share up again, it'll spike, the others will fall - rinse and repeat for months until eventually you get tired of it or things get close to what you want.

Which is why I have somehow managed to resist that temptation.
Grant (SSSF)
Message 109305 - Posted: 28 May 2024, 9:09:57 UTC - in response to Message 109304.  
Last modified: 28 May 2024, 9:17:43 UTC

Currently running tasks are: Rosetta and Einstein 8 each (controlled by app_config files)
In what way are they being controlled by app_config files?
Whatever settings you've got there will impact work Scheduling, as the Scheduler will have to work around those app_config settings.
LHC:
<max_concurrent>2</max_concurrent>
<max_concurrent>8</max_concurrent>
<max_concurrent>2</max_concurrent>
<project_max_concurrent>8</project_max_concurrent>
<max_concurrent>8</max_concurrent>
<max_concurrent>2</max_concurrent>
<max_concurrent>4</max_concurrent>
Ok, there's what's limiting how much work & of what type your system can do.

By limiting how many of any given application can be run at any given time, and by limiting the number of Tasks that can be run at any given time for a particular Project, you are tying the Scheduler up in knots.
Your Resource Share settings tell it one thing, but then you limit the amount of work it can do for multiple Projects, and you don't want cores sitting there unused, so it does what it can to keep cores busy by processing work for other Projects that aren't so limited in what they can do.
Without those limits, the Manager will be able to do more of everything - if a couple of Projects run out of work, or go down, it will pick up extra from the remaining Project(s). When the down ones come back, it'll pick up more work from them, and less from the others, until it catches up. As things are, many cores & threads would go unused if that were to occur.

As things are, you've told it what you want to happen with your Resource Share values, but then pretty much tied both its arms behind its back by limiting how much it is able to do (even though it's capable of doing much more), and it's just doing what it can, within those very restrictive limits.




If you're feeling game, I'd suggest making a copy of all of your app_config files and putting them somewhere safe.
Then remove the max_concurrent and project_max_concurrent entries from all of them, and exit & restart.

Given the Runtime of some of the Tasks, I would expect chaos to ensue for 12-24 hours, but after that things should settle down, much more along the lines of what you want - more LHC Tasks running, fewer Rosetta, and even fewer Einstein (eventually - Einstein has always given the BOINC Manager's work Scheduler issues).


Edit - you may need to re-apply max_concurrent for a particular application if it has Tasks that use huge amounts of RAM. But as your system shows 64 GB of RAM with 24 cores/threads, it's unlikely to be necessary.
Grant
Darwin NT
hadron
Message 109306 - Posted: 28 May 2024, 23:32:43 UTC - in response to Message 109305.  

Currently running tasks are: Rosetta and Einstein 8 each (controlled by app_config files)
In what way are they being controlled by app_config files?
Whatever settings you've got there will impact work Scheduling, as the Scheduler will have to work around those app_config settings.
LHC:
<max_concurrent>2</max_concurrent>
<max_concurrent>8</max_concurrent>
<max_concurrent>2</max_concurrent>
<project_max_concurrent>8</project_max_concurrent>
<max_concurrent>8</max_concurrent>
<max_concurrent>2</max_concurrent>
<max_concurrent>4</max_concurrent>
Ok, there's what's limiting how much work & of what type your system can do.

OK, let's take this in stages.
First off those last 2 lines represent Einstein sub-projects that I will never run again. They are still present in the app_config file only because I haven't got around to removing them -- to do so would require a complete project reset.
Next, the first and third lines are for CMS and Atlas tasks, respectively, which are set to run on 4 and 2 threads, respectively. Thus those first 3 lines actually represent the potential use of, not 12, but 20 threads. If I was actually getting stuff from LHC, there would be no problem, but for some reason I cannot grasp, and which you have not yet tried to address, Boinc seems to think that I don't want/need any tasks from LHC at all, except when one completed task is being reported.
The extreme case with those app_config files would be if LHC went down completely, and no new tasks of any kind were available. Then Einstein and Rosetta are allotted up to 16 tasks on 1 thread each -- leaving 6 unused. Do I care? Not really, so long as Boinc asks both projects to continue to deliver the work. If this were to happen, and it appeared that LHC tasks would be unavailable for some time (say, more than 3 days or so), then a quick change to the other two app_config files would be sufficient to take up the slack -- and then change them back once LHC returned.

If you're feeling game, I'd suggest making a copy of all of your app_config files and putting them somewhere safe.
Then remove the max_concurrent and project_max_concurrent entries from all of them, and exit & restart.

Umm... no.
In the Boinc user manual go to Client Configuration, and scroll down to the bottom of the section titled "Project-level configuration." This is the section which discusses the app_config file.
At the bottom, you will find this: If you remove app_config.xml, or one of its entries, you must reset the project in order to restore the proper values. However, it is never really necessary to do this just to change the max number of running tasks for any project. In this instance, and in several other parameters, the value '0' represents "unlimited". It is pretty much equivalent to not having the line in there at all.
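
A minimal sketch of that approach, reusing the CMS entry posted earlier in the thread - setting the value to 0 lifts the limit without deleting the line:

<app>
    <name>CMS</name>
    <!-- 0 means unlimited, per the client configuration docs quoted above -->
    <max_concurrent>0</max_concurrent>
</app>

Then Options, Read config files makes it take effect.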

So that is what I have done, for everything. I also reset the Resource Share values to default, and will leave those for the foreseeable future. When I get all current and near-future bills sorted out, I will be looking for a GPU capable of running the new Einstein LIGO tasks -- 4 of which look like they'll be able to run on a 16GB card. Talk about crazy -- Einstein as you probably know hands out a constant credit value for each task. For the gamma ray pulsars, it is 693 -- but for the LIGO ones, it's 10,000!! Then I can take another look at the Resource Share values, plus it will free up a few CPU threads for other projects, as I will no longer be running the pulsar stuff (I'm a gravitation theorist, not an astrophysicist -- I don't really care too much about pulsars, other than they make pretty pictures when you point the Webb telescope at them :D )

PS: Boinc seems to be working hard to overcome the massive beating my RAC on LHC took over the past few days -- 21 threads are allotted to LHC as I type.
Grant (SSSF)
Message 109307 - Posted: 29 May 2024, 5:58:17 UTC - in response to Message 109306.  

Thus those first 3 lines actually represent the potential use of, not 12, but 20 threads.
Ok, but so what?
If BOINC needs them to honour your Resource Share it can make use of them, if not, then the other projects will make use of them.
But by putting in limits on the number of Tasks of a particular type that can be run, you hamstring BOINC's ability to get work in order to meet your Resource Share settings.



but for some reason I cannot grasp, and which you have not yet tried to address, Boinc seems to think that I don't want/need any tasks from LHC at all, except when one completed task is being reported.
All I know is that you have limited its ability to process work by using max_concurrent, both for applications and on a Project (while some may no longer have work, the others do, so they will be limited in the work that can be done to meet your Resource Share settings).
With no such limitations, and all but two cores/threads available for its use, it would quickly be able to meet your Resource Share targets. Limiting the number of cores/threads means it takes longer to work out the balance between Projects.

If you want to know what the Manager is thinking when it comes to fetching work, then in the BOINC Manager go to Options, Event log options and enable cpu_sched_debug (sched_op_debug for an even deeper dive into the goings-on); work_fetch_debug is another one that might be of use.
Prepare for an avalanche of event log output.
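
For reference, the same flags can be set directly in cc_config.xml in the BOINC data directory (a minimal sketch; the Event log options dialog writes the equivalent for you):

<cc_config>
    <log_flags>
        <!-- log CPU scheduler decisions -->
        <cpu_sched_debug>1</cpu_sched_debug>
        <!-- log work fetch decisions -->
        <work_fetch_debug>1</work_fetch_debug>
        <!-- log scheduler RPC details -->
        <sched_op_debug>1</sched_op_debug>
    </log_flags>
</cc_config>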



If you're feeling game, I'd suggest making a copy of all of your app_config files and putting them somewhere safe.
Then remove the max_concurrent and project_max_concurrent entries from all of them, and exit & restart.

Umm... no.
In the Boinc user manual go to Client Configuration, and scroll down to the bottom of the section titled "Project-level configuration." This is the section which discusses the app_config file.
At the bottom, you will find this: If you remove app_config.xml, or one of its entries, you must reset the project in order to restore the proper values. However, it is never really necessary to do this just to change the max number of running tasks for any project. In this instance, and in several other parameters, the value '0' represents "unlimited". It is pretty much equivalent to not having the line in there at all.
Then set them to 0 and use Options, Read config files (or exit & restart BOINC).
Regardless of what the manual might say, personally, several times over the years, I have removed the app_config file and exited & restarted BOINC with no issues. If there is no app_config file, then there are no values the Manager can read & use, so it just runs with the application's defaults. (And then later I've put in a modified app_info and just read it in for the values to take effect. I never tried it with the value set to 0; exiting BOINC and restarting after removing the file did the job of undoing what it had done.)



PS: Boinc seems to be working hard to overcome the massive beating my RAC on LHC took over the past few days -- 21 threads are allotted to LHC as I type.
Or just let things be for a while and see how things end up.
Grant
Darwin NT
hadron
Message 109309 - Posted: 29 May 2024, 11:10:10 UTC - in response to Message 109307.  

Thus those first 3 lines actually represent the potential use of, not 12, but 20 threads.
Ok, but so what?
If BOINC needs them to honour your Resource Share it can make use of them, if not, then the other projects will make use of them.
But by putting in limits on the number of Tasks of a particular type that can be run, you hamstring BOINC's ability to get work in order to meet your Resource Share settings.

I was addressing this point you made -- I see it was in what I trimmed out:
As things are, you've told it what you want to happen with your Resource Share values, but then pretty much tied both its arms behind its back by limiting how much it is able to do (even though it's capable of doing much more), and it's just doing what it can, within those very restrictive limits.

You keep going back to the total amount of work being done here, suggesting that at times I will have periods during which there are several unused CPU threads.
I assure you, that has never been the case, and pretty much the only time it could possibly have happened is if LHC suddenly stopped delivering any tasks at all -- in which case, I could simply increase the limit on the other two projects until LHC returned to life.
Rather, the only problem I was having was Boinc not requesting new LHC tasks while it continued to grab all the Einstein stuff it could.
You keep talking about Resource Share settings, but have never asked me just what those settings were. Except for that very brief period when I set them to 400/200/100 (which I did mention), they were always the same for all three projects -- and, as I mentioned, they are all the same once more.
So I am having a hard time understanding why this is such an issue for you; regardless of any max_cpu settings, Boinc should always have been fetching work from all three projects, I would think in more or less equal proportions.
And for the very last time -- the max_concurrent settings should not ever have any influence over Boinc requesting enough work for each project to be able to meet the resource share targets. The only effect they can possibly have is in how long it takes to reach those targets -- which, as I have painfully tried to suggest, I simply do not care about.

Prepare for an avalanche of event log output.

Yeah, I already knew this -- I'll pass, thank you. It's all probably of use to no one but the folks who wrote the code.

Regardless of what the manual might say, personally, several times over the years, I have removed the app_config file and exited & restarted BOINC with no issues. If there is no app_config file, then there are no values the Manager can read & use, so it just runs with the application's defaults. (And then later I've put in a modified app_info and just read it in for the values to take effect. I never tried it with the value set to 0; exiting BOINC and restarting after removing the file did the job of undoing what it had done.)

Well, I figure it must be meaningful in some way, otherwise the warning would not be there in the first place.
Besides, it's a helluva lot easier just to replace a non-zero number with 0, save the file, then tell Boinc to read it in :D


PS: Boinc seems to be working hard to overcome the massive beating my RAC on LHC took over the past few days -- 21 threads are allotted to LHC as I type.
Or just let things be for a while and see how things end up.

You may have misunderstood what I was saying there -- apologies if I caused any confusion. Those 21 threads were all Boinc's doing, not mine.
Grant (SSSF)
Message 109317 - Posted: 30 May 2024, 6:32:40 UTC - in response to Message 109309.  

And for the very last time -- the max_concurrent settings should not ever have any influence over Boinc requesting enough work for each project to be able to meet the resource share targets. The only effect they can possibly have is in how long it takes to reach those targets -- which, as I have painfully tried to suggest, I simply do not care about.
I agree - limiting the number of Tasks that can run for a particular application won't stop your Resource Share settings from being met, it just increases the amount of time till it can be done (and that increase can be significant - with low core/thread systems and lots of active projects it can take months. But even with your limits, given the total number of cores/threads you've got available and only a few projects, I would expect a week or two at most for your system (assuming no project issues - then all bets are off)).

And until you explicitly stated you didn't care how long it took, I had no idea that was your position. From the posts you were making I was under the impression you wanted things to sort themselves out sooner rather than later, hence my suggestion to remove the limits.



PS: Boinc seems to be working hard to overcome the massive beating my RAC on LHC took over the past few days -- 21 threads are allotted to LHC as I type.
Or just let things be for a while and see how things end up.
You may have misunderstood what I was saying there -- apologies if I caused any confusion. Those 21 threads were all Boinc's doing, not mine.
Yep, I understood it as BOINC's doing; that's why I suggested leaving things as they are instead of changing them further, as I had suggested earlier in the post.
Grant
Darwin NT
hadron
Message 109320 - Posted: 30 May 2024, 14:15:00 UTC - in response to Message 109317.  

And for the very last time -- the max_concurrent settings should not ever have any influence over Boinc requesting enough work for each project to be able to meet the resource share targets. The only effect they can possibly have is in how long it takes to reach those targets -- which, as I have painfully tried to suggest, I simply do not care about.
I agree - limiting the number of Tasks that can run for a particular application won't stop your Resource Share settings from being met, it just increases the amount of time till it can be done (and that increase can be significant - with low core/thread systems and lots of active projects it can take months. But even with your limits, given the total number of cores/threads you've got available and only a few projects, I would expect a week or two at most for your system (assuming no project issues - then all bets are off)).

And until you explicitly stated you didn't care how long it took, I had no idea that was your position. From the posts you were making I was under the impression you wanted things to sort themselves out sooner rather than later, hence my suggestion to remove the limits.


My apologies. I thought I was making my case clear -- evidently not.
Things are running fairly smoothly right now, but alas, no gamma ray pulsar tasks are currently available from Einstein, so I'm basically just running in wait-and-see mode until they return. In the interim, I've re-enabled the binary pulsar tasks (allowing 2 of them concurrently) just to keep the project's presence in Boinc's calculations. I might just bite the bullet now and get a second GPU to run the gravitational wave tasks. I have my eye on this one: https://www.newegg.ca/p/pl?N=100007708%20601110192%20601292089%20600100181%2050001315 but I need to check on Einstein to find out how many tasks I can run on it.

PS: Boinc seems to be working hard to overcome the massive beating my RAC on LHC took over the past few days -- 21 threads are allotted to LHC as I type.
Or just let things be for a while and see how things end up.
You may have misunderstood what I was saying there -- apologies if I caused any confusion. Those 21 threads were all Boinc's doing, not mine.
Yep, I understood it as BOINC's doing; that's why I suggested leaving things as they are instead of changing them further, as I had suggested earlier in the post.

OK, just wanted to be sure.
Things may have settled down already. ATM, there are 11 Rosetta tasks running, along with 4 Atlas (2 threads each) and 1 Theory from LHC, and the 2 "placeholder" pulsar tasks from Einstein. If I do get that GPU now, I might just dispense with running CPU-only tasks from Einstein, as the gravitational wave tasks seem to be taking 3/4 of an hour to complete, and earn 10K credits each!
Grant (SSSF)
Message 109322 - Posted: 31 May 2024, 4:39:52 UTC - in response to Message 109320.  

If I do get that GPU now, I might just dispense with running CPU-only tasks from Einstein, as the gravitational wave tasks seem to be taking 3/4 of an hour to complete, and earn 10K credits each!
A good GPU application can produce a huge amount of work.

Back on Seti, a Task that would take about 4 hours with an optimised application on what was then a high-end CPU could be done by a high-end GPU in around 30 min, from memory.
Then someone decided to have a go at optimising the GPU application. By the time Seti ended, they were being churned out every 30 seconds on that same GPU.
From 4 hours, to 30 seconds, is a rather amazing improvement.
Grant
Darwin NT