Questions and Answers : Unix/Linux : All tasks in scheduler state uninitialized
Author | Message |
---|---|
SGAI-CSIC Joined: 4 Apr 20 Posts: 19 Credit: 15,069,615 RAC: 0 |
Hi, I have noticed that all my tasks are in scheduler state "uninitialized". Is this normal? What does it mean?
1) -----------
   name: hgfp_dimer_5x_254_fold_SAVE_ALL_OUT_906972_605_0
   WU name: hgfp_dimer_5x_254_fold_SAVE_ALL_OUT_906972_605
   project URL: https://boinc.bakerlab.org/rosetta/
   received: Fri Apr 10 17:07:41 2020
   report deadline: Mon Apr 13 17:07:41 2020
   ready to report: no
   state: downloaded
   scheduler state: uninitialized
   active_task_state: UNINITIALIZED
   app version num: 378
   resources: 1 CPU
   estimated CPU time remaining: 22168.405735
2) -----------
   name: 7v1nm_gg_c274_7mer_gb_00510_SAVE_ALL_OUT_907410_372_0
   WU name: 7v1nm_gg_c274_7mer_gb_00510_SAVE_ALL_OUT_907410_372
   project URL: https://boinc.bakerlab.org/rosetta/
   received: Fri Apr 10 17:07:41 2020
   report deadline: Mon Apr 13 17:07:41 2020
   ready to report: no
   state: downloaded
   scheduler state: uninitialized
   active_task_state: UNINITIALIZED
   app version num: 412
   resources: 1 CPU
   estimated CPU time remaining: 81737.392995
3) -----------
   name: hgfp_dimer_3x_317_fold_SAVE_ALL_OUT_906914_611_0
   WU name: hgfp_dimer_3x_317_fold_SAVE_ALL_OUT_906914_611
   project URL: https://boinc.bakerlab.org/rosetta/
   received: Fri Apr 10 17:23:20 2020
   report deadline: Mon Apr 13 17:23:20 2020
   ready to report: no
   state: downloaded
   scheduler state: uninitialized
   active_task_state: UNINITIALIZED
   app version num: 378
   resources: 1 CPU |
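With five cloned hosts, checking every task list by eye gets tedious. A minimal Python sketch that tallies tasks by their "scheduler state" field, assuming the `boinccmd --get_tasks` text format shown above (one "scheduler state: ..." entry per task); the function name `scheduler_states` is illustrative, not part of BOINC:

```python
import re
from collections import Counter

# Matches the "scheduler state:" field printed for every task. There is no
# line anchor, so it also works on output that was pasted onto a single line.
_SCHED_STATE = re.compile(r"scheduler state:\s*(\S+)")


def scheduler_states(get_tasks_output: str) -> Counter:
    """Count tasks per scheduler state in `boinccmd --get_tasks` text."""
    return Counter(_SCHED_STATE.findall(get_tasks_output))
```

On a host, the input text can come from `subprocess.run(["boinccmd", "--get_tasks"], capture_output=True, text=True).stdout`. A result made up only of "uninitialized" entries means no task on that host has been started yet; tasks that ran and were paused would show up as "preempted" instead.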
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I believe it just means that they have not yet started to run. Are there other tasks (perhaps from other BOINC projects) that are running now? Have you set the preferences to limit the hours when BOINC can run? Which host are you talking about? Rosetta Moderator: Mod.Sense |
SGAI-CSIC Joined: 4 Apr 20 Posts: 19 Credit: 15,069,615 RAC: 0 |
Over the past days the tasks were running normally, and these hosts don't run any other BOINC projects. I have 5 cloned CentOS systems, and on all of them the tasks are in the uninitialized state: ID: 4061992, ID: 4062010, ID: 4061905, ID: 4062012, ID: 4061963. I have not limited when BOINC can run. The output of boinccmd --get_messages seems normal:
294: 12-Apr-2020 12:48:47 (user notification) [Rosetta@home] This project is using an old URL. When convenient, remove the project, then add https://boinc.bakerlab.org/rosetta/
295: 12-Apr-2020 12:48:47 (low) [Rosetta@home] General prefs: from Rosetta@home (last modified 12-Apr-2020 11:51:30)
296: 12-Apr-2020 12:48:47 (low) [Rosetta@home] Host location: none
297: 12-Apr-2020 12:48:47 (low) [Rosetta@home] General prefs: using your defaults
298: 12-Apr-2020 12:48:47 (low) [] Preferences:
299: 12-Apr-2020 12:48:47 (low) [] max memory usage when active: 1894.50 MB
300: 12-Apr-2020 12:48:47 (low) [] max memory usage when idle: 3410.10 MB
301: 12-Apr-2020 12:48:47 (low) [] max disk usage: 7.56 GB
302: 12-Apr-2020 12:48:47 (low) [] don't use GPU while active
303: 12-Apr-2020 12:48:47 (low) [] suspend work if non-BOINC CPU load exceeds 75%
304: 12-Apr-2020 12:48:47 (low) [] (to change preferences, visit a project web site or select Preferences in the Manager)
305: 12-Apr-2020 12:48:49 (low) [Rosetta@home] Started download of hgfp_het2_215_data.zip
306: 12-Apr-2020 12:48:49 (low) [Rosetta@home] Started download of hgfp_good_frag_184_data.zip
307: 12-Apr-2020 12:49:00 (low) [Rosetta@home] Finished download of hgfp_het2_215_data.zip
308: 12-Apr-2020 12:49:03 (low) [Rosetta@home] Finished download of hgfp_good_frag_184_data.zip
In the Rosetta@home preferences I have set Target CPU run time to 1 day. Is that OK? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Your runtime preference will not affect when tasks run. The BOINC client decides that, and it is unaware of the runtime preference. It looks like your systems have 4 cores and 4 GB of memory. What happens if you set the CPU preference to use at most 50% of the CPUs? Rosetta Moderator: Mod.Sense |
SGAI-CSIC Joined: 4 Apr 20 Posts: 19 Credit: 15,069,615 RAC: 0 |
I have set "Use at most 50 % of the CPUs" and reset the project on one host:
boinccmd --project "https://boinc.bakerlab.org/rosetta/" reset
but unfortunately the tasks are still uninitialized:
boinccmd --get_tasks
======== Tasks ========
1) -----------
   name: hgfp_good_frag_52_fold_SAVE_ALL_OUT_909178_181_0
   WU name: hgfp_good_frag_52_fold_SAVE_ALL_OUT_909178_181
   project URL: https://boinc.bakerlab.org/rosetta/
   received: Mon Apr 13 11:28:38 2020
   report deadline: Thu Apr 16 11:28:37 2020
   ready to report: no
   state: downloaded
   scheduler state: uninitialized
   active_task_state: UNINITIALIZED
   app version num: 378
   resources: 1 CPU
   estimated CPU time remaining: 27485.364576
2) -----------
[...]
The logs don't show anything strange:
# boinccmd --get_messages
13-Apr-2020 11:28:25 (low) [Rosetta@home] Resetting project
31: 13-Apr-2020 11:28:30 (low) [Rosetta@home] Master file download succeeded
32: 13-Apr-2020 11:28:35 (low) [Rosetta@home] Sending scheduler request: To fetch work.
33: 13-Apr-2020 11:28:35 (low) [Rosetta@home] Requesting new tasks for CPU
34: 13-Apr-2020 11:28:38 (low) [Rosetta@home] Scheduler request completed: got 4 new tasks
35: 13-Apr-2020 11:28:38 (user notification) [Rosetta@home] This project is using an old URL. When convenient, remove the project, then add https://boinc.bakerlab.org/rosetta/
36: 13-Apr-2020 11:28:38 (low) [Rosetta@home] General prefs: from Rosetta@home (last modified 13-Apr-2020 11:13:01)
37: 13-Apr-2020 11:28:38 (low) [Rosetta@home] Host location: none
38: 13-Apr-2020 11:28:38 (low) [Rosetta@home] General prefs: using your defaults
39: 13-Apr-2020 11:28:38 (low) [] Preferences:
40: 13-Apr-2020 11:28:38 (low) [] max memory usage when active: 1894.50 MB
41: 13-Apr-2020 11:28:38 (low) [] max memory usage when idle: 3410.10 MB
42: 13-Apr-2020 11:28:38 (low) [] max disk usage: 7.56 GB
43: 13-Apr-2020 11:28:38 (low) [] Number of usable CPUs has changed from 4 to 2.
44: 13-Apr-2020 11:28:38 (low) [] max CPUs used: 2
45: 13-Apr-2020 11:28:38 (low) [] don't use GPU while active
46: 13-Apr-2020 11:28:38 (low) [] suspend work if non-BOINC CPU load exceeds 75%
47: 13-Apr-2020 11:28:38 (low) [] (to change preferences, visit a project web site or select Preferences in the Manager)
48: 13-Apr-2020 11:28:40 (low) [Rosetta@home] Started download of rosetta_4.15_x86_64-pc-linux-gnu
49: 13-Apr-2020 11:28:40 (low) [Rosetta@home] Started download of rosetta_graphics_4.15_x86_64-pc-linux-gnu
50: 13-Apr-2020 11:30:44 (low) [Rosetta@home] Finished download of rosetta_graphics_4.15_x86_64-pc-linux-gnu
51: 13-Apr-2020 11:30:44 (low) [Rosetta@home] Started download of database_357d5d93529_n_methyl.zip
52: 13-Apr-2020 11:31:01 (low) [Rosetta@home] Finished download of rosetta_4.15_x86_64-pc-linux-gnu |
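The log line "Number of usable CPUs has changed from 4 to 2" follows from the 50 % setting. A small sketch of that arithmetic, assuming the client applies the percentage to the detected CPU count, truncates the result, and never drops below one CPU (the helper `usable_cpus` is hypothetical, not a BOINC API):

```python
def usable_cpus(ncpus: int, max_ncpus_pct: float) -> int:
    """Estimate the CPU count left after 'Use at most N % of the CPUs'.

    Assumption: the percentage is applied to the detected CPU count, the
    result is truncated toward zero, and at least one CPU is always kept.
    """
    return max(1, int(ncpus * max_ncpus_pct / 100))


# 4 detected CPUs at 50 % -> 2, matching the log line above.
print(usable_cpus(4, 50))
```

With two usable CPUs, at most two of the four new single-CPU tasks could run at once, so two of them staying "uninitialized" would be expected; all four staying that way points to something other than the CPU limit.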
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The log says that for R@h you are "...using an old URL". Yet the project URL has not changed. What URL did you attach to? Rosetta Moderator: Mod.Sense |
SGAI-CSIC Joined: 4 Apr 20 Posts: 19 Credit: 15,069,615 RAC: 0 |
I'm using this URL in order to attach to Rosetta: boinccmd --project_attach "https://boinc.bakerlab.org/rosetta/" What's the correct URL? |
SGAI-CSIC Joined: 4 Apr 20 Posts: 19 Credit: 15,069,615 RAC: 0 |
OK, the URL is with http, not https. I detached:
boinccmd --project https://boinc.bakerlab.org/rosetta/ detach
and attached again with http ...
# boinccmd --project_attach "https://boinc.bakerlab.org/rosetta/" "..."
I tried to resume:
# boinccmd --project https://boinc.bakerlab.org/rosetta/ resume
But again all tasks are uninitialized:
1) -----------
   name: 5e2680b18d3ff769ed2d8d58de5013ef_start_1900_20_04_19_27_51_globalDocking_2_SAVE_ALL_OUT_913185_24_0
   WU name: 5e2680b18d3ff769ed2d8d58de5013ef_start_1900_20_04_19_27_51_globalDocking_2_SAVE_ALL_OUT_913185_24
   project URL: https://boinc.bakerlab.org/rosetta/
   received: Mon Apr 13 15:56:02 2020
   report deadline: Thu Apr 16 15:56:01 2020
   ready to report: no
   state: downloading
   scheduler state: uninitialized
   active_task_state: UNINITIALIZED
   app version num: 415
   resources: 1 CPU
   estimated CPU time remaining: 20759.735125
2) ----------- |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
On the website, in your profile, there is a link for Rosetta@home preferences. In there, you can define up to 4 venues. If you click on your hosts, you can see the "venue" shown in the "location" column. When you look at the Rosetta preferences for the venue of the host, is the box checked for "Use CPU"? (It should show a checkmark.) In fact, for R@h, this box should be checked for all venues you have defined. I see your hosts all have credit. When did they stop working? Do you have a cc_config or app_config file set up? If so, please show what is in them. Rosetta Moderator: Mod.Sense |
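On a headless host, one way to answer the cc_config/app_config question is to look in the BOINC data directory. A sketch, assuming the data directory is /var/lib/boinc (as the client log later in this thread reports) and that any app_config.xml sits in a project's directory under projects/; `find_boinc_configs` is an illustrative name:

```python
from pathlib import Path


def find_boinc_configs(data_dir: str = "/var/lib/boinc") -> list[Path]:
    """List cc_config.xml and any per-project app_config.xml under a BOINC data directory."""
    root = Path(data_dir)
    found = []
    cc = root / "cc_config.xml"
    if cc.is_file():
        found.append(cc)
    # app_config.xml is per project, so it lives in that project's own directory.
    found.extend(sorted(p for p in root.glob("projects/*/app_config.xml") if p.is_file()))
    return found
```

An empty list means neither file exists; in that case the client log also reports "cc_config.xml not found - using defaults" at startup.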
SGAI-CSIC Joined: 4 Apr 20 Posts: 19 Credit: 15,069,615 RAC: 0 |
|
SGAI-CSIC Joined: 4 Apr 20 Posts: 19 Credit: 15,069,615 RAC: 0 |
I have noticed a change: on one of the hosts I see running tasks:
======== Tasks ========
1) -----------
   name: Mini_Protein_binds_IL1R_COVID-19_design5_SAVE_ALL_OUT_IGNORE_THE_REST_4qo3dr9i_909067_3_0
   WU name: Mini_Protein_binds_IL1R_COVID-19_design5_SAVE_ALL_OUT_IGNORE_THE_REST_4qo3dr9i_909067_3
   project URL: https://boinc.bakerlab.org/rosetta/
   received: Mon Apr 13 15:56:13 2020
   report deadline: Thu Apr 16 15:56:12 2020
   ready to report: no
   state: downloaded
   scheduler state: scheduled
   active_task_state: EXECUTING
   app version num: 415
   resources: 1 CPU
   estimated CPU time remaining: 33887.100296
   CPU time at last checkpoint: 52676.290000
   current CPU time: 52753.410000
   fraction done: 0.609875
   swap size: 898 MB
   working set size: 723 MB
2) -----------
   name: Mini_Protein_binds_IL1R_COVID-19_design4_SAVE_ALL_OUT_IGNORE_THE_REST_3gp4jq7y_908587_3_0
   WU name: Mini_Protein_binds_IL1R_COVID-19_design4_SAVE_ALL_OUT_IGNORE_THE_REST_3gp4jq7y_908587_3
   project URL: https://boinc.bakerlab.org/rosetta/
   received: Mon Apr 13 15:56:13 2020
   report deadline: Thu Apr 16 15:56:12 2020
   ready to report: no
   state: downloaded
   scheduler state: scheduled
   active_task_state: EXECUTING
   app version num: 415
   resources: 1 CPU
   estimated CPU time remaining: 46711.422053
   CPU time at last checkpoint: 39847.050000
   current CPU time: 39937.950000
   fraction done: 0.462027
   swap size: 845 MB |
SGAI-CSIC Joined: 4 Apr 20 Posts: 19 Credit: 15,069,615 RAC: 0 |
But only on one host; on the others the tasks remain in the uninitialized state :-( |
Bryn Mawr Joined: 26 Dec 18 Posts: 393 Credit: 12,110,248 RAC: 6,015 |
Quoting SGAI-CSIC: "But only in one hosts, in the others continue in uninitialized state :-("
So, is that host, 4062010, either the one you changed the CPU limit to 50% on or the one you changed the host URL on? |
SGAI-CSIC Joined: 4 Apr 20 Posts: 19 Credit: 15,069,615 RAC: 0 |
It's the same host ... Now I have changed the URL on all hosts ... but the situation is the same. I don't understand anything ... I forced an update on one host:
52: 14-Apr-2020 18:14:38 (low) [Rosetta@home] update requested by user
53: 14-Apr-2020 18:14:41 (low) [Rosetta@home] Sending scheduler request: Requested by user.
54: 14-Apr-2020 18:14:41 (low) [Rosetta@home] Requesting new tasks for CPU
55: 14-Apr-2020 18:14:42 (low) [Rosetta@home] Scheduler request completed: got 4 new tasks
56: 14-Apr-2020 18:14:44 (low) [Rosetta@home] Started download of local_docking_20_04_15_28_09.xml
57: 14-Apr-2020 18:14:44 (low) [Rosetta@home] Started download of chainA_chainB_20_04_15_28_09.pdb
58: 14-Apr-2020 18:14:47 (low) [Rosetta@home] Finished download of local_docking_20_04_15_28_09.xml
59: 14-Apr-2020 18:14:47 (low) [Rosetta@home] Finished download of chainA_chainB_20_04_15_28_09.pdb
60: 14-Apr-2020 18:14:47 (low) [Rosetta@home] Started download of Mini_Protein_binds_IL1R_COVID-19_design7_SAVE_ALL_OUT_IGNORE_THE_REST_0ys6zi4m.zip
61: 14-Apr-2020 18:14:47 (low) [Rosetta@home] Started download of Mini_Protein_binds_IL1R_COVID-19_design7_SAVE_ALL_OUT_IGNORE_THE_REST_0ys6zi4m.flags
62: 14-Apr-2020 18:14:49 (low) [Rosetta@home] Finished download of Mini_Protein_binds_IL1R_COVID-19_design7_SAVE_ALL_OUT_IGNORE_THE_REST_0ys6zi4m.flags
63: 14-Apr-2020 18:14:49 (low) [Rosetta@home] Started download of Mini_Protein_binds_IL1R_COVID-19_design7_SAVE_ALL_OUT_IGNORE_THE_REST_0uc4pt5i.zip
64: 14-Apr-2020 18:14:53 (low) [Rosetta@home] Finished download of Mini_Protein_binds_IL1R_COVID-19_design7_SAVE_ALL_OUT_IGNORE_THE_REST_0ys6zi4m.zip
65: 14-Apr-2020 18:14:53 (low) [Rosetta@home] Finished download of Mini_Protein_binds_IL1R_COVID-19_design7_SAVE_ALL_OUT_IGNORE_THE_REST_0uc4pt5i.zip
66: 14-Apr-2020 18:14:53 (low) [Rosetta@home] Started download of Mini_Protein_binds_IL1R_COVID-19_design7_SAVE_ALL_OUT_IGNORE_THE_REST_0uc4pt5i.flags
67: 14-Apr-2020 18:14:53 (low) [Rosetta@home] Started download of Mini_Protein_binds_IL1R_COVID-19_design8_SAVE_ALL_OUT_IGNORE_THE_REST_4zr9fy2g.zip
68: 14-Apr-2020 18:14:55 (low) [Rosetta@home] Finished download of Mini_Protein_binds_IL1R_COVID-19_design7_SAVE_ALL_OUT_IGNORE_THE_REST_0uc4pt5i.flags
69: 14-Apr-2020 18:14:55 (low) [Rosetta@home] Started download of Mini_Protein_binds_IL1R_COVID-19_design8_SAVE_ALL_OUT_IGNORE_THE_REST_4zr9fy2g.flags
70: 14-Apr-2020 18:14:56 (low) [Rosetta@home] Finished download of Mini_Protein_binds_IL1R_COVID-19_design8_SAVE_ALL_OUT_IGNORE_THE_REST_4zr9fy2g.flags
71: 14-Apr-2020 18:14:59 (low) [Rosetta@home] Finished download of Mini_Protein_binds_IL1R_COVID-19_design8_SAVE_ALL_OUT_IGNORE_THE_REST_4zr9fy2g.zip
And all the tasks are uninitialized again:
======== Tasks ========
1) -----------
   name: Mini_Protein_binds_IL1R_COVID-19_design7_SAVE_ALL_OUT_IGNORE_THE_REST_9nn9gy6g_909112_4_0
   WU name: Mini_Protein_binds_IL1R_COVID-19_design7_SAVE_ALL_OUT_IGNORE_THE_REST_9nn9gy6g_909112_4
   project URL: https://boinc.bakerlab.org/rosetta/
   received: Tue Apr 14 11:50:36 2020
   report deadline: Fri Apr 17 11:50:36 2020
   ready to report: no
   state: downloaded
   scheduler state: uninitialized
   active_task_state: UNINITIALIZED
   app version num: 415
   resources: 1 CPU
   estimated CPU time remaining: 15686.105082
2) -----------
   name: Mini_Protein_binds_IL1R_COVID-19_design4_SAVE_ALL_OUT_IGNORE_THE_REST_8rz6dh5f_908717_4_0
   WU name: Mini_Protein_binds_IL1R_COVID-19_design4_SAVE_ALL_OUT_IGNORE_THE_REST_8rz6dh5f_908717_4
   project URL: https://boinc.bakerlab.org/rosetta/
   received: Tue Apr 14 11:50:47 2020
   report deadline: Fri Apr 17 11:50:47 2020
   ready to report: no
   state: downloaded
   scheduler state: uninitialized
   active_task_state: UNINITIALIZED
   app version num: 415
   resources: 1 CPU
   estimated CPU time remaining: 15686.105082
3) ----------- |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
In your preferences (again, by the venue of the machine), have you set up specific time periods during the day when BOINC is allowed to use the CPU? Rosetta Moderator: Mod.Sense |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1681 Credit: 17,854,150 RAC: 22,647 |
Quoting Mod.Sense: "In your preferences, (again, by venue of machine) have you setup specific time periods during the day when BOINC is allowed to use the CPU?"
A quick search shows you seem to be on the right track. SGAI-CSIC, make sure "Use CPU" is selected. Also make sure no settings that block BOINC from processing work are selected. For example, "Use at most xx % of the CPUs" and "Use at most x % of CPU time" should both be 100%, and "Suspend when computer is on battery", "Suspend when computer is in use", "Suspend when non-BOINC CPU usage is above ---%" and "Compute only between ---" should all be unselected/blank. Edit: oh, and any local settings will override the web-based ones. Grant Darwin NT |
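Local settings are kept on the host in global_prefs_override.xml in the data directory (the file may not exist if no local preferences were ever set). A sketch that reads that file and lists the settings from the checklist above that could stop or throttle work. The tag names (max_ncpus_pct, cpu_usage_limit, run_on_batteries, run_if_user_active, suspend_cpu_usage, start_hour/end_hour) are assumed to be the ones BOINC uses for these options, and the defaults for missing tags are assumptions too:

```python
import xml.etree.ElementTree as ET


def blocking_prefs(xml_text: str) -> list[str]:
    """List settings in a global_prefs_override.xml that can stop or throttle computing.

    Assumptions: a missing tag means "no restriction", and a percentage of 0
    is treated the same as 100.
    """
    root = ET.fromstring(xml_text)

    def value(tag: str, default: float) -> float:
        el = root.find(tag)
        if el is not None and el.text and el.text.strip():
            return float(el.text)
        return default

    warnings = []
    if 0 < value("max_ncpus_pct", 100) < 100:
        warnings.append("'Use at most % of the CPUs' is below 100")
    if 0 < value("cpu_usage_limit", 100) < 100:
        warnings.append("'Use at most % of CPU time' is below 100")
    if value("run_on_batteries", 1) == 0:
        warnings.append("'Suspend when computer is on battery' is selected")
    if value("run_if_user_active", 1) == 0:
        warnings.append("'Suspend when computer is in use' is selected")
    if value("suspend_cpu_usage", 0) > 0:
        warnings.append("'Suspend when non-BOINC CPU usage is above' is set")
    if value("start_hour", 0) != value("end_hour", 0):
        warnings.append("'Compute only between' hours are set")
    return warnings
```

A possible call is `blocking_prefs(Path("/var/lib/boinc/global_prefs_override.xml").read_text())`. The function only looks at top-level tags, so it is not meant for the project-supplied global_prefs.xml, which can nest per-venue settings.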
SGAI-CSIC Joined: 4 Apr 20 Posts: 19 Credit: 15,069,615 RAC: 0 |
|
Grant (SSSF) Joined: 28 Mar 20 Posts: 1681 Credit: 17,854,150 RAC: 22,647 |
From what you've posted there I can't see any reason for tasks not running. The default target CPU run time is 8 hours, but I can't see why 24 hours would stop things from running. I don't know how you would do it for a headless system, but you can select which functions get recorded to the Event log. I'd exit BOINC, give it 10 seconds or so and restart it. Then post the contents of the Event log here so we can see what messages are there, and someone with more experience might have a suggestion as to which options would be best to enable to help sort this out. One more thought: if you click on Details for the problem system and then compare it to a working system, down the bottom there is some info on when the system can process work (only you can see this for your own systems; others can't). The ones of interest:
Fraction of time BOINC is running 99.91%
While BOINC is running, fraction of time computing is allowed 100.00%
Grant Darwin NT |
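For a headless system, the Event log options can be set with log_flags in cc_config.xml in the data directory. A sketch that writes one; choosing cpu_sched, cpu_sched_debug and mem_usage_debug is an assumption about which flags would best explain why tasks are not started, /var/lib/boinc is the data directory reported in the client log in this thread, and the function overwrites any existing cc_config.xml:

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Flags assumed useful here: cpu_sched logs job starts/stops, cpu_sched_debug
# explains scheduling decisions, mem_usage_debug reports memory use of jobs.
DEBUG_FLAGS = ("cpu_sched", "cpu_sched_debug", "mem_usage_debug")


def build_cc_config(flags=DEBUG_FLAGS) -> str:
    """Return a minimal cc_config.xml document that enables the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for flag in flags:
        ET.SubElement(log_flags, flag).text = "1"
    return ET.tostring(root, encoding="unicode")


def write_cc_config(data_dir: str = "/var/lib/boinc", flags=DEBUG_FLAGS) -> Path:
    """Write cc_config.xml into the BOINC data directory, overwriting any existing file."""
    path = Path(data_dir) / "cc_config.xml"
    path.write_text(build_cc_config(flags) + "\n")
    return path
```

Writing to /var/lib/boinc typically needs root or the boinc user. Afterwards, `boinccmd --read_cc_config` asks the running client to reload the file, and the extra scheduler and memory messages should then appear in `boinccmd --get_messages`.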
SGAI-CSIC Joined: 4 Apr 20 Posts: 19 Credit: 15,069,615 RAC: 0 |
Thank you for your help. I stopped the boinc daemon and started it again, and I see these messages in the log. Perhaps it's a bug in boinc-client on CentOS?
# systemctl stop boinc-client
# systemctl start boinc-client
# boinccmd --get_messages
1: 15-Apr-2020 14:33:14 (low) [] cc_config.xml not found - using defaults
2: 15-Apr-2020 14:33:14 (low) [] Starting BOINC client version 7.16.1 for x86_64-pc-linux-gnu
3: 15-Apr-2020 14:33:14 (low) [] log flags: file_xfer, sched_ops, task
4: 15-Apr-2020 14:33:14 (low) [] Libraries: libcurl/7.29.0 NSS/3.44 zlib/1.2.7 libidn/1.28 libssh2/1.8.0
5: 15-Apr-2020 14:33:14 (low) [] Data directory: /var/lib/boinc
6: 15-Apr-2020 14:33:14 (low) [] No usable GPUs found
7: 15-Apr-2020 14:33:14 (low) [] [libc detection] gathered: 2.17, GNU libc
8: 15-Apr-2020 14:33:14 (low) [] Host name: rosetta3
9: 15-Apr-2020 14:33:14 (low) [] Processor: 4 GenuineIntel QEMU Virtual CPU version 2.5+ [Family 6 Model 13 Stepping 3]
10: 15-Apr-2020 14:33:14 (low) [] Processor features: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl xtopology eagerfpu pni cx16 x2apic hypervisor lahf_lm
11: 15-Apr-2020 14:33:14 (low) [] OS: Linux CentOS Linux: CentOS Linux 7 (Core) [3.10.0-1062.18.1.el7.x86_64|libc 2.17 (GNU libc)]
12: 15-Apr-2020 14:33:14 (low) [] Memory: 3.70 GB physical, 1.20 GB virtual
13: 15-Apr-2020 14:33:14 (low) [] Disk: 10.22 GB total, 7.85 GB free
14: 15-Apr-2020 14:33:14 (low) [] Local time is UTC +2 hours
15: 15-Apr-2020 14:33:14 (low) [Rosetta@home] General prefs: from Rosetta@home (last modified 15-Apr-2020 09:58:03)
16: 15-Apr-2020 14:33:14 (low) [Rosetta@home] Computer location: work
17: 15-Apr-2020 14:33:14 (low) [] General prefs: using separate prefs for work
18: 15-Apr-2020 14:33:14 (low) [] Preferences:
19: 15-Apr-2020 14:33:14 (low) [] max memory usage when active: 1894.49 MB
20: 15-Apr-2020 14:33:14 (low) [] max memory usage when idle: 3410.09 MB
21: 15-Apr-2020 14:33:14 (low) [] max disk usage: 7.56 GB
22: 15-Apr-2020 14:33:14 (low) [] (to change preferences, visit a project web site or select Preferences in the Manager)
23: 15-Apr-2020 14:33:14 (low) [] Setting up project and slot directories
24: 15-Apr-2020 14:33:14 (low) [] Checking active tasks
25: 15-Apr-2020 14:33:14 (low) [Rosetta@home] URL https://boinc.bakerlab.org/rosetta/; Computer ID 4062012; resource share 100
26: 15-Apr-2020 14:33:14 (low) [] Setting up GUI RPC socket
27: 15-Apr-2020 14:33:14 (low) [] Checking presence of 23 project files |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
It looks like you have 4-core systems with 4 GB of memory. Try setting one system to use at most 25% of the CPUs, and set another to use at most 50% of the CPUs. Some WUs reserve more than a GB of memory to ensure they run well (once you get them to start running). I would suggest adding one or more other BOINC projects to these systems, where the WUs require less memory. The BOINC client will then find a mix of WUs that can run with the resources available. Rosetta Moderator: Mod.Sense |
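Rough arithmetic behind that suggestion, assuming the client will not start a task whose memory need exceeds the remaining budget and that all tasks need the same amount. The 1894.50 MB budget is the "max memory usage when active" value from the hosts' logs, 723 MB is the working set reported for a task that did run, and the 2000 MB figure is hypothetical, standing in for a task that reserves more than the whole budget; `tasks_that_fit` is an illustrative helper, not BOINC's actual scheduler:

```python
def tasks_that_fit(budget_mb: float, per_task_mb: float, usable_cpus: int) -> int:
    """Rough upper bound on how many equal-sized tasks can run at once.

    Simplification: only the memory budget and the CPU count are considered.
    """
    if per_task_mb <= 0:
        raise ValueError("per_task_mb must be positive")
    return min(usable_cpus, int(budget_mb // per_task_mb))


budget = 1894.50  # "max memory usage when active" from the event log
print(tasks_that_fit(budget, 723, 4))   # 723 MB working set -> 2
print(tasks_that_fit(budget, 2000, 4))  # hypothetical 2000 MB reservation -> 0
```

In the second case no task fits at all, which would leave every task "uninitialized" no matter how many CPUs are free.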
©2024 University of Washington
https://www.bakerlab.org