Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 393 Credit: 12,110,248 RAC: 6,015 |
Quote: "I tried again after doing a reset and got computation errors for each task after under a minute. Is there anything else I can do to debug?" You are running a very old version of the Boinc Manager. Try updating to a more current version (7.16.n or 7.20.n). |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1681 Credit: 17,854,150 RAC: 22,647 |
I can't see that having any effect. The BOINC Manager does just that: it manages the science applications. It's the science applications that do the work, and they are what is crashing out. Quote: "I tried again after doing a reset and got computation errors for each task after under a minute. Is there anything else I can do to debug?" If resetting the project and excluding the data folders from the AV programme haven't sorted it, I would try the long shot of running a memory test, just to make sure it's not some sort of memory issue (although, as I said before, Rosetta 4.20 uses much more RAM). How to do a memory test in Win10. Edit: maybe run some hardware monitoring software and check the temperature of your CPU? Rosetta Beta may be making use of instructions that your other project doesn't, so the other project doesn't push it over the edge where Rosetta Beta does (although I'm grasping at straws here). Grant Darwin NT |
Raj Send message Joined: 5 Dec 05 Posts: 7 Credit: 514,862 RAC: 163 |
Quote: "You are running a very old version of the Boinc Manager. Try updating to a more current version (7.16.n or 7.20.n)." Sorry, that was a typo, I'm running 7.24.1 |
Raj Send message Joined: 5 Dec 05 Posts: 7 Credit: 514,862 RAC: 163 |
I ran the memory test and it reported no errors. |
kotenok2000 Send message Joined: 22 Feb 11 Posts: 259 Credit: 497,274 RAC: 1,201 |
What about a CPU test? https://www.mersenne.org/download/#:~:text=CPU%20Stress%20/%20Torture%20Testing |
Raj Send message Joined: 5 Dec 05 Posts: 7 Credit: 514,862 RAC: 163 |
I ran that (in stress mode) for about an hour and received no errors or warnings at the end of it. |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 393 Credit: 12,110,248 RAC: 6,015 |
Just downloaded 4 beta 6.05 tasks one of which immediately (0.02 seconds CPU) failed with :- <core_client_version>7.24.1</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255)</message> <stderr_txt> command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-linux-gnu @7a_hal_c_hal_7aa_12899_d40_0001.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 Using database: database_0f7f01a1b07/database ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT. ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442 BOINC:: Error reading and gzipping output datafile: default.out 21:15:06 (176255): called boinc_finish(1) </stderr_txt> ]]> Boinc 7.24.1 and Ubuntu 22.04.4 |
Raj Send message Joined: 5 Dec 05 Posts: 7 Credit: 514,862 RAC: 163 |
Just an update - I've gone back in to look at my tasks on the website, and since yesterday I've had several successful completions, although also a lot of failures that show status "Error while computing". I'm not sure if this is a public URL, but this is what I'm checking: https://boinc.bakerlab.org/rosetta/results.php?hostid=3481412 |
kotenok2000 Send message Joined: 22 Feb 11 Posts: 259 Credit: 497,274 RAC: 1,201 |
Someone serverside made incorrect workunits in which residue 1 does not have a LOWER_CONNECT. |
BlackPoison357 Send message Joined: 5 Mar 24 Posts: 1 Credit: 1,674,561 RAC: 0 |
That must have been why I've had 31 errors within the last 2 days. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Just got 21 of those. It is this series: 7a_hal_c_hal_7aa_................. Quote: "Just downloaded 4 beta 6.05 tasks one of which immediately (0.02 seconds CPU) failed with :-" |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 42 |
A flock of work units arrived recently that are behaving oddly, well, all but one of them. They came with a run time of 8 hours. Of those which are running, one has elapsed 6 hours and has 6 hours odd remaining; this figure wobbles about for a while, then drops some, then starts wobbling again, and repeats. The others have remaining times of 2 to 6 days and increasing. The percentage complete figure is increasing, but very slowly. Edit: The task is Beta 6.04 running on Windows 10 x64. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1681 Credit: 17,854,150 RAC: 22,647 |
Quote: "A flock of work units arrived recently that are behaving oddly, well, all but one of them. They came with a run time of 8 hours. Of those which are running, one has elapsed 6 hours and has 6 hours odd remaining; this figure wobbles about for a while, then drops some, then starts wobbling again, and repeats. The others have remaining times of 2 to 6 days and increasing. The percentage complete figure is increasing, but very slowly." And the answer is the same as always, but since you have ignored the answer for over 4 years now, there's no point repeating it. Quote: "Run time 1 days 20 hours 40 min 36 sec CPU time 11 hours 59 min 57 sec" Almost 2 days to do 12 hours of work, all because of settings you have made and chosen not to fix. Grant Darwin NT |
kotenok2000 Send message Joined: 22 Feb 11 Posts: 259 Credit: 497,274 RAC: 1,201 |
What else do you have running? |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 42 |
The only thing running that has any real impact is Folding@Home. It is set to its minimum activity level, but still grabs quite a lot of resources. The only other things I can see are Firefox and the VPN service I use, which has a small window that normally sits open on the desktop. I've no idea what Grant is talking about (above), but it gives me something to look for. I don't have any weird settings, by the way. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
kotenok2000 Send message Joined: 22 Feb 11 Posts: 259 Credit: 497,274 RAC: 1,201 |
Reduce Folding@home to 4 cores (if running on CPU) and Rosetta to 4 cores. |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,084,721 RAC: 1,942 |
Quote: "Please report any issues with work units in this thread." All recent WUs on at least 3 different hosts got cut short with the "error while computing"... :( |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,807 |
Quote: "Please report any issues with work units in this thread." "Error while computing" means that an error was detected, but gives no information about WHAT error. There's generally at least one more line saying something about what error. |
Notarick Send message Joined: 19 Nov 06 Posts: 1 Credit: 2,823,895 RAC: 3,297 |
I'm not sure what is going on, but about 1/4 of the work units are reporting "Error while computing" and I am no longer getting new tasks (which I assume is due to the errors). I've been running BOINC for years without issue, but I'm still running some CPU stress tests/diagnostics to see if it is a hardware thing. Anyway: my CPU is a Ryzen 7 3700X on a Gigabyte X570 UD mobo with 32 GB of RAM and a Radeon RX 570 video card. I'm running Windows 10. The only thing that has changed recently is a UEFI update. Here is the diagnostic information from one of them (they all have the same error code and similar Stderr output): Name: 7a_hal_d_hal_7aa_13011_d120_0001_SAVE_ALL_OUT_2977649_108_1; Workunit: 1383425641; Created: 23 Mar 2024, 8:12:33 UTC; Sent: 23 Mar 2024, 8:12:35 UTC; Report deadline: 26 Mar 2024, 8:12:35 UTC; Received: 23 Mar 2024, 8:14:02 UTC; Server state: Over; Outcome: Computation error; Client state: Compute error; Exit status: 1 (0x00000001) Unknown error code; Computer ID: 6222889; Run time: 11 sec; CPU time: 2 sec; Validate state: Invalid; Credit: 0.00; Device peak FLOPS: 5.15 GFLOPS; Application version: Rosetta Beta v6.04 windows_x86_64; Peak working set size: 117.73 MB; Peak swap size: 88.94 MB; Peak disk usage: 0.01 MB. Stderr output: <core_client_version>7.24.1</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.04_windows_x86_64.exe @7a_hal_d_hal_7aa_13011_d120_0001.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 Using database: database_0f7f01a1b07database ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT. ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442 BOINC:: Error reading and gzipping output datafile: default.out 02:12:51 (5004): called boinc_finish(1) </stderr_txt> ]]> |
MJH333 Send message Joined: 29 Jan 21 Posts: 18 Credit: 6,290,467 RAC: 13,756 |
Quote: "I'm not sure what is going on but about 1/4 of the work units are reporting 'Error while computing' and I am no longer getting new tasks (which I assume is due to the errors)." I doubt that there is anything wrong with your system. The latest big batch of work units has run out. You can see this from the Server Status section on the Rosetta home page: https://boinc.bakerlab.org/rosetta/, which currently shows "Total queued jobs" of 0. It can also be seen from the Project status page: https://boinc.bakerlab.org/rosetta/server_status.php, which currently shows "Tasks ready to send" of 0. You may, of course, pick up the odd resend, or some Robetta tasks, but there will be no steady flow of tasks until another big batch of work units is released. We may get another batch if/when the work units you identified with the "residue 1 does not have a LOWER_CONNECT" error are corrected and reissued. When trying to get an idea of how much work is available, I tend to look at the "Total queued jobs" figure on the Rosetta home page because that shows all the work units available to be crunched, which may be in the millions, whereas the "Tasks ready to send" figure on the Project status page shows just the tasks ready to be distributed, which is usually no more than 5,000. |
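
As noted in the thread, "Error while computing" only says that a task failed; the reason is in the `ERROR:` lines of the task's Stderr output. The following is a minimal Python sketch of pulling those lines out of a saved Stderr report so that failures can be compared across tasks. It is not part of BOINC or Rosetta: the name `extract_errors` is hypothetical, and the sample text is the log posted above with line breaks restored on the assumption that each message originally sat on its own line.

```python
import re

# Matches lines that start with "ERROR:" or "ERROR::" (the two prefixes seen
# in the logs posted in this thread) and captures the message after them.
ERROR_LINE = re.compile(r"^\s*ERROR:+\s*(.+?)\s*$", re.MULTILINE)

# Captures the text between the <stderr_txt> tags of a task report.
STDERR_BODY = re.compile(r"<stderr_txt>(.*?)</stderr_txt>", re.DOTALL)


def extract_errors(report: str) -> list[str]:
    """Return the ERROR messages found in a task's Stderr report.

    If the report has no <stderr_txt> section, the whole text is searched.
    """
    match = STDERR_BODY.search(report)
    body = match.group(1) if match else report
    return ERROR_LINE.findall(body)


# Log text copied from the thread; the line breaks are an assumption.
SAMPLE = """<stderr_txt>
Using database: database_0f7f01a1b07/database
ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442
BOINC:: Error reading and gzipping output datafile: default.out
21:15:06 (176255): called boinc_finish(1)
</stderr_txt>"""

if __name__ == "__main__":
    for message in extract_errors(SAMPLE):
        print(message)
```

If several hosts report the same message for the same series of work units, as happened here with the LOWER_CONNECT error, that points towards the work units rather than the individual machines.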
©2024 University of Washington
https://www.bakerlab.org