Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
BlackPoison357 Joined: 5 Mar 24 Posts: 1 Credit: 1,674,561 RAC: 0 |
That must be why I've had 31 errors in the last 2 days. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Just got 21 of those. It is this series: 7a_hal_c_hal_7aa_................. Just downloaded 4 beta 6.05 tasks, one of which immediately (0.02 seconds CPU) failed with:- |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 6 |
A flock of work units arrived recently that are behaving oddly, well, all but one of them. They came with a run time of 8 hours. Of those that are running, one shows 6 hours elapsed and about 6 hours remaining; this figure wobbles about for a while, then drops some, then starts wobbling again, and so on. The others show remaining times of 2 to 6 days, and increasing. The percentage-complete figure is increasing, but very slowly. Edit: the tasks are Beta 6.04 running on Windows 10 x64. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
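The wobbling "remaining" figure is easier to reason about with a deliberately naive model that extrapolates purely from elapsed time and fraction done. This is a hypothetical simplification for illustration, not the BOINC client's actual estimator, which is more involved (it also takes the task's original estimate into account); `remaining_seconds` is an invented name.

```python
def remaining_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Naive remaining-time estimate: assume the rest of the task proceeds
    at the same average rate as the part already done.

    Hypothetical simplification for illustration, not BOINC's exact formula.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    # total ~= elapsed / fraction_done, so remaining = total - elapsed
    return elapsed_s * (1.0 - fraction_done) / fraction_done


if __name__ == "__main__":
    hour = 3600.0
    # 6 h elapsed at 50% done: about 6 h remaining
    print(remaining_seconds(6 * hour, 0.50) / hour)
    # if progress has only crawled to 10% after 6 h, the estimate balloons
    print(remaining_seconds(6 * hour, 0.10) / hour)
```

Under this model, a task that has reached only 10% after 6 hours shows more than two days remaining, and every small jump in the reported percentage moves the figure around, which fits the slow-moving percentages described above.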
Grant (SSSF) Joined: 28 Mar 20 Posts: 1720 Credit: 18,351,686 RAC: 24,923 |
adrianxw wrote: "A flock of work units arrived recently that are behaving oddly, well, all but one of them. They came with a run time of 8 hours, of those which are running, one has elapsed 6 hours and remaining 6 hours odd, this figure wobbles about for a while, then drops some, then starts wobbling again and repeat. The others have remaining times of 2 to 6 days increasing. The percentage complete figure is increasing, but very slowly." And the answer is the same as always, but since you have ignored the answer for over 4 years now, there's no point repeating it. Run time: 1 day 20 hours 40 min 36 sec; CPU time: 11 hours 59 min 57 sec. Almost 2 days to do 12 hours' work, all because of settings you have made and chose not to fix. Grant Darwin NT |
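Grant's point is clearer as a ratio of CPU time to wall-clock run time. This quick calculation uses only the two figures he quotes; `to_seconds` is an invented helper, and "efficiency" is just a label for the ratio, not a BOINC statistic.

```python
def to_seconds(days: int = 0, hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert a days/hours/minutes/seconds duration to whole seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


run_time = to_seconds(days=1, hours=20, minutes=40, seconds=36)  # wall-clock time
cpu_time = to_seconds(hours=11, minutes=59, seconds=57)          # CPU actually received

efficiency = cpu_time / run_time  # fraction of wall-clock time spent computing
print(f"run time {run_time} s, CPU time {cpu_time} s, efficiency {efficiency:.1%}")
```

The task got the CPU for only about 27% of the time it was running, which is why roughly 12 hours of work took almost 2 days.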
kotenok2000 Joined: 22 Feb 11 Posts: 271 Credit: 507,897 RAC: 496 |
What else do you have running? |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 6 |
The only thing running that has any real impact is Folding@Home; it is set to its minimum activity level but still grabs quite a lot of resources. The only other things I can see are Firefox and the VPN service I use, which has a small window that normally sits open on the desktop. I've no idea what Grant is talking about (above), but it gives me something to look for. I don't have any weird settings, by the way. |
kotenok2000 Joined: 22 Feb 11 Posts: 271 Credit: 507,897 RAC: 496 |
Reduce Folding@home to 4 cores (if it's running on the CPU) and Rosetta to 4 cores. |
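On the BOINC side, one way to hold Rosetta to 4 running tasks is an app_config.xml file in the Rosetta project directory that sets `project_max_concurrent`. The sketch below generates that file; the function names are invented, and the data-directory path is an assumption that varies per install. Folding@home's core count is set in its own client and is not covered here. After the file is written, BOINC Manager's Options > Read config files (or a client restart) should pick it up.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def app_config_xml(max_concurrent: int) -> str:
    """Build an app_config.xml body that caps how many of a project's tasks
    BOINC runs at the same time (the project_max_concurrent option)."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


def write_app_config(project_dir: str, max_concurrent: int) -> Path:
    """Write app_config.xml into project_dir, overwriting any existing file.

    project_dir is an assumption: the project's folder inside the BOINC data
    directory, e.g. <BOINC data dir>/projects/boinc.bakerlab.org_rosetta.
    """
    path = Path(project_dir) / "app_config.xml"
    path.write_text(app_config_xml(max_concurrent) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    print(app_config_xml(4))
```

On a Linux host laid out like the ones in this thread, that would be something like `write_app_config("/var/lib/boinc/projects/boinc.bakerlab.org_rosetta", 4)`, run with permission to write there.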
TPCBF Joined: 29 Nov 10 Posts: 111 Credit: 5,142,074 RAC: 3,111 |
Please report any issues with work units in this thread. All recent WUs on at least 3 different hosts got cut short with "Error while computing"... :( |
robertmiles Joined: 16 Jun 08 Posts: 1233 Credit: 14,338,560 RAC: 2,994 |
Please report any issues with work units in this thread. "Error while computing" means that an error was detected, but gives no information about WHAT error. There's generally at least one more line, in the task's Stderr output, saying what the error was. |
Notarick Joined: 19 Nov 06 Posts: 1 Credit: 2,932,430 RAC: 6,892 |
I'm not sure what is going on, but about 1/4 of the work units are reporting "Error while computing" and I am no longer getting new tasks (which I assume is due to the errors). I've been running BOINC for years without issue, but I'm still running some CPU stress tests/diagnostics to see if it is a hardware thing. Anyway: my CPU is a Ryzen 7 3700X on a Gigabyte X570 UD motherboard with 32 GB of RAM and a Radeon RX 570 video card. I'm running Windows 10. The only thing that has changed recently is a UEFI update. Here is the diagnostic information from one of them (they all have the same error code and similar Stderr output):
Name 7a_hal_d_hal_7aa_13011_d120_0001_SAVE_ALL_OUT_2977649_108_1
Workunit 1383425641
Created 23 Mar 2024, 8:12:33 UTC
Sent 23 Mar 2024, 8:12:35 UTC
Report deadline 26 Mar 2024, 8:12:35 UTC
Received 23 Mar 2024, 8:14:02 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 1 (0x00000001) Unknown error code
Computer ID 6222889
Run time 11 sec
CPU time 2 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 5.15 GFLOPS
Application version Rosetta Beta v6.04 windows_x86_64
Peak working set size 117.73 MB
Peak swap size 88.94 MB
Peak disk usage 0.01 MB
Stderr output
<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.04_windows_x86_64.exe @7a_hal_d_hal_7aa_13011_d120_0001.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937
Using database: database_0f7f01a1b07database
ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictApplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442
BOINC:: Error reading and gzipping output datafile: default.out
02:12:51 (5004): called boinc_finish(1)
</stderr_txt>
]]> |
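The command line in that Stderr output records the time limits the task was launched with. The sketch below pulls the flags out so the values can be read in hours; `parse_flags` is a hypothetical helper, and reading `-cpu_run_time 28800` as the 8-hour target run time and `-boinc::cpu_run_timeout 36000` as a 10-hour limit is an interpretation of the flag names, not something the log itself states.

```python
import shlex


def parse_flags(command: str) -> dict[str, str]:
    """Turn '-flag value' pairs from a Rosetta command line into a dict.

    Flags not followed by a value (e.g. -boinc::watchdog) map to "".
    Other tokens (the executable path, the @...flags file) are skipped.
    """
    tokens = shlex.split(command)
    flags: dict[str, str] = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is not None and not nxt.startswith("-"):
                flags[tok] = nxt  # flag with a value
                i += 2
                continue
            flags[tok] = ""  # bare switch
        i += 1
    return flags


# Command line copied from the Stderr output above
cmd = (
    "projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.04_windows_x86_64.exe "
    "@7a_hal_d_hal_7aa_13011_d120_0001.flags -nstruct 10000 -cpu_run_time 28800 "
    "-boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all "
    "-database minirosetta_database -in::file::zip minirosetta_database.zip "
    "-boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937"
)
flags = parse_flags(cmd)
for name in ("-cpu_run_time", "-boinc::cpu_run_timeout"):
    seconds = int(flags[name])
    print(f"{name}: {seconds} s = {seconds / 3600:g} h")
```

28800 s works out to 8 hours, the same run time adrianxw's tasks arrived with.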
MJH333 Joined: 29 Jan 21 Posts: 18 Credit: 6,745,546 RAC: 22,305 |
Notarick wrote: "I'm not sure what is going on but about 1/4 of the work units are reporting 'Error while computing' and I am no longer getting new tasks (which I assume is due to the errors)." I doubt that there is anything wrong with your system. The latest big batch of work units has run out. You can see this from the Server Status section on the Rosetta home page: https://boinc.bakerlab.org/rosetta/, which currently shows "Total queued jobs" of 0. It can also be seen from the Project status page: https://boinc.bakerlab.org/rosetta/server_status.php, which currently shows "Tasks ready to send" of 0. You may, of course, pick up the odd resend or some Robetta tasks, but there will be no steady flow of tasks until another big batch of work units is released. We may get another batch if/when the work units you identified with the "residue 1 does not have a LOWER_CONNECT" error are corrected and reissued. When trying to get an idea of how much work is available, I tend to look at the "Total queued jobs" figure on the Rosetta home page, because it shows all the work units available to be crunched, which may number in the millions, whereas the "Tasks ready to send" figure on the Project status page shows just the tasks ready to be distributed, which is usually no more than 5,000. |
kotenok2000 Joined: 22 Feb 11 Posts: 271 Credit: 507,897 RAC: 496 |
I see this in stderr.txt:
command: projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.04_windows_x86_64.exe @7ahall_e_hal_7aa_15545_d239_0001.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937
Extracting in project directory: database_0f7f01a1b07.zip
error: cannot delete old E:/ProgramData/BOINC/projects/boinc.bakerlab.org_rosetta/database_0f7f01a1b07/database/rotamer/bbdep02.May.sortlib Permission denied
error: cannot delete old E:/ProgramData/BOINC/projects/boinc.bakerlab.org_rosetta/database_0f7f01a1b07/database/rotamer/peptoid_rotlibs/001.rotlib Permission denied
error: cannot delete old E:/ProgramData/BOINC/projects/boinc.bakerlab.org_rosetta/database_0f7f01a1b07/database/chemical/pdb_components/components.R.cif Permission denied
error: cannot delete old E:/ProgramData/BOINC/projects/boinc.bakerlab.org_rosetta/database_0f7f01a1b07/database/chemical/pdb_components/components.6.cif Permission denied
error: cannot delete old E:/ProgramData/BOINC/projects/boinc.bakerlab.org_rosetta/database_0f7f01a1b07/database/chemical/pdb_components/components.1.cif Permission denied
error: cannot delete old E:/ProgramData/BOINC/projects/boinc.bakerlab.org_rosetta/database_0f7f01a1b07/database/chemical/pdb_components/components.D.cif Permission denied
error: cannot delete old E:/ProgramData/BOINC/projects/boinc.bakerlab.org_rosetta/database_0f7f01a1b07/database/chemical/pdb_components/components.V.cif Permission denied
error: cannot delete old E:/ProgramData/BOINC/projects/boinc.bakerlab.org_rosetta/database_0f7f01a1b07/database/chemical/pdb_components/components.4.cif Permission denied
error: cannot delete old E:/ProgramData/BOINC/projects/boinc.bakerlab.org_rosetta/database_0f7f01a1b07/database/sampling/disulfide_jump_database_wip.dat Permission denied
error: cannot delete old E:/ProgramData/BOINC/projects/boinc.bakerlab.org_rosetta/database_0f7f01a1b07/database/sampling/fragpicker_rama_tables/L_QP.counts.gz Permission denied
error: cannot delete old E:/ProgramData/BOINC/projects/boinc.bakerlab.org_rosetta/database_0f7f01a1b07/database/sampling/vall.jul19.2011.torsions.gz Permission denied
error: cannot delete old E:/ProgramData/BOINC/projects/boinc.bakerlab.org_rosetta/database_0f7f01a1b07/database/protocol_data/tensorflow_graphs/gcn_test_model/gcn_test_model_plot.png Permission denied
Extracting in slot directory: minirosetta_database.zip
Using database: database
It looks like each task tried to extract the database to E:/ProgramData/BOINC/projects/boinc.bakerlab.org_rosetta/database_0f7f01a1b07 at the same time, gave up, and extracted it to its own slot directory instead. |
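The behaviour described here (try the shared project directory, hit "Permission denied", fall back to the slot directory) can be sketched as follows. This is a hypothetical illustration of that fallback pattern using Python's `zipfile`, not Rosetta's actual extraction code; all function and directory names except `minirosetta_database.zip` are invented, and the extractor is passed in so the failure can be simulated.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Callable


def _extract_all(zip_path: Path, dest: Path) -> None:
    """Default extractor: unpack the whole archive into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)


def extract_with_fallback(
    zip_path: Path,
    shared_dir: Path,
    slot_dir: Path,
    extract: Callable[[Path, Path], None] = _extract_all,
) -> Path:
    """Hypothetical sketch: extract into the shared project directory, and
    if that fails with PermissionError (e.g. files held open by another
    task), extract a private copy into this task's slot directory instead.
    Returns the directory that ended up holding the files."""
    try:
        extract(zip_path, shared_dir)
        return shared_dir
    except PermissionError as err:
        print(f"shared extract failed ({err}); falling back to slot directory")
        extract(zip_path, slot_dir)
        return slot_dir


def demo() -> tuple[Path, str]:
    """Simulate the race: the shared directory always refuses writes."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        archive = base / "minirosetta_database.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("database/readme.txt", "placeholder")

        shared = base / "project" / "database_shared"

        def locked_shared(zp: Path, dest: Path) -> None:
            # Pretend another task holds the shared files open
            if dest == shared:
                raise PermissionError("cannot delete old file: Permission denied")
            _extract_all(zp, dest)

        used = extract_with_fallback(archive, shared, base / "slots" / "0", locked_shared)
        text = (used / "database" / "readme.txt").read_text()
        return used.relative_to(base), text


if __name__ == "__main__":
    print(demo())
```

The fallback keeps each task running, at the cost of every task unpacking its own copy of the database into its slot.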
Sid Celery Joined: 11 Feb 08 Posts: 2136 Credit: 41,518,559 RAC: 15,775 |
It's a strange thing, but every time tasks have run out recently, another ~million seem to have been added to the queue to keep us going. I realise many have given up a bit on expecting reliability from Rosetta, but it almost seems like someone is paying a little attention on the quiet. Or maybe I'm just wishing that were the case. Either way, it's appreciated. And there are still enough people around to blast through and return them quickly too. (Comparatively) good times... |
kotenok2000 Joined: 22 Feb 11 Posts: 271 Credit: 507,897 RAC: 496 |
When I run the graphics app, it closes immediately. I see this in stderrgfx.txt:
user@ubuntu:/var/lib/boinc/slots/19$ cat stderrgfx.txt
ERROR: Unable to open file: /var/lib/boinc/projects/boinc.bakerlab.org_rosetta/../database/chemical/residue_type_sets/fa_standard/residue_types.txt
ERROR:: Exit from: src/core/chemical/GlobalResidueTypeSet.cc line: 145
13:38:44 (25733): called boinc_finish(0) |
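The path in that error contains `..`, which climbs out of the Rosetta project directory. Normalising it as a string shows where the graphics app actually looked. This uses `posixpath.normpath`, which only rewrites the text (no filesystem access, symlinks ignored); it doesn't explain why the app builds the path this way.

```python
import posixpath

# Path copied from stderrgfx.txt above
reported = (
    "/var/lib/boinc/projects/boinc.bakerlab.org_rosetta/../database/"
    "chemical/residue_type_sets/fa_standard/residue_types.txt"
)

# normpath collapses "dir/.." pairs purely as text
resolved = posixpath.normpath(reported)
print(resolved)
```

The result, /var/lib/boinc/projects/database/chemical/residue_type_sets/fa_standard/residue_types.txt, sits directly under projects/, outside both the Rosetta project directory and the slot directory where the database gets extracted.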
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2001 Credit: 9,780,807 RAC: 8,163 |
Sid Celery wrote: "And there are still enough people around to blast through and return them quickly too." +1. But after months and hundreds of thousands of WUs, maybe it's time to take the app out of beta. |
kotenok2000 Joined: 22 Feb 11 Posts: 271 Credit: 507,897 RAC: 496 |
Tasks finish in 3 hours for me. I have set "Target CPU run time" to "not selected". |
just1vet Joined: 13 Nov 05 Posts: 4 Credit: 3,673,481 RAC: 26,688 |
Big problems with Rosetta on Linux Mint 20 and 21. I had to remove the project from the client on both of my machines. It would freeze the computers so badly that they had to be rebooted, only to lock up again as soon as they started on Rosetta. I narrowed it down to the Rosetta project after replacing the hard drive, motherboard and RAM. They run fine on the other projects. This has been going on for a while. It runs fine on my Windows computers. Any ideas? |
kotenok2000 Joined: 22 Feb 11 Posts: 271 Credit: 507,897 RAC: 496 |
Maybe you could reduce the number of CPUs allocated from 100% to 75%? |
Sid Celery Joined: 11 Feb 08 Posts: 2136 Credit: 41,518,559 RAC: 15,775 |
kotenok2000 wrote: "Tasks finish in 3 hours for me." Has something changed, then? I noticed this elsewhere too. It doesn't apply to me: since tasks have been harder to come by, I've changed my default to 12 hrs. Maybe it's time to be explicit about the run time and change it to 8 hrs, rather than let it run at a dodgy default value. |
just1vet Joined: 13 Nov 05 Posts: 4 Credit: 3,673,481 RAC: 26,688 |
Right now they are 32-core machines with 16 GB of RAM, which should be enough for crunching. |
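For a rough sense of the numbers in this exchange, the sketch below divides 16 GB across the tasks that would run at 100% and at the suggested 75% CPU allocation. It assumes one task per allowed CPU and simple truncation when the percentage is applied (the BOINC client's exact rounding is not reproduced here), and it ignores memory used by the OS and other programs; both function names are invented.

```python
def allowed_cpus(total_cpus: int, percent: float) -> int:
    """CPUs usable under a 'use at most N% of the CPUs' preference.
    Assumption: truncate toward zero, but never go below one CPU."""
    return max(1, int(total_cpus * percent / 100))


def ram_per_task_gb(total_ram_gb: float, tasks: int) -> float:
    """Average RAM available per concurrent task (ignores OS and other apps)."""
    return total_ram_gb / tasks


for pct in (100, 75):
    n = allowed_cpus(32, pct)
    print(f"{pct}% of 32 CPUs -> {n} tasks, {ram_per_task_gb(16, n):.2f} GB each")
```

At 100% that is 32 tasks sharing about 0.5 GB each on average; at 75% it drops to 24 tasks with about 0.67 GB each.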
©2024 University of Washington
https://www.bakerlab.org