Rosetta 4.1+ and 4.2+

Author	Message
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0	Message 96100 - Posted: 5 May 2020, 10:24:50 UTC - in response to Message 96097. Last modified: 5 May 2020, 10:25:11 UTC 3 "finish file present too long" errors on Pi4 Rosetta v4.20 aarch64-unknown-linux-gnu That’s a BOINC issue. If you can get 7.16.5 or later, it might help. They extended the time limit before BOINC complains about the files still being in the slot directory. thanks, i'll check that out ID: 96100 · Rating: 0 · rate: / Reply Quote

reindl Send message Joined: 31 Mar 20 Posts: 1 Credit: 1,765,751 RAC: 0	Message 96108 - Posted: 5 May 2020, 12:47:36 UTC - in response to Message 96086. Last modified: 5 May 2020, 12:49:05 UTC Can you reduce the size of tasks for Android phones? I have a Samsung S20 equipped with Qualcomm's flagship Snapdragon 865 processor, and it can take more than half a day to finish one task. The deadline was set to about 3 days after the task was downloaded. I have to keep my phone charging most of the day to finish the tasks received. This is not reasonable and puts a lot of pressure on me. So, could you please reduce the size of each task? Thanks There are 2 things you can do: 1. Get the 17.16.3 Android App and set the buffer to 0 or close to 0 2. Go to your settings and create a separate profile for your phones with a shorter target runtime ID: 96108 · Rating: 0 · rate: / Reply Quote

Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 96187 - Posted: 6 May 2020, 23:36:36 UTC Can the servers be updated such that a wingman is only created once the originally created task is unable to report results? Otherwise first guy reports late, but gets in before the second guy, and then the second guy gets the same WU reporting back. See discussion here, and sample wu here Rosetta Moderator: Mod.Sense ID: 96187 · Rating: 0 · rate: / Reply Quote

Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1939 Credit: 18,534,891 RAC: 0	Message 96210 - Posted: 7 May 2020, 9:32:50 UTC - in response to Message 96187. Last modified: 7 May 2020, 9:39:18 UTC Can the servers be updated such that a wingman is only created once the originally created task is unable to report results? Project options <report_grace_period>x</report_grace_period> <grace_period_hours>x</grace_period_hours> A "grace period" (in seconds or hours respectively) for task reporting. A task is considered time-out (and a new replica generated) if it is not reported by client_deadline + x. So my thought is the Grace period needs to be 12 hours. The deadline can be 3 Days, 7 days etc, then there is the Watchdog timer which is presently 10 hours. Allow another couple of hours (just because...) and that gives you 12 hours for the grace_period_hours x. So a new Task won't be created until 12 hours after the deadline for the initial replication has passed (thinking about it even 6 hours would probably be long enough most of the time). Grant Darwin NT ID: 96210 · Rating: 0 · rate: / Reply Quote
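To make the arithmetic in the post above concrete, here is a minimal Python sketch of "client deadline + grace period". The function name and the example send time are made up for illustration; only the rule itself (a replica is generated if the task is not reported by client_deadline + x) comes from the quoted project options text.

```python
from datetime import datetime, timedelta


def replica_created_at(sent, deadline_days, grace_hours):
    """Earliest time a replacement task would be generated.

    Follows the quoted rule: a task is timed out (and a new replica
    generated) if it is not reported by client_deadline + grace period.
    """
    client_deadline = sent + timedelta(days=deadline_days)
    return client_deadline + timedelta(hours=grace_hours)


# Hypothetical task sent on 7 May 2020 at 09:00 with a 3-day deadline
# and the suggested 12-hour grace period (10 h watchdog + 2 h margin).
sent = datetime(2020, 5, 7, 9, 0)
print(replica_created_at(sent, deadline_days=3, grace_hours=12))
# prints 2020-05-10 21:00:00
```

With a grace period of 0 the replica would be created right at the deadline, which is the situation Mod.Sense describes where a slightly late first result and a wingman's result both come back.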

sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0	Message 96230 - Posted: 7 May 2020, 14:00:51 UTC i'm getting more finish file too long errors on Pi4 4.20. i've not upgraded boinc-client, as i can't find a binary package that would install problem free (many dependencies). however, i noticed one thing about the finish file too long errors: they seem related to the Junior_HalfRoid tasks https://boinc.bakerlab.org/rosetta/result.php?resultid=1172540347 https://boinc.bakerlab.org/rosetta/result.php?resultid=1172395662 and when these WUs run, my Pi4 is close to using up all the RAM available. I'm not sure, but memory may be involved after all, e.g. they may generate many error messages in the 'finish file' due to low-memory conditions. if it is due to memory, there doesn't seem to be an easy way to solve it short of running fewer tasks. but the point is that when the tasks start, memory consumption normally looks ok, and it grows as the work progresses. ID: 96230 · Rating: 0 · rate: / Reply Quote

Sid Celery Send message Joined: 11 Feb 08 Posts: 2598 Credit: 47,220,881 RAC: 0	Message 96232 - Posted: 7 May 2020, 14:04:32 UTC - in response to Message 96210. then there is the Watchdog timer which is presently 10 hours Minor diversion from the topic: I know this is what the watchdog is set to the last time we heard, but wasn't it for a very specific reason? Does that reason apply any more? Because if it doesn't, it's a really long time for nominally 8hr task runtimes. My sense of the watchdog was it's to allow for relatively short overruns that happen from time to time, but provides a cutoff for tasks if they've kind of gone rogue for some unknown reason. 10hrs doesn't really do the job any more and should be reduced to something more appropriate (was 4hrs) ID: 96232 · Rating: 0 · rate: / Reply Quote

Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 96233 - Posted: 7 May 2020, 14:10:58 UTC - in response to Message 96232. If I'm not mistaken, I believe the watchdog was extended to 10 hours, specifically for these potentially long-running Halfroids. Rosetta Moderator: Mod.Sense ID: 96233 · Rating: 0 · rate: / Reply Quote

Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 96234 - Posted: 7 May 2020, 14:17:20 UTC - in response to Message 96230. One of those WUs used over 1GB and the other used over 2GB. What was in the out file about memory? It would seem that running fewer threads would be better than failing WUs. But I would suspect that BOINC client would have had to put the others to "waiting for memory" in order to run the larger one anyway. So, reducing the number of threads should basically be occurring automatically, and only when the specific WU requires it. Rosetta Moderator: Mod.Sense ID: 96234 · Rating: 0 · rate: / Reply Quote

Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1939 Credit: 18,534,891 RAC: 0	Message 96245 - Posted: 7 May 2020, 18:46:48 UTC - in response to Message 96230. and when these wu run, my Pi4 is close to using up all ram available. I'm not too sure if memory may after all be involved. Low available system RAM would impact on RAM available for disk caching. Grant Darwin NT ID: 96245 · Rating: 0 · rate: / Reply Quote

Sid Celery Send message Joined: 11 Feb 08 Posts: 2598 Credit: 47,220,881 RAC: 0	Message 96260 - Posted: 8 May 2020, 8:31:15 UTC - in response to Message 96233. If I'm not mistaken, I believe the watchdog was extended to 10 hours, specifically for these potentially long-running Halfroids. So the reason may still apply in future? I haven't seen one for a while. Ok ID: 96260 · Rating: 0 · rate: / Reply Quote

dduggan47 Send message Joined: 18 Sep 05 Posts: 12 Credit: 4,904,245 RAC: 2	Message 96271 - Posted: 8 May 2020, 18:01:10 UTC I apologize for asking a question that's probably already been asked and answered but, despite having been running BOINC since the early days (and SETI before BOINC existed), I'm not always sufficiently technical to follow all the details discussed here. My problem is that I'm getting many tasks which get "timed out - no response". For a while I was trying to look ahead and abort a lot of tasks, started and unstarted, which weren't going to finish by the deadline. I gather though that that might not be my best strategy for resolving this. On one machine a couple of days ago I changed the "store at least" and "... additional" to 1 day and 0.25 days respectively, but on the other box I forgot and didn't make that change until today. At the moment I have 3 running tasks on the 1st machine that will not make it. On the other machine it's 12 running and about that many more which haven't started yet but won't make the deadline. Am I right in assuming that BOINC will eventually figure this out? In the meantime, what's my best move? Abort all that won't make it? Abort only the unstarted? Let them all go until BOINC figures it out? Thanks. ID: 96271 · Rating: 0 · rate: / Reply Quote

Raistmer Send message Joined: 7 Apr 20 Posts: 49 Credit: 798,155 RAC: 0	Message 96275 - Posted: 8 May 2020, 19:09:28 UTC - in response to Message 96271. Last modified: 8 May 2020, 19:14:24 UTC Am I right in assuming that BOINC will eventually figure this out? In the meantime, what's my best move? Abort all that won't make it? Abort only the unstarted? Let them all go until BOINC figures it out? Thanks. This project (unlike SETI, which you are familiar with) lets you reduce the length of tasks you have already received. The best option for your host is to set them to the minimum possible length and then gradually increase it as long as you don't miss deadlines. This can be done in the project options here: https://clip2net.com/s/47qBO85 As you can see, I have 2 different sets of options: one for powerful hosts (long task length) and one for netbooks/smartphones (short length, currently 4 hours per task). P.S. You need to update the project settings (update the project from BOINC) and then restart the BOINC client itself to update the length of already downloaded tasks. Newly downloaded tasks will already have the new length. ID: 96275 · Rating: 0 · rate: / Reply Quote

dduggan47 Send message Joined: 18 Sep 05 Posts: 12 Credit: 4,904,245 RAC: 2	Message 96289 - Posted: 9 May 2020, 4:47:00 UTC - in response to Message 96275. Thanks for your help, Raistmer. Am I right in assuming that BOINC will eventually figure this out? In the meantime, what's my best move? Abort all that won't make it? Abort only the unstarted? Let them all go until BOINC figures it out? Thanks. This project (unlike SETI, which you are familiar with) lets you reduce the length of tasks you have already received. The best option for your host is to set them to the minimum possible length and then gradually increase it as long as you don't miss deadlines. This can be done in the project options here: https://clip2net.com/s/47qBO85 As you can see, I have 2 different sets of options: one for powerful hosts (long task length) and one for netbooks/smartphones (short length, currently 4 hours per task). This seems counterintuitive. Wouldn't I be better off increasing the expected length and then (I hope) running them in less time, rather than decreasing the time and risking not making the deadlines? P.S. You need to update the project settings (update the project from BOINC) and then restart the BOINC client itself to update the length of already downloaded tasks. Newly downloaded tasks will already have the new length. I changed the expected times before reading your note but did it the opposite way, as I described above. I can redo that if you advise that it would work better, even though I can't say I understand why. I also aborted anything that didn't look like it was going to make the deadline. After seeing your post I stopped and restarted the BOINC client. This seemed to increase the expected times by a lot more than my change on some (but not all) running tasks but had little or no effect on unstarted tasks. In my decades of running BOINC on around 40 different projects I've never run into this problem before. I'm finding it quite confusing. OTOH I was decades younger then too. Age tends not to reduce confusion! :-) Thanks again. 
ID: 96289 · Rating: 0 · rate: / Reply Quote

Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1939 Credit: 18,534,891 RAC: 0	Message 96290 - Posted: 9 May 2020, 5:13:39 UTC - in response to Message 96289. Am I right in assuming that BOINC will eventually figure this out? In the meantime, what's my best move? Abort all that won't make it? Abort only the unstarted? Let them all go until BOINC figures it out? Thanks. This project (unlike SETI, which you are familiar with) lets you reduce the length of tasks you have already received. The best option for your host is to set them to the minimum possible length and then gradually increase it as long as you don't miss deadlines. The best option is just to use the default Target CPU Runtime, and to have no cache at all, given the number of projects you are running. Even if Rosetta were your only project, 0.5 days & 0.02 days extra is plenty. Grant Darwin NT ID: 96290 · Rating: 0 · rate: / Reply Quote
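For anyone trying to judge how many queued tasks are at risk, a rough Python sketch follows. It rests on loud assumptions that are not from the thread: every task takes exactly the same runtime, all cores work on this one project, and tasks run in the order they arrived. The function name and the example numbers are hypothetical.

```python
def tasks_missing_deadline(queued_tasks, cores, runtime_hours, deadline_hours):
    """Count queued tasks that would finish after the deadline.

    Simplified model (an assumption): tasks run in batches of `cores`,
    each batch taking `runtime_hours`, with nothing else on the machine.
    """
    missed = 0
    for position in range(queued_tasks):
        batch = position // cores + 1          # 1st batch, 2nd batch, ...
        finish_hours = batch * runtime_hours   # when this task completes
        if finish_hours > deadline_hours:
            missed += 1
    return missed


# Hypothetical host: 24 queued tasks, 4 cores, 8-hour tasks.
print(tasks_missing_deadline(24, 4, 8, 72))  # 3-day deadline
print(tasks_missing_deadline(24, 4, 8, 24))  # only 1 day left
```

Real hosts share time between projects, so the actual margin is smaller than this estimate suggests, which is one reason a small cache is the safer setting.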

Raistmer Send message Joined: 7 Apr 20 Posts: 49 Credit: 798,155 RAC: 0	Message 96294 - Posted: 9 May 2020, 7:29:12 UTC - in response to Message 96289. Last modified: 9 May 2020, 7:31:16 UTC This seems counterintuitive. Wouldn't I be better off to increase the expected length and then (I hope) run them in less time than to decrease the time and risk not making the deadlines? Expected length is the amount of CPU time the task will be allowed to run. And here is the big difference from SETI and most other projects: a task doesn't contain a fixed number of calculations to complete. If CPU time allows, a new model will be started for the same task (a slightly different initial atom configuration, or something similar). So, if you allow 8 hours per task, it will run for 8 hours; if only 2 hours, it will end in 2 hours. And yes, to avoid cache overflow in the future, it's better to set the BOINC cache size as small as possible. But changing the cache size will not help with already downloaded tasks. ID: 96294 · Rating: 0 · rate: / Reply Quote
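The behaviour described in the post above can be sketched as a small Python loop. This is a simplified illustration, not Rosetta's actual scheduling code: it assumes every model takes the same CPU time and that another model is started whenever the CPU time used so far is still below the target. The function name and the per-model times are hypothetical.

```python
def models_completed(target_cpu_hours, hours_per_model):
    """Return (models finished, CPU hours used) for one task.

    Assumed rule: keep starting new models while the CPU time used
    so far is below the target runtime.
    """
    elapsed = 0.0
    models = 0
    while elapsed < target_cpu_hours:
        elapsed += hours_per_model  # run one more model to completion
        models += 1
    return models, elapsed


print(models_completed(8, 0.5))   # long target: many models, about 8 hours
print(models_completed(2, 0.5))   # short target: fewer models, about 2 hours
print(models_completed(2, 0.75))  # the last model can overrun the target a little
```

The same work unit therefore ends sooner with a shorter target runtime; it just returns fewer models.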

Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1939 Credit: 18,534,891 RAC: 0	Message 96352 - Posted: 11 May 2020, 4:27:49 UTC rb_05_09_24541_24116_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_05_10_927507_5_0 <core_client_version>7.6.22</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_05_09_24541_24116_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 2 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_05_09_24541_24116_ab_t000__robetta.zip -frag3 rb_05_09_24541_24116_ab_t000__robetta.200.3mers.index.gz -fragA rb_05_09_24541_24116_ab_t000__robetta.200.10mers.index.gz -fragB rb_05_09_24541_24116_ab_t000__robetta.200.5mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1576447 Using database: database_357d5d93529_n_methylminirosetta_database [ ERROR ]: Caught exception: File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306 chi angle must be between -180 and 180: -nan(ind) ------------------------ Begin developer's backtrace ------------------------- BACKTRACE: ------------------------- End developer's backtrace -------------------------- AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. </stderr_txt> ]]> This is the second time i've had this particular error message- last time it was dodgy WU, the other system that got it also got the same error. Waiting to see if that's the case again this time around. 
Grant Darwin NT ID: 96352 · Rating: 0 · rate: / Reply Quote
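The command line in the post above carries the runtime settings discussed earlier in the thread. A minimal Python sketch that pulls them out and converts them to hours is shown below. The `command` string is only a subset of the flags copied from that post, and `flag_seconds` is a hypothetical helper; how Rosetta combines the two values is not stated in the thread, so the sketch only converts units.

```python
import re

# Subset of the flags from the stderr output quoted in the post.
command = (
    "-nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 "
    "-checkpoint_interval 120 -boinc::watchdog -boinc::cpu_run_timeout 36000"
)


def flag_seconds(cmd, flag):
    """Return the integer that follows `-flag` in cmd, or None if absent."""
    pattern = r"(?<!\S)-" + re.escape(flag) + r"\s+(\d+)"
    match = re.search(pattern, cmd)
    return int(match.group(1)) if match else None


run_time = flag_seconds(command, "cpu_run_time")
timeout = flag_seconds(command, "boinc::cpu_run_timeout")
print(run_time / 3600, "hours target runtime")   # 8.0
print(timeout / 3600, "hours watchdog timeout")  # 10.0
```

The 8-hour and 10-hour figures match the "nominally 8hr task runtimes" and the 10-hour watchdog mentioned in the earlier posts.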

Ivailo Bonev Send message Joined: 9 May 07 Posts: 16 Credit: 6,196,220 RAC: 0	Message 96357 - Posted: 11 May 2020, 8:29:10 UTC https://boinc.bakerlab.org/rosetta/result.php?resultid=1176852042 <core_client_version>7.16.5</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol jhr_boinc_v4.xml @flags -in:file:silent Junior_HalfRoid_design5_COVID-19_SAVE_ALL_OUT_IGNORE_THE_REST_6gx3kn9p.silent -in:file:silent_struct_type binary -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip Junior_HalfRoid_design5_COVID-19_SAVE_ALL_OUT_IGNORE_THE_REST_6gx3kn9p.zip @Junior_HalfRoid_design5_COVID-19_SAVE_ALL_OUT_IGNORE_THE_REST_6gx3kn9p.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3876534 Using database: database_357d5d93529_n_methylminirosetta_database ERROR: [ERROR] Unable to open constraints file: f39b38c813752ceb1e616c99588b316d_n0_c0_1_0001.MSAcst ERROR:: Exit from: ......srccorescoringconstraintsConstraintIO.cc line: 457 BOINC:: Error reading and gzipping output datafile: default.out 11:22:22 (11520): called boinc_finish(1) </stderr_txt> ]]> ID: 96357 · Rating: 0 · rate: / Reply Quote

Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1939 Credit: 18,534,891 RAC: 0	Message 96383 - Posted: 12 May 2020, 6:17:40 UTC - in response to Message 96352. rb_05_09_24541_24116_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_05_10_927507_5_0 <core_client_version>7.6.22</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_05_09_24541_24116_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 2 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_05_09_24541_24116_ab_t000__robetta.zip -frag3 rb_05_09_24541_24116_ab_t000__robetta.200.3mers.index.gz -fragA rb_05_09_24541_24116_ab_t000__robetta.200.10mers.index.gz -fragB rb_05_09_24541_24116_ab_t000__robetta.200.5mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1576447 Using database: database_357d5d93529_n_methylminirosetta_database [ ERROR ]: Caught exception: File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306 chi angle must be between -180 and 180: -nan(ind) ------------------------ Begin developer's backtrace ------------------------- BACKTRACE: ------------------------- End developer's backtrace -------------------------- AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. </stderr_txt> ]]> This is the second time i've had this particular error message- last time it was dodgy WU, the other system that got it also got the same error. 
Waiting to see if that's the case again this time around. Looks like it was another dodgy WU- other system had the same error. Grant Darwin NT ID: 96383 · Rating: 0 · rate: / Reply Quote

[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2206 Credit: 13,720,774 RAC: 5	Message 96421 - Posted: 13 May 2020, 5:49:40 UTC Some "access violation" 1178319689 1178319933 etc ID: 96421 · Rating: 0 · rate: / Reply Quote

James W Send message Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0	Message 96476 - Posted: 14 May 2020, 7:59:22 UTC Last modified: 14 May 2020, 8:17:20 UTC Reference message 96433, which discusses problems with similar WUs. Edit- having said that, i just had one of those WUs do the same thing on my system, yet was processed OK on another system, and even though i've processed several others of the same type with no problems. 3cl_7aa_6lu7_modified_AVLstub_relaxed_renumbered_0074_110_extract_B_SAVE_ALL_OUT_927956_74_0 (unknown error) - exit code -1073741819 (0xc0000005) Unhandled Exception Detected... Reason: Access Violation (0xc0000005) at address 0x00007FF63B7D1D48 Name: new_3cl_10aa_6lu7_modified_AVLstub_relaxed_renumbered_0674_33_extract_B_SAVE_ALL_OUT_928500_391_1 Application: Rosetta v4.20 windows_x86_64 Device: 3710630 Task: 1178942057. WU: 1058857778 Status: Error while computing. Exit status: -1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION Errors: Too many errors (may have bug) Too many total results. Stderr output: (unknown error) - exit code -1073741819 (0xc0000005) Unhandled Exception Detected... Reason: Access Violation (0xc0000005) at address 0x0000000140348316 read attempt to address 0xFFFFFFFF Engaging BOINC Windows Runtime Debugger... My task was the 2nd try for this WU. The first host got same error, so question issue with this type of WU/task. My host also rec'd the same error with WU 1058853076, with my host again being the 2nd try for the same task. Edit: As mentioned by others, some of the above WUs process normally while others receive the above-mentioned error. My host quoted above normally processed task 1178341520 (new_3cl_10aa_6lu7****). ID: 96476 · Rating: 0 · rate: / Reply Quote
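The two forms of the exit code in the post above, -1073741819 and 0xC0000005, are the same 32-bit value read as signed and unsigned. A short Python sketch of the conversion (the function name is made up for illustration):

```python
def to_ntstatus_hex(exit_code):
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus_hex(-1073741819))  # 0xC0000005
```

Masking with 0xFFFFFFFF adds 2^32 to a negative value: 4294967296 - 1073741819 = 3221225477, which is 0xC0000005, the STATUS_ACCESS_VIOLATION code shown in the task report.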