Message boards : Number crunching : Rosetta 4.1+ and 4.2+
Author | Message |
---|---|
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,167,753 RAC: 4,033 |
And of course all the failed ones get resent… Mine are all completing just fine, so hopefully you can do them just fine too. |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
The resends were starting to fail, so I killed all the others that were still running. |
Stevie G Joined: 15 Dec 18 Posts: 107 Credit: 837,888 RAC: 1,343 |
Same here; several have failed with an access violation after a little over an hour. Me too. Task 1305932126 (work unit 1169355749, computer 3551508): sent 9 Dec 2020, 17:22:46 UTC; reported 10 Dec 2020, 6:45:28 UTC; Error while computing; run time 3,981.64 s; CPU time 3,606.70 s; credit ---; Rosetta v4.20 windows_x86_64. Task 1305932217 (work unit 1169235239, computer 3551508): sent 9 Dec 2020, 17:22:46 UTC; reported 10 Dec 2020, 5:15:44 UTC; Error while computing; run time 6,414.87 s; CPU time 6,125.75 s; credit ---; Rosetta v4.20 windows_x86_64. Task 1305585721 (work unit 1169160644, computer 3551508): sent 9 Dec 2020, 3:32:50 UTC; reported 9 Dec 2020, 17:22:46 UTC; Error while computing; run time 6,736.60 s; CPU time 6,458.57 s; credit 58.00; Rosetta v4.20 windows_x86_64. S. Gaber |
Stevie G Joined: 15 Dec 18 Posts: 107 Credit: 837,888 RAC: 1,343 |
Double post. I double click out of habit. S. Gaber |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1680 Credit: 17,836,696 RAC: 22,982 |
Probably 80% of mine so far have resulted in errors, only 20% actually completing OK. let’s see whether they manage to complete… They did. (Example.) The failed ones might just have been certain input values exposing a bug in an algorithm. Grant Darwin NT |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,611,496 RAC: 9,069 |
Probably 80% of mine so far have resulted in errors, only 20% actually completing OK. Obviously these wus are NOT tested on Ralph@Home. As usual, unfortunately. |
Aravah Joined: 12 Apr 20 Posts: 6 Credit: 1,101,172 RAC: 0 |
Yes, likewise seen a lot of failures for 9 Dec work units, for example (ComputerId 4466108): 1169292471, 1169319345, 1169030986, 1169486445, 1169487431, 1169480600, 1169337527, 1169148343. 'Hallucinated' failed with "could not open file 00001.200.9mers." 'MOF' failed with "File: src/utility/options/OptionCollection.cc:1398 Option matching -beta_nov15 not found in command line top-level context Did you mean: -corrections:beta_nov16". 'miniprotein_relax7' failed with "process got signal 11" (SIGSEGV, segmentation fault) with no debug output or stack trace, so perhaps this is an unhandled exception when run under Fedora. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,611,496 RAC: 9,069 |
Yes, likewise seen a lot of failures for 9 Dec work units.... I cannot understand why not use Ralph. They have a beta project with a dedicated server, and the queues are almost always empty. Publishing bugged WUs on the production server (Rosetta) will drive volunteers away. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,167,753 RAC: 4,033 |
Yes, likewise seen a lot of failures for 9 Dec work units.... I wonder if they hoped they would be fine and, now that they aren't, they will try and fix them or send them to Ralph for further testing. One problem with testing though is the lack of people over there, so unless they have people always banging on the door asking for tasks which they should be able to monitor, they may not get the diversity of pc's needed to troubleshoot the problem on Ralph. If that's the case then we could be stuck with them for a while. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,611,496 RAC: 9,069 |
One problem with testing though is the lack of people over there, so unless they have people always banging on the door asking for tasks which they should be able to monitor, they may not get the diversity of pc's needed to troubleshoot the problem on Ralph. It took years just to get the link to Ralph on the Rosetta home page... I have participated in Ralph since 2008. When they release work, it finishes in a few hours. There are a lot of volunteers who want to help with testing, but it seems that the developers are not interested. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,167,753 RAC: 4,033 |
One problem with testing though is the lack of people over there, so unless they have people always banging on the door asking for tasks which they should be able to monitor, they may not get the diversity of pc's needed to troubleshoot the problem on Ralph. Well that answers that question then!! |
Bill F Joined: 29 Jan 08 Posts: 44 Credit: 1,569,024 RAC: 1,160 |
I am one of those Ralph users that patiently wait.... Most times I miss out as other members scoop up any work super fast. I have been a Ralph member since Jan 2018. Bill F In October 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic; There was no expiration date. |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
I cannot understand why not use Ralph. The work units in the current batch seem to be minor variations of known-good configurations, so there’s probably an assumption that they will “just work” and don’t need pre-release testing. As we have seen, though, some of the recent WUs have shown that such assumed-good inputs can still expose bugs in Rosetta. By contrast, today’s Ralph tasks have very different command lines so are probably doing something quite new. That is the kind of change that does get tested on Ralph before being released on Rosetta. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,167,753 RAC: 4,033 |
I cannot understand why not use Ralph. The work units in the current batch seem to be minor variations of known-good configurations, so there’s probably an assumption that they will “just work” and don’t need pre-release testing. As we have seen, though, some of the recent WUs have shown that such assumed-good inputs can still expose bugs in Rosetta. Thanks, I just got a stack of them!! |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1680 Credit: 17,836,696 RAC: 22,982 |
The horns5's are back. Compute errors galore. Grant Darwin NT |
Kissagogo27 Joined: 31 Mar 20 Posts: 86 Credit: 2,916,897 RAC: 2,587 |
hi, the last i've got horns5_63667_245_945_looped21_165aa_hbnet_0001_0010_testboinc_0000800008_0000001_0_noligpoc2Ala_000000357FOL201217_BOINC_SAVE_ALL_OUT_1053176_4_1 |
Kissagogo27 Joined: 31 Mar 20 Posts: 86 Credit: 2,916,897 RAC: 2,587 |
and then finished with status Success! Task: 1314043584; Name: horns5_63667_245_945_looped21_165aa_hbnet_0001_0010_testboinc_0000800008_0000001_0_noligpoc2Ala_000000357FOL201217_BOINC_SAVE_ALL_OUT_1053176_4_1; Work unit (WU): 1176155110; Created: 27 Dec 2020, 12:13:58 UTC; Sent: 27 Dec 2020, 12:13:59 UTC; Report deadline: 30 Dec 2020, 12:13:59 UTC; Received: 27 Dec 2020, 18:52:43 UTC; Server state: Over; Outcome: Success; Client state: Done; Exit status: 0 (0x00000000); Computer ID: 3984635; Run time: 6 hours 30 min 29 sec; CPU time: 6 hours 30 min 10 sec; Validate state: Valid; Credit: 223.10; Device peak FLOPS: 4.11 GFLOPS; Application version: Rosetta v4.20 windows_x86_64; Peak working set size: 466.11 MB; Peak swap size: 442.47 MB; Peak disk usage: 9.96 MB; Stderr output: <core_client_version>7.16.11</core_client_version> <![CDATA[ <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol LA_MPM_design_boinc.xml -corrections::beta_nov16 -out:suffix _BoincSeq @flag_fastdesign_boinc -script_vars LIG_ID=159 MSAcst=horns5_63667_245_945_looped21_165aa_hbnet_0001_0010_testboinc_0000800008_0000001_0_noligpoc2Ala_000000357.MSAcst -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip horns165aaFOL2012174218.zip -in:file:s horns5_63667_245_945_looped21_165aa_hbnet_0001_0010_testboinc_0000800008_0000001_0_noligpoc2Ala_000000357.pdb -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3711909 Using database: database_357d5d93529_n_methylminirosetta_database ERROR: Assertion `active( key )` failed. 
ERROR:: Exit from: C:\cygwin64\home\boinc\4.17\Rosetta\main\source\src\utility/keys/SmallKeyVector.hh line: 548 19:44:33 (452): called boinc_finish(0) </stderr_txt> ]]> but with an error ... |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,611,496 RAC: 9,069 |
All MOF_ wus: 1316797097 <message> |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Same here: lots of (but not all) MOF tasks failing with an access violation within a few seconds of starting. The ones that do run finish after about 3 hours (against a default run time of 8). |
Kissagogo27 Joined: 31 Mar 20 Posts: 86 Credit: 2,916,897 RAC: 2,587 |
Another one ...
|
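
Several posts above quote different stderr messages for the failing tasks (a missing input file, an unknown command-line option, signal 11, a failed assertion, an access violation). For anyone sorting through a pile of failed tasks, a rough sketch like the following could group saved stderr text by those signatures. The signature names, the function name `classify_stderr`, and the exact wording matched for access violations are assumptions made for illustration; only the quoted message fragments come from the thread, and this is not a BOINC or Rosetta tool.

```python
import re

# (signature name, pattern) pairs. The patterns follow the messages quoted in
# this thread; the signature names are hypothetical labels, not BOINC terms.
SIGNATURES = [
    ("missing_input_file", re.compile(r"could not open file\s+(\S+)", re.I)),
    ("unknown_option", re.compile(r"Option matching (\S+) not found in command line")),
    ("segfault", re.compile(r"process got signal 11")),
    ("assertion_failed", re.compile(r"Assertion `([^`]*)` failed")),
    # Assumption: the stderr text contains the words "access violation".
    ("access_violation", re.compile(r"access violation", re.I)),
]


def classify_stderr(text):
    """Return a list of (signature, detail) pairs found in a stderr dump."""
    hits = []
    for name, pattern in SIGNATURES:
        match = pattern.search(text)
        if match:
            # Patterns without a capture group report an empty detail.
            detail = match.group(1) if match.groups() else ""
            # Drop trailing sentence punctuation picked up by \S+.
            hits.append((name, detail.rstrip(".\"'")))
    return hits


if __name__ == "__main__":
    sample = "ERROR: Assertion `active( key )` failed."
    print(classify_stderr(sample))
```

Note that the task posted by Kissagogo27 exited with status 0 and validated even though its stderr contains an assertion error, so a match here does not by itself mean the task was marked as failed.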
©2024 University of Washington
https://www.bakerlab.org