Message boards : Number crunching : Validate errors
Author | Message |
---|---|
Khali Joined: 10 Mar 14 Posts: 1 Credit: 1,704,409 RAC: 0 |
I am new to Rosetta and I am getting a lot of invalid tasks reported as validate errors. I did a test run of Rosetta a few weeks ago, and about 30% of the tasks I ran then were validate errors. Some of my teammates said it might be because of my mild overclock. Since we were all crunching for a team event on another project, I had not had a chance to run any more Rosetta tasks until yesterday. I removed my overclock and tried again. Validate errors are not as prevalent as they were, but I am still getting them. Here is Rosetta's explanation of a validate error: "Validate error - The task was reported but could not be validated, typically because the output files were lost on the server." I can only conclude that I am spending three-plus hours on each task only to have 25 to 30 percent of them get lost on your server. Not acceptable. This needs to be fixed ASAP. Task ID 651174635 Name gr040214_ama1_longee_newhair377_relax_SAVE_ALL_OUT_157229_40_0 Workunit 591389920 Created 3 Apr 2014 11:32:15 UTC Sent 3 Apr 2014 11:35:21 UTC Received 3 Apr 2014 14:05:41 UTC Server state Over Outcome Validate error Client state Done Exit status 0 (0x0) Computer ID 1730267 Report deadline 13 Apr 2014 11:35:21 UTC CPU time 2409.047 stderr out <core_client_version>7.2.42</core_client_version> <![CDATA[ <stderr_txt> [2014- 4- 3 8:22: 1:] :: BOINC:: Initializing ... ok. [2014- 4- 3 8:22: 1:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. 
command: projects/boinc.bakerlab.org_rosetta/minirosetta_3.48_windows_x86_64.exe -out:file:silent default.out -in:file:s 00001.pdb -frag3 00001.200.3mers -in:file:native 00001.pdb -frag9 00001.200.9mers -silent_gz 1 -ex2aro 1 -relax::default_repeats 15 -in:file:fullatom 1 -run:protocol relax -ex1 1 -in:file:boinc_wu_zip gr040214_ama1_longee_newhair377_data.zip -out:file:silent default.out -silent_gz -mute all -detect_disulf True -in:file:native 00001.pdb -in:file:fullatom -in:file:s 00001.pdb -nstruct 10000 -cpu_run_time 10800 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1434278 Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_b14204d.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/gr040214_ama1_longee_newhair377_data.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. 
====================================================== DONE :: 99 starting structures 2408.23 cpu seconds This process generated 99 decoys from 99 attempts ====================================================== BOINC :: WS_max 4.39628e+008 BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> ]]> Validate state Invalid Claimed credit 25.3788504634809 Granted credit 0 application version 3.48 |
Betting Slip Joined: 26 Sep 05 Posts: 71 Credit: 5,702,246 RAC: 0 |
I am new to Rosetta and I am getting a lot of Invalid tasks reported as Validate errors. I did a test run of Rosetta a few weeks ago and about 30% of the tasks I ran then were Validate errors. I am getting them as well, and I would like to know why. Results are either valid or not. |
Killersocke@rosetta Joined: 13 Nov 06 Posts: 29 Credit: 2,579,125 RAC: 0 |
Exactly the same problem here with some of my tasks since 4 April. Regards |
Betting Slip Joined: 26 Sep 05 Posts: 71 Credit: 5,702,246 RAC: 0 |
Me too! |
Ananas Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
One more. And this one (not mine). As Rosetta doesn't have comparison-level validation, I wonder if it could be a problem with the upload handler. If the message "Couldn't resolve host name" is caused by a network problem on the Rosetta site, BOINC server side tasks could be affected as well. AFAIK, "finished upload" just means that the upload handler was able to store the file in a temporary storage place, but then the file has to be moved, and if this move fails, the uploading host will probably not notice it (that's how HTTP uploads usually work). |
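The failure mode Ananas describes can be sketched in a few lines of Python. This is a hypothetical illustration, not BOINC's actual upload handler: the function names `handle_upload`, `finalize`, `broken_move`, and `simulate`, as well as the two-directory layout, are assumptions made for the example. It shows how a client can be told "finished upload" once the file lands in temporary storage, while a later failed move leaves nothing in the final location.

```python
import os
import shutil
import tempfile


def handle_upload(data: bytes, name: str, tmp_dir: str) -> str:
    """Phase 1 (hypothetical): store the upload in temporary storage.

    The client only ever sees this return value.
    """
    with open(os.path.join(tmp_dir, name), "wb") as f:
        f.write(data)
    return "finished upload"


def finalize(name: str, tmp_dir: str, final_dir: str, move=shutil.move) -> bool:
    """Phase 2 (hypothetical): move the file to where a validator would look.

    A failure here happens after the client has already been answered,
    so the uploading host never learns about it.
    """
    try:
        move(os.path.join(tmp_dir, name), os.path.join(final_dir, name))
        return True
    except OSError:
        return False


def broken_move(src: str, dst: str) -> None:
    """Stand-in for a server-side storage or network fault."""
    raise OSError("simulated server-side failure")


def simulate(move=shutil.move):
    """Run both phases; return (status seen by client, file present for validation)."""
    with tempfile.TemporaryDirectory() as tmp_dir, \
            tempfile.TemporaryDirectory() as final_dir:
        status = handle_upload(b"decoys", "result_0", tmp_dir)
        finalize("result_0", tmp_dir, final_dir, move=move)
        present = os.path.exists(os.path.join(final_dir, "result_0"))
    return status, present
```

Calling `simulate()` and `simulate(move=broken_move)` gives the same client-side status in both cases; only the second element of the returned tuple, which the client never sees, differs. Under this assumed design, the first sign of trouble would be a validator that cannot find the output file.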
Billy Joined: 29 May 06 Posts: 13 Credit: 1,536,368 RAC: 0 |
Same here. Mac OS X 10.7.5, BOINC Manager 7.3.13. Billy |
Columbus Joined: 2 Jan 10 Posts: 2 Credit: 24,743 RAC: 0 |
Yes, I noticed that too. This seems to be a recent problem; I didn't have any issues in the past. I'm also wondering why my "tasks for user" page lists more active WUs than I have downloaded. At least, I can't find them on my computer or on the BOINC manager's task list. They are just sitting there on the web page doing nothing and when the deadline is reached, they are tagged as "over - no reply". |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
If the message "Couldn't resolve host name" is caused by a network problem on the Rosetta site, BOINC server side tasks could be affected as well. "Couldn't resolve host name" is a DNS problem on your side. The Rosetta servers are very unlikely to use the same DNS server as you, if they need one at all (I doubt that). |
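Link's point can be checked from the affected machine by asking the local resolver directly. The sketch below is a minimal check built on Python's standard `socket.getaddrinfo`; the helper name `can_resolve` and its injectable `resolver` parameter are illustrative assumptions, not part of BOINC.

```python
import socket


def can_resolve(host: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if the local DNS setup can resolve `host`.

    `resolver` is injectable so the check can be exercised without a network.
    """
    try:
        # Port 443 is used only because the project URLs are https.
        resolver(host, 443)
        return True
    except socket.gaierror:
        # Name resolution failed on this machine.
        return False


def fake_ok(host, port):
    """Test double: pretends the name resolved."""
    return [("fake-address",)]


def fake_fail(host, port):
    """Test double: pretends the local resolver failed."""
    raise socket.gaierror("simulated resolution failure")
```

Running `can_resolve("boinc.bakerlab.org")` (the host from the result URLs in this thread) on a machine that logs "Couldn't resolve host name" would show whether the local resolver is failing at that moment; a `False` result points at the client's own DNS configuration rather than at the project servers.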
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
I'm also wondering why my "tasks for user" page lists more active WUs than I have downloaded. At least, I can't find them on my computer or on the BOINC manager's task list. They are just sitting there on the web page doing nothing and when the deadline is reached, they are tagged as "over - no reply". Are the names of those tasks similar to those from this thread? |
Columbus Joined: 2 Jan 10 Posts: 2 Credit: 24,743 RAC: 0 |
Are the names of those tasks similar to those from this thread? No, they are more like these: foldit_997258_0009_fold_SAVE_ALL_OUT_155429_717_0 yrssfrv2d3_8_fold_SAVE_ALL_OUT_155181_1806_0 gr033114_ama1_longee_try75_fold_SAVE_ALL_OUT_156790_259_0 MoltnTIA_fold_SAVE_ALL_OUT_152738_13068_1 So, they have relatively short names. But there's also this: C3_1kr4_C2_1sgm_0006_trimer_C3_1kr4_C2_1sgm_0006_dimer_patchdock_split_08_140330_SAVE_ALL_OUT__156751_154_0 However, none of them have two consecutive dots in their names. Perhaps I should have posted this in the other thread. |
Ananas Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
Nearly 20% validate errors lately; I guess I'll disable Rosetta until this is solved. P.S.: my runtime prefs are set to 8 hours and the broken ones all ran over the full timespan. Maybe some results cannot handle that? There are Windows and Linux results with this problem, so it is independent of the OS. |
Ananas Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
Second delivery of one of my invalid results: Outcome Client error Validate state Invalid Granted credit 300 BTW: My invalid WUs show granted credit too, not in the list but on the details page. I'm even more confused now - are the results useful and recovered manually, or are they used to fill the trashcan? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Rosetta grants credit (in a nightly script) for failed tasks. The idea is that learning about failure is a part of finding success, and so it is of value. Often there are specific models that do not process in a timely manner. So a given task may have many normal and successful models produced, then the last one gets hung up and runs long. So, yes, there is also such a thing as partial success (BOINC does not really support the concept). And learning about what hangs up an algorithm is useful too. Rosetta Moderator: Mod.Sense |
Ananas Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
... So, yes, there is also such a thing as partial success (BOINC does not really support the concept). And learning about what hangs up an algorithm is useful too. Very important information: if crashed results are used to improve the methods, they are not really a waste of energy. Thanks :-) |
Cesar Gil Joined: 5 Apr 14 Posts: 1 Credit: 34,245 RAC: 0 |
If Rosetta grants credit for failed tasks, then I must conclude that they are something different from validate error tasks, since I have validate error tasks even from days ago and none of them receives credits. Since the rate of validate error tasks is high, I guess I'll switch to helping other projects until this gets fixed. It is not reasonable that I see computing power systematically being given for no salvageable result. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
If Rosetta grants credit for failed tasks, then I must conclude that they are something different from validate error tasks, since I have validate error tasks even from days ago and none of them receives credits. The granting of credit for errors is done daily, and when it occurs, the credit is NOT shown on the list of WUs, so looking at the WU details for one of your validate errors more than a day old shows you were granted all of the credit you claimed: https://boinc.bakerlab.org/rosetta/result.php?resultid=654447052 Rosetta Moderator: Mod.Sense |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. I'll add myself to the list of those getting validate errors now as well. I think this has only started for me anyway since the servers were moved. It seems to be a mixture of task name types on both my rigs, plus there is the trouble getting work that seemed to start around the same time. Just wondering if something got knocked about on one or more of the servers when they were moved. Just my 2c! |
Betting Slip Joined: 26 Sep 05 Posts: 71 Credit: 5,702,246 RAC: 0 |
Hi. Me too, and they are getting on my *******'s. Not sure how many of the people who run this project care anymore. |
Tex1954 Joined: 3 Apr 11 Posts: 9 Credit: 3,394,752 RAC: 9 |
Add me as well... 2 out of 6 3-hour tasks... letting a single 8-hour run now... https://boinc.bakerlab.org/rosetta/result.php?resultid=656066846 https://boinc.bakerlab.org/rosetta/result.php?resultid=656066843 8-) |
Jozef J Joined: 7 Jun 12 Posts: 3 Credit: 1,156,504 RAC: 0 |
I donated to your project; I'm waiting for some response or a donation badge. |
©2024 University of Washington
https://www.bakerlab.org