Minirosetta 3.46

P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 75606 - Posted: 12 May 2013, 8:48:16 UTC

And another three of these. This one ran for 8 hrs; my run time is now 4 hrs, so why didn't it stop earlier?

The other 2 I aborted at over 6hrs because I couldn't see them finishing without getting an error & wasting more time anyway.

B.T.W. they ran non-stop for all that time, so I don't know why it's saying that there are 2 starting structures; I think it normally says 1.


rb_05_10_38828_73745__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_80811_594_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=527333216

# cpu_run_time_pref: 14400
BOINC:: CPU time: 29192.9s, 14400s + 14400s[2013- 5-12 18:22:12:] :: BOINC
InternalDecoyCount: 12
======================================================
DONE :: 2 starting structures 29192.9 cpu seconds
This process generated 12 decoys from 12 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 298.87771452369
Granted credit 0
application version 3.46
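
For anyone trying to read that "CPU time" line: the second and third numbers look like a fixed 14400 s (4 hour) allowance plus the cpu_run_time_pref value, so the task seems to have been stopped only once it passed pref + 4 hours. That reading is an assumption based on the log lines in this thread, not on project documentation. A minimal Python sketch that pulls the numbers out (the function name parse_cpu_line is made up for illustration):

```python
import re

# Matches the summary line as pasted above. Treating the two trailing numbers
# as "allowance + run-time preference" is an assumption, not documented fact.
LINE_RE = re.compile(r"CPU time:\s*([\d.]+)s,\s*(\d+)s \+ (\d+)s")

def parse_cpu_line(line):
    """Return (cpu_seconds, limit_seconds, overrun_seconds), or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    cpu = float(m.group(1))
    limit = int(m.group(2)) + int(m.group(3))
    return cpu, limit, cpu - limit

line = "BOINC:: CPU time: 29192.9s, 14400s + 14400s[2013- 5-12 18:22:12:] :: BOINC"
cpu, limit, overrun = parse_cpu_line(line)
print(f"ran {cpu / 3600:.2f} h, limit {limit / 3600:.2f} h, over by {overrun:.1f} s")
```

On the line above this gives a limit of 28800 s (8 hours), with the task about six and a half minutes past it.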
ID: 75606 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 75614 - Posted: 14 May 2013, 22:08:51 UTC

Still getting errors on these tasks; I'm aborting any of these I see.

rb_05_12_38289_72979__t000__3_C1_SAVE_ALL_OUT_IGNORE_THE_REST_80960_200_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=527807465

# cpu_run_time_pref: 14400
dof_atom1 atomno= 3 rsd= 1
atom1 atomno= 1 rsd= 1
atom2 atomno= 2 rsd= 1
atom3 atomno= 5 rsd= 1
atom4 atomno= 6 rsd= 1
THETA1 nan
THETA3 nan
PHI2 0

ERROR: AtomTree::torsion_angle_dof_id: angle range error
ERROR:: Exit from: src/core/kinematics/AtomTree.cc line: 780

ERROR: unknown atom_name: PRO NV
ERROR:: Exit from: src/core/chemical/ResidueType.cc line: 2016
SIGSEGV: segmentation violation
Stack trace (17 frames):
[0xb2aef87]
[0xf7735400]
[0xa166837]
[0xa1f3edc]
[0xa1f4e3c]
[0x996c8d6]
[0x996df60]
[0x89561af]
[0x867d35e]
[0x992d14f]
[0x9931429]
[0x9aebcad]
[0x9b4f815]
[0x9b4d045]
[0x8054950]
[0xb33f328]
[0x8048131]

Exiting...

</stderr_txt>
]]>
ID: 75614 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 75655 - Posted: 23 May 2013, 10:30:31 UTC

These hyb_cb_bench_(etc) tasks are really poor credit tasks.
They complete OK, but give you only 20 credits for your work.
ID: 75655 · Rating: 0
James W

Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 75657 - Posted: 24 May 2013, 5:04:25 UTC

I've noticed over the last couple weeks that there have been several types of jobs I haven't seen before (some beginning with the "hyb" or "hybred," "cyto," etc.) These jobs are not setting checkpoints, even after crunching up to 11 hours or so (with checkpoint limited to no more than every 60 sec. in computer pref.) The jobs starting "rb_5_17" and other "dates" continue to have checkpoints as usual.

The problem, as noted in another recent thread, is that if I must shut down my system or reboot (such as for doing Windows updates, updating applications, etc.), or if I must close BOINC, I lose all the work in these "new" type jobs without checkpoints.

Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks.
ID: 75657 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 7,594
Message 75658 - Posted: 24 May 2013, 12:31:51 UTC - in response to Message 75657.  

Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks.


Target CPU run time: 1 hour..... :-)

ID: 75658 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,662
RAC: 1,402
Message 75659 - Posted: 24 May 2013, 12:56:41 UTC - in response to Message 75658.  

Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks.


Target CPU run time: 1 hour..... :-)


That should work if you're allowed to set it that low. If I remember correctly, though, the minimum is now 3 hours.
ID: 75659 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 7,594
Message 75660 - Posted: 24 May 2013, 15:13:57 UTC - in response to Message 75659.  

That should work if you're allowed to set it that low. If I remember correctly, though, the minimum is now 3 hours.


My run time is 2 hours....

ID: 75660 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 7,594
Message 75774 - Posted: 19 Jun 2013, 7:59:07 UTC

A lot of rb_number WUs cannot start the graphics (and, I think, the calculation).
There is a simple green line and 0 steps.
I kill these WUs.
ID: 75774 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,169,305
RAC: 3,078
Message 75778 - Posted: 19 Jun 2013, 15:18:52 UTC - in response to Message 75655.  

These hyb_cb_bench_(etc) tasks are really poor credit tasks.
They complete OK, but give you only 20 credits for your work.


I am SOOOO glad I am almost outa here!! My cryo tasks are again taking 7 hours to finish, I just use the defaults, and I am getting 20 to 25 frickin credits for them. NOW the same thing is happening with the RB units, PATHETIC!!! My eb units are doing okay but it is a pain trying to keep 10 systems clear of all the bad units!! I AM trying to help but my rac is DECLINING and my work output is RISING, that is just NOT RIGHT!!!
ID: 75778 · Rating: 0
Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 808,098
RAC: 0
Message 75782 - Posted: 21 Jun 2013, 22:26:00 UTC - in response to Message 75774.  

A lot of rb_number WUs cannot start the graphics (and, I think, the calculation).
There is a simple green line and 0 steps.
I kill these WUs.

I am running rb_06_21_39751_75850__t000__3_C1_SAVE_ALL_OUT_IGNORE_THE_REST_87610_60 at the default run time (3 hours) and the graphics are working perfectly.
Have a crunching good day!!
ID: 75782 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,169,305
RAC: 3,078
Message 75788 - Posted: 23 Jun 2013, 14:59:03 UTC

How is THIS my fault?
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=534659615

stderr out

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
<stderr_txt>
of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range

etc, etc, ETC!!!
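
For reference, "1.#QNAN00" is how Microsoft's C runtime prints a NaN (not-a-number). Any comparison against NaN is false, so a NaN can never pass an "is it inside [-1,+1]" test, which is consistent with the line being printed every time the check runs. A minimal Python sketch of that behaviour (the function name is hypothetical and this is not Rosetta's actual code):

```python
import math

def in_sin_cos_range(x):
    # Hypothetical stand-in for the range check named in the log:
    # a legal sine/cosine value must lie in [-1, +1].
    return -1.0 <= x <= 1.0

print(in_sin_cos_range(0.5))       # True
print(in_sin_cos_range(math.nan))  # False: every comparison with NaN is false
```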

The pc is:
CPU type AuthenticAMD
AMD Phenom(tm) II X6 1090T Processor [Family 16 Model 10 Stepping 0]
Number of CPUs 6
Operating System Microsoft Windows 7
Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)
Memory 12283.63 MB
Cache 512 KB
Swap space 24565.44 MB

That is SIX cpu's with ONLY five crunching, the other is being used to support the gpu, with some left over for whatever.

The unit took "CPU time 14335.48", ie 4 HOURS, and then it just errored out!! What kind of project is this right now?!!! LOTS of problems with the different kinds of units yet they are being released like this is a BETA project or something!! Rosetta is SUPPOSED to be about the SCIENCE, not releasing units to 'see if they work or not'!! I thought that's what the Beta Project was all about, testing the units PRIOR to them being released here!!!
ID: 75788 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,662
RAC: 1,402
Message 75790 - Posted: 23 Jun 2013, 20:32:29 UTC - in response to Message 75788.  

How is THIS my fault?
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=534659615


Rosetta@Home doesn't use beta testing - they use alpha testing (RALPH@Home) instead. Does this make them think that Rosetta@Home should do the beta testing instead? If so, the results are often worse than most of the alpha test BOINC projects I have my computers participate in.
ID: 75790 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 75791 - Posted: 24 Jun 2013, 6:48:59 UTC

Had a couple of these tasks error today, this message goes on for a few pages.

CASP9_fb_benchmark_hybridization_run54_T0534_1_C2_SAVE_ALL_OUT_IGNORE_THE_REST_47953_1846_1

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=534900423

ERROR: error in process_residue_request: 'com'
ERROR:: Exit from: src/core/conformation/symmetry/util.cc line: 93
# cpu_run_time_pref: 21600


ERROR: error in process_residue_request: 'com'
ERROR:: Exit from: src/core/conformation/symmetry/util.cc line: 93

ERROR: error in process_residue_request: 'com'
ERROR:: Exit from: src/core/conformation/symmetry/util.cc line: 93
======================================================
DONE :: 99 starting structures 1201 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
ID: 75791 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 7,594
Message 75792 - Posted: 24 Jun 2013, 8:32:37 UTC - in response to Message 75790.  
Last modified: 24 Jun 2013, 8:33:18 UTC

Rosetta@Home doesn't use beta testing - they use alpha testing (RALPH@Home) instead. Does this make them think that Rosetta@Home should do the beta testing instead? If so, the results are often worse than most of the alpha test BOINC projects I have my computers participate in.


I participate in the ralph@home project, but sometimes I think that the possibility of testing new versions/new code/etc. at scale is VERY underestimated.
And admins do not participate in the test forum.....
ID: 75792 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,169,305
RAC: 3,078
Message 75795 - Posted: 24 Jun 2013, 11:02:22 UTC - in response to Message 75792.  

Rosetta@Home doesn't use beta testing - they use alpha testing (RALPH@Home) instead. Does this make them think that Rosetta@Home should do the beta testing instead? If so, the results are often worse than most of the alpha test BOINC projects I have my computers participate in.


I participate in the ralph@home project, but sometimes I think that the possibility of testing new versions/new code/etc. at scale is VERY underestimated.
And admins do not participate in the test forum.....


To be honest, until some of us screamed, yelled and started aborting all the cryo units recently, the Admins weren't HERE either!! I guess they 'are too busy' to waste their time seeing if what they designed actually works in the REAL WORLD!!
ID: 75795 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,169,305
RAC: 3,078
Message 75796 - Posted: 24 Jun 2013, 13:28:58 UTC
Last modified: 24 Jun 2013, 13:30:14 UTC

Here is ANOTHER "hyb-ab-bench" unit that just cost me SEVEN HOURS of crunching time and THEN errored out:
https://boinc.bakerlab.org/rosetta/result.php?resultid=588986073

The reason:
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hyb_ab_bench_4aimA_SAVE_ALL_OUT_IGNORE_THE_REST_53960_1303_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

UPLOAD FAILURE----WTF are you telling me? Is Rosetta telling me that AFTER SEVEN HOURS of crunching a unit fails to upload and I will get NO CREDITS for it???!!!!!!!! WHERE is the Scientist who designed these things? Why is SOMEONE not here explaining what the heck is going on?!!!! This is JUST ONE of my pc's here, I have NOT checked the others, but it is NOT the same one as the last problem I posted about having problems with!!!
ID: 75796 · Rating: 0
TJ

Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 75799 - Posted: 24 Jun 2013, 19:07:39 UTC - in response to Message 75778.  

These hyb_cb_bench_(etc) tasks are really poor credit tasks.
They complete OK, but give you only 20 credits for your work.


I am SOOOO glad I am almost outa here!! My cryo tasks are again taking 7 hours to finish, I just use the defaults, and I am getting 20 to 25 frickin credits for them. NOW the same thing is happening with the RB units, PATHETIC!!! My eb units are doing okay but it is a pain trying to keep 10 systems clear of all the bad units!! I AM trying to help but my rac is DECLINING and my work output is RISING, that is just NOT RIGHT!!!

It is indeed not right. The server code is obsolete and a cause of the low credit, for any WU. But they will not update it here unless it totally crashes and cannot be brought back to life again... shame.
They are lucky that their research is quite important, otherwise...there are many many projects that need CPU-time.
Greetings,
TJ.
ID: 75799 · Rating: 0
TJ

Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 75800 - Posted: 24 Jun 2013, 19:10:10 UTC - in response to Message 75658.  

Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks.


Target CPU run time: 1 hour..... :-)

I have seen this in the preferences but have no idea what it does or what it is for.
Can someone please explain this?
Greetings,
TJ.
ID: 75800 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,662
RAC: 1,402
Message 75803 - Posted: 25 Jun 2013, 1:59:42 UTC - in response to Message 75800.  
Last modified: 25 Jun 2013, 2:02:30 UTC

Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks.


Target CPU run time: 1 hour..... :-)

I have seen this in the preferences but have no idea what it does or what it is for.
Can someone please explain this?


Rosetta@Home workunits are usually set up in 100 sections, called decoys. They try to run as many of these decoys as they expect to finish within the target CPU run time, but can go over if the last one takes longer than expected.

I'm not sure if the shutdown code runs properly if the last decoy that was finished reported an error instead of a good answer.
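
To make the "can go over" part concrete, here is a rough Python simulation of that behaviour. It is only a sketch of the description above, not the minirosetta code: the function name, the decoy times, and the "average time per finished decoy" estimate are all assumptions made for illustration.

```python
def run_workunit(decoy_times, target_seconds):
    """Simulate how many decoys get run for a given target CPU run time.

    decoy_times: CPU seconds each decoy would take, in order (made-up data).
    Returns (decoys_completed, total_cpu_seconds).
    """
    elapsed = 0.0
    done = 0
    for t in decoy_times:
        # Estimate the next decoy from the average so far. With no decoy
        # finished yet there is no estimate, so the first one always runs.
        if done > 0 and elapsed + elapsed / done > target_seconds:
            break
        elapsed += t  # a started decoy runs to completion, even past the target
        done += 1
    return done, elapsed

# Third decoy is unexpectedly slow, so the 3-hour target is overshot.
print(run_workunit([3000, 3200, 9000, 3100], target_seconds=3 * 3600))
```

With these made-up numbers the third decoy is started because it is expected to fit, but it takes far longer than the earlier ones, so the task finishes well past the 3-hour target and the fourth decoy is never attempted.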
ID: 75803 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 75805 - Posted: 25 Jun 2013, 8:09:19 UTC

Looks like I'm going to have to take the big hammer to some of these tasks, I'm not amused at all. My 6hr runtime ended up over 10hrs.

CASP9_bw_benchmark_hybridization_run49_T0606_1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_46414_1348_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=534948991

Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_CASP9_bw_benchmark_hybridization_run49_T0606_1_C1_yfsong.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 21600
BOINC:: CPU time: 36341.9s, 14400s + 21600s[2013- 6-25 17:58:54:] :: BOINC
InternalDecoyCount: 2
======================================================
DONE :: 2 starting structures 36341.9 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation
Stack trace (21 frames):
[0xb2aef87]
[0xf777f400]
[0xa6ce54c]
[0xa6e7659]
[0xa1648c7]
[0xa1f2dd2]
[0xa1f4df1]
[0x9d4d1a5]
[0x9f10187]
[0x9d56457]
[0x9d4265a]
[0x8925eca]
[0x8681018]
[0x992d14f]
[0x9931429]
[0x9aebcad]
[0x9b4f815]
[0x9b4d045]
[0x8054950]
[0xb33f328]
[0x8048131]

Exiting...

</stderr_txt>
]]>

Validate state Valid
Claimed credit 280.45
Granted credit 11.25
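
To put those numbers in perspective, a quick Python calculation using only the figures pasted above (CPU time from the log, claimed and granted credit from the result page):

```python
cpu_seconds = 36341.9  # from the "CPU time" line in the log above
claimed = 280.45
granted = 11.25

hours = cpu_seconds / 3600
print(f"granted/claimed: {granted / claimed:.1%}")
print(f"credits per CPU hour: {granted / hours:.2f}")
```

That works out to roughly 4% of the claimed credit, or a little over 1 credit per CPU hour.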



ID: 75805 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org