Mini Rosetta 3.45

Message boards : Number crunching : Mini Rosetta 3.45

Profile robertmiles

Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,015
RAC: 1,790
Message 74411 - Posted: 15 Nov 2012, 0:47:32 UTC
Last modified: 15 Nov 2012, 0:50:52 UTC

I now have three 3.45 workunits on my 64-bit Windows 7 computer. Two just gave computation errors; that may be the zdock problem mentioned in another thread.

They show some progress and, unlike 3.43 workunits, do not start the screensaver automatically when it has not been requested.
Profile robertmiles

Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,015
RAC: 1,790
Message 74413 - Posted: 15 Nov 2012, 1:03:03 UTC

The two failed workunits gave this error:

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>

Could you tell us just how much disk space these workunits are trying to use, so that those of us with extra disk space can decide whether it is reasonable to allow BOINC to use more disk space?
Profile robertmiles

Joined: 16 Jun 08
Posts: 1232
Credit: 14,281,015
RAC: 1,790
Message 74414 - Posted: 15 Nov 2012, 1:15:53 UTC

How do you calculate the disk space limit for Rosetta@Home? For example, I allow BOINC to use 50 GB of disk space on one of my computers, but it is divided among 23 BOINC projects.

Would it be reasonable to make the next version of minirosetta add a little more information to the error message about running out of disk space, such as just how much disk space the workunit is trying to use and the limit Rosetta@Home is allowed to use?
Mad_Max

Joined: 31 Dec 09
Posts: 209
Credit: 25,983,050
RAC: 14,947
Message 74418 - Posted: 15 Nov 2012, 2:36:36 UTC

AFAIK this error (Maximum disk usage exceeded) pops up when the app hits the disk limit set for one individual WU, not the total BOINC disk usage limit.

I looked at some of the recent WUs on my account and found that R@H sets this limit to 300 MB per WU.
For example:

<workunit>
<name>rb_11_08_34659_65118__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_63875_115</name>
<app_name>minirosetta</app_name>
<version_num>341</version_num>
<rsc_fpops_est>40000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>500000000000000.000000</rsc_fpops_bound>
<rsc_memory_bound>100000000.000000</rsc_memory_bound>
<rsc_disk_bound>300000000.000000</rsc_disk_bound>
<command_line>-run:protocol jd2_scripting @flags_rb_11_08_34659_65118__t000__1_C1_robetta -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip input_rb_11_08_34659_65118__t000__1_C1_robetta.zip -nstruct 10000 -cpu_run_time 10800 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2511608</command_line>
<file_ref>
<file_name>flags_rb_11_08_34659_65118__t000__1_C1_robetta</file_name>
<open_name>flags_rb_11_08_34659_65118__t000__1_C1_robetta</open_name>
</file_ref>
<file_ref>
<file_name>input_rb_11_08_34659_65118__t000__1_C1_robetta.zip</file_name>
<open_name>input_rb_11_08_34659_65118__t000__1_C1_robetta.zip</open_name>
</file_ref>
</workunit>

300 MB is sometimes not enough, because about 170 MB of disk space is always used by minirosetta_database (decompressed for each task), leaving ~130 MB for all other files (input, temporary, output).
If you get a WU with large input and/or output files (>130 MB in total when decompressed), you will hit this error.

So we cannot do anything on your client side. It can be corrected only on the server side (the simplest way would be to raise this limit for all newly generated WUs; the best way would be to modify the app to use a single instance of the database instead of unpacking it into the work folder of each WU).
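
A rough sketch of the arithmetic above, in Python. The 300 MB bound is the `rsc_disk_bound` value from the workunit quoted in this post and the ~170 MB database size is the approximate figure given here; the function names are hypothetical and only illustrate the reasoning.

```python
# Per-workunit disk budget, using the figures quoted in this post.
RSC_DISK_BOUND_BYTES = 300_000_000  # <rsc_disk_bound> from the workunit above
DATABASE_BYTES = 170_000_000        # approximate size of the unpacked minirosetta_database


def remaining_budget(bound_bytes: int, database_bytes: int) -> int:
    """Bytes left for input, temporary and output files once the database is unpacked."""
    return bound_bytes - database_bytes


def would_exceed(bound_bytes: int, database_bytes: int, other_files_bytes: int) -> bool:
    """True if the database plus all other files would go over the per-WU bound."""
    return database_bytes + other_files_bytes > bound_bytes


budget = remaining_budget(RSC_DISK_BOUND_BYTES, DATABASE_BYTES)
print(f"Budget for other files: {budget / 1e6:.0f} MB")
```

With these numbers, any task whose decompressed input, temporary and output files total more than about 130 MB would go over the bound.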
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 74421 - Posted: 15 Nov 2012, 8:38:08 UTC - in response to Message 74418.  

So we cannot do anything on your client side.

That's not really true: you can always patch this limit in your client_state.xml, but we really should not need to do things like that. It was already pointed out to the project staff in the 3.41 thread when the zdock WUs were failing with that error. I don't understand why Rosetta is using such a small value there; other projects (that I run) set it up to over 1000 times higher than they actually need.
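
For illustration only, a minimal Python sketch of what such a patch could look like. It assumes client_state.xml contains `<workunit>` entries shaped like the one quoted earlier in this thread; the sample string and the function name are hypothetical, and the code works on an in-memory copy rather than on a real BOINC file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for client_state.xml (an assumption).
SAMPLE = """<client_state>
  <workunit>
    <name>example_wu</name>
    <rsc_disk_bound>300000000.000000</rsc_disk_bound>
  </workunit>
</client_state>"""


def raise_disk_bound(xml_text: str, new_bound_bytes: float) -> str:
    """Return a copy of xml_text with every smaller rsc_disk_bound raised to new_bound_bytes."""
    root = ET.fromstring(xml_text)
    for bound in root.iter("rsc_disk_bound"):
        if float(bound.text) < new_bound_bytes:
            # Keep the same fixed-point formatting the workunit uses.
            bound.text = f"{new_bound_bytes:.6f}"
    return ET.tostring(root, encoding="unicode")


patched = raise_disk_bound(SAMPLE, 1_000_000_000)
print(patched)
```

As the post says, this is a workaround; the real fix is a larger bound on the server side.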
.
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 74429 - Posted: 15 Nov 2012, 18:26:12 UTC

Thanks for pointing this out. I can't express enough how helpful you all are. This is definitely a problem, particularly from our last application update since we included a very big database file which has been removed in our current application package. I will update our default settings.

Thanks again!

David Kim
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 74430 - Posted: 15 Nov 2012, 18:33:55 UTC

The new default setting is x100 the previous value now. Newly submitted jobs from our research group will have this change but it will take some time for the current jobs in the queue to flush out.
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 74435 - Posted: 16 Nov 2012, 1:00:48 UTC

This ran for over 8hrs on my 4hr limit then this happened.
================================

hyb_ac_bench_3rojD_10_SAVE_ALL_OUT_IGNORE_THE_REST_54834_171_1

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=492799306

BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev52077.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_hyb_ac_bench_3rojD_10_yfsong.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
BOINC:: CPU time: 29087.1s, 14400s + 14400s[2012-11-16 9:59: 0:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 29087.1 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation
Stack trace (20 frames):
[0xafed447]
[0xf77c3400]
[0xa44e8ce]
[0xa363c49]
[0xa073900]
[0xa084eb3]
[0xa086655]
[0x980230d]
[0x9831f83]
[0x9834e85]
[0x892dfca]
[0x866eaea]
[0x97b619f]
[0x97bde49]
[0x9979352]
[0x99d8ed5]
[0x99d6705]
[0x80547cc]
[0xb07d7e8]
[0x8048131]

Exiting...

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hyb_ac_bench_3rojD_10_SAVE_ALL_OUT_IGNORE_THE_REST_54834_171_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Mad_Max

Joined: 31 Dec 09
Posts: 209
Credit: 25,983,050
RAC: 14,947
Message 74445 - Posted: 16 Nov 2012, 13:59:52 UTC - in response to Message 74430.  

The new default setting is x100 the previous value now. Newly submitted jobs from our research group will have this change but it will take some time for the current jobs in the queue to flush out.

Be careful with such aggressive settings! This may cause more problems than it fixes:

1. This is not just a limit, but also a space allocation instruction for BOINC.
Description from BOINC site (http://boinc.berkeley.edu/trac/wiki/JobIn#

Resource estimates and bounds
.......
rsc_disk_bound
A bound on the maximum disk space used by the job, including all input, temporary, and output files. The job will only be sent to hosts with at least this much available disk space. If this bound is exceeded, the job will be aborted.

So if you raise this limit 100-fold, from the current 300 MB to 30 GB, most BOINC clients will not be able to run such tasks at all (typical limits, including the default limits for the entire BOINC client, are usually lower than this value, and on SSDs 30 GB of free space may be a problem too).
Other DC projects can afford to set a limit 100 times greater than what is actually required because they use very small amounts of disk space (unlike R@H). So even x100 or x500 is still a small absolute value.
For example:
POEM@Home uses only ~50-100 KB of disk space per running WU (limit set to 100 MB)
Einstein@Home uses ~300 KB (limit set to 100 MB)

2. This limit is also a watchdog for bad jobs (which are not rare in R@H). It prevents a situation where a single task stuck in a loop while writing to disk fills all of the BOINC disk space with garbage data and blocks normal operation of the entire BOINC client.

So my suggestion for this value in R@H's situation is ~1000 MB.
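
The two roles of `rsc_disk_bound` described above can be sketched in Python. This follows only the wording of the documentation quoted in this post (send the job only to hosts with at least that much available space, abort if the bound is exceeded); it is not taken from BOINC's source, the function names are hypothetical, and the 10 GB host is an invented example.

```python
MB = 1_000_000
GB = 1_000 * MB


def scheduler_would_send(bound_bytes: int, host_available_bytes: int) -> bool:
    """Documented rule: the job goes only to hosts with at least `bound` available."""
    return host_available_bytes >= bound_bytes


def client_would_abort(bound_bytes: int, job_usage_bytes: int) -> bool:
    """Documented rule: the job is aborted once its disk usage exceeds `bound`."""
    return job_usage_bytes > bound_bytes


old_bound = 300 * MB
x100_bound = 100 * old_bound   # 30 GB
suggested_bound = 1000 * MB    # the ~1000 MB suggested above

host_available = 10 * GB       # hypothetical host
for label, bound in [("x100", x100_bound), ("suggested", suggested_bound)]:
    print(label, bound / GB, "GB, sent:", scheduler_would_send(bound, host_available))
```

Under the documented rule, the hypothetical 10 GB host would be skipped for a 30 GB bound but would still receive jobs with a 1000 MB bound.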
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 74446 - Posted: 16 Nov 2012, 18:29:39 UTC

Thanks for the suggestions!

I updated it to 1 gig. I'll keep an eye out for how the clients/machines deal with this change.
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 74448 - Posted: 16 Nov 2012, 18:43:03 UTC - in response to Message 74445.  
Last modified: 16 Nov 2012, 18:45:04 UTC

1. This is not just a limit, but also a space allocation instruction for BOINC.
Description from BOINC site (http://boinc.berkeley.edu/trac/wiki/JobIn#

Resource estimates and bounds
.......
rsc_disk_bound
A bound on the maximum disk space used by the job, including all input, temporary, and output files. The job will only be sent to hosts with at least this much available disk space. If this bound is exceeded, the job will be aborted.

So if you raise this limit 100-fold, from the current 300 MB to 30 GB, most BOINC clients will not be able to run such tasks at all (typical limits, including the default limits for the entire BOINC client, are usually lower than this value, and on SSDs 30 GB of free space may be a problem too).

That does not seem to be how it works: I just got two of those 30 GB WUs while having 5.1 GB "free, available to BOINC". My entire system partition, where BOINC also stores its data, is just 27 GB. So the part marked in bold seems to be wrong.

But with 1GB we should be fine too.
.
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 74449 - Posted: 16 Nov 2012, 20:14:49 UTC - in response to Message 74448.  

1. This is not just a limit, but also a space allocation instruction for BOINC.
Description from BOINC site (http://boinc.berkeley.edu/trac/wiki/JobIn#

Resource estimates and bounds
.......
rsc_disk_bound
A bound on the maximum disk space used by the job, including all input, temporary, and output files. The job will only be sent to hosts with at least this much available disk space. If this bound is exceeded, the job will be aborted.

So if you raise this limit 100-fold, from the current 300 MB to 30 GB, most BOINC clients will not be able to run such tasks at all (typical limits, including the default limits for the entire BOINC client, are usually lower than this value, and on SSDs 30 GB of free space may be a problem too).

That does not seem to be how it works: I just got two of those 30 GB WUs while having 5.1 GB "free, available to BOINC". My entire system partition, where BOINC also stores its data, is just 27 GB. So the part marked in bold seems to be wrong.

But with 1GB we should be fine too.


let us know if these jobs get aborted by the client as we'd expect.
Mad_Max

Joined: 31 Dec 09
Posts: 209
Credit: 25,983,050
RAC: 14,947
Message 74454 - Posted: 17 Nov 2012, 2:11:04 UTC

This is how this parameter should work, according to the description in the official BOINC documentation, but I have not tested it personally.
It is possible that, in practice, this part does not work, or that this function (disk space allocation) has been removed in newer versions of BOINC (and the devs just forgot to update the description in the manual).
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 74465 - Posted: 18 Nov 2012, 8:43:56 UTC - in response to Message 74449.  

let us know if these jobs get aborted by the client as we'd expect.

Both are now running together and, as expected, they are NOT aborted since the limit is not exceeded. BOINC v6.12.34.
.
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 74477 - Posted: 19 Nov 2012, 18:40:02 UTC - in response to Message 74465.  

let us know if these jobs get aborted by the client as we'd expect.

Both are now running together and, as expected, they are NOT aborted since the limit is not exceeded. BOINC v6.12.34.

... and both completed successfully.

abt_2CWY_1_abinitio_SAVE_ALL_OUT_64370_88_0
rb_11_16_34706_65259_h002__tr_IGNORE_THE_REST_14_10_64417_2_0


BTW, maybe you could make this thread a sticky and "unsticky" all the old ones.
.
Bart

Joined: 8 Oct 11
Posts: 2
Credit: 476,125
RAC: 0
Message 74485 - Posted: 20 Nov 2012, 10:12:13 UTC

minirosetta 3.45 / Vista Home Premium / 32-bit

C:\Documents and Settings\All Users\Application Data\BOINC\projects\boinc.bakerlab.org_rosetta\minirosetta_3.45_windows_intelx86.exe
C:\Documents and Settings\All Users\Application Data\BOINC\projects\boinc.bakerlab.org_rosetta\minirosetta_graphics_3.43_windows_intelx86.exe
C:\Documents and Settings\All Users\Application Data\BOINC\slots\1\minirosetta_3.45_windows_intelx86.exe

Kaspersky has blocked minirosetta because it attempted to access the user password area.

Note that the main application is 3.45 but the graphics application is 3.43. I don't know if this matters.

I have suspended minirosetta and will accept no new jobs until this gets resolved.
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 74489 - Posted: 20 Nov 2012, 12:41:59 UTC - in response to Message 74485.  

Have you tried excluding the entire C:\Documents and Settings\All Users\Application Data\BOINC directory from scanning? That usually solves any issues with antivirus software.
.
Profile Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 74500 - Posted: 21 Nov 2012, 0:28:33 UTC

The rb_11_19_34348 etc. task running on my computer is now at 10 hours of computing with another hour to go, which violates my 6-hour limit. It should upload later today.
Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 74507 - Posted: 21 Nov 2012, 17:14:43 UTC - in response to Message 74500.  

The rb_11_19_34348 etc. task running on my computer is now at 10 hours of computing with another hour to go, which violates my 6-hour limit. It should upload later today.


I've noticed that this happens on jobs with, for example, this line in stderr_out:

BOINC:: CPU time: 29087.1s, 14400s + 14400s[2012-11-16 9:59: 0:] :: BOINC

In this case, the user's pref is 14400 seconds. It's been doubled.
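
A small Python sketch for pulling those numbers out of the line above. The regular expression is written against this one sample only, and reading the second value as an extra allowance added to the preference is an assumption based on this post, not on the minirosetta source; the function name is hypothetical.

```python
import re

LINE = "BOINC:: CPU time: 29087.1s, 14400s + 14400s[2012-11-16 9:59: 0:] :: BOINC"

# Matches "CPU time: <used>s, <pref>s + <extra>s" as it appears in the sample line.
PATTERN = re.compile(r"CPU time: ([\d.]+)s, (\d+)s \+ (\d+)s")


def parse_watchdog_line(line: str):
    """Return (cpu_seconds, pref_seconds, extra_seconds), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2)), int(match.group(3))


cpu, pref, extra = parse_watchdog_line(LINE)
print(f"used {cpu:.0f}s, preference {pref}s, effective cutoff {pref + extra}s")
print("over the doubled limit:", cpu > pref + extra)
```

For the sample line, the task used 29087.1 s against a 14400 s preference, which is just past twice the preference (28800 s).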
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 74521 - Posted: 23 Nov 2012, 0:41:46 UTC

This one failed after 8 sec.

hyb_am_bench_4F3Q_SAVE_ALL_OUT_IGNORE_THE_REST_64953_34_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=495910136


Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev52077.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_hyb_am_bench_4F3Q_yfsong.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.

ERROR: can't open file: minirosetta_database//sampling/filtered.vall.dat.2006-05-05
ERROR:: Exit from: src/core/fragment/picking_old/vall/vall_io.cc line: 63
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>




©2024 University of Washington
https://www.bakerlab.org