Message boards : Number crunching : Mini Rosetta 3.45
Author | Message |
---|---|
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,015 RAC: 1,790 |
I now have three 3.45 workunits on my 64-bit Windows 7 computer. Two just gave computation errors; that may be the zdock problem mentioned in another thread. They show some progress, and do not automatically start the screensaver even if not requested (like 3.43 workunits do). |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,015 RAC: 1,790 |
The two failed workunits gave this error: <core_client_version>7.0.28</core_client_version> <![CDATA[ <message> Maximum disk usage exceeded </message> Could you tell us just how much disk space these workunits are trying to use, so that those of us with extra disk space can decide whether it is reasonable to allow BOINC to use more disk space? |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,015 RAC: 1,790 |
How do you calculate the disk space limit for Rosetta@Home? For example, I allow BOINC to use 50 GB of disk space on one of my computers, but it is divided among 23 BOINC projects. Would it be reasonable to make the next version of minirosetta add a little more information to the error message about running out of disk space, such as just how much disk space the workunit is trying to use and the limit Rosetta@Home is allowed to use? |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 25,983,050 RAC: 14,947 |
AFAIK this error (Maximum disk usage exceeded) appears when the app hits the disk limit set for one individual WU, not the total BOINC disk usage limit. I looked at some of the latest WUs on my account and found that R@H sets this limit to 300 MB per WU. For example:
300 MB is sometimes not enough, because about 170 MB of disk space is always used by minirosetta_database (decompressed for each task), leaving ~130 MB for all other files (input, temporary, output). If you get a WU with large input and/or output files (>130 MB total when decompressed), you will hit this error. So we cannot do anything on your client side. It can be corrected only on the server side (the simplest way would be to raise this limit for all newly generated WUs; the best way would be to modify the app to use only one instance of the database, instead of unpacking it into the work folder of each WU) |
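The arithmetic in the post above can be written out as a small sketch. The figures (300 MB limit, about 170 MB for the unpacked database) come from the post; the function names and the 150 MB example task are hypothetical and only illustrate the calculation.

```python
# Figures quoted in the post above; nothing here is read from BOINC itself.
PER_WU_LIMIT_MB = 300      # per-workunit disk limit reported for R@H
DATABASE_MB = 170          # approximate size of the unpacked minirosetta_database


def headroom_mb(limit_mb, database_mb):
    """Space left for input, temporary and output files under a per-WU limit."""
    return limit_mb - database_mb


def fits(limit_mb, database_mb, other_files_mb):
    """True if the task's remaining files fit in the leftover space."""
    return other_files_mb <= headroom_mb(limit_mb, database_mb)


print(headroom_mb(PER_WU_LIMIT_MB, DATABASE_MB))   # leftover space in MB
print(fits(PER_WU_LIMIT_MB, DATABASE_MB, 150))     # hypothetical task with 150 MB of other files
```

Under these numbers, any task whose other files exceed the leftover space would trip the limit regardless of how much disk the volunteer has allowed BOINC to use overall.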
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
So we cannot do anything on your client side. That's not really true: you can always patch this limit in your client_state.xml, but we really should not need to do things like that. It was already pointed out to the project staff in the 3.41 thread, when the zdock WUs were failing with that error. I don't understand why Rosetta is using such a small value there; other projects (that I run) allow up to over 1000 times more than they actually need. |
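For readers wondering what such a patch might look like, here is a minimal sketch. It assumes the per-workunit limit is stored as `<rsc_disk_bound>` (in bytes) inside each `<workunit>` element of client_state.xml; the sample XML, the workunit names and the `raise_disk_bound` helper are hypothetical. It works on an in-memory string only, and editing the real file is presumably only safe with the BOINC client stopped.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for client_state.xml.
SAMPLE = """<client_state>
  <workunit>
    <name>example_wu_1</name>
    <rsc_disk_bound>300000000.000000</rsc_disk_bound>
  </workunit>
  <workunit>
    <name>example_wu_2</name>
    <rsc_disk_bound>2000000000.000000</rsc_disk_bound>
  </workunit>
</client_state>"""


def raise_disk_bound(xml_text, min_bytes):
    """Raise every rsc_disk_bound below min_bytes; return new XML and changed WU names."""
    root = ET.fromstring(xml_text)
    changed = []
    for wu in root.iter("workunit"):
        bound = wu.find("rsc_disk_bound")
        if bound is not None and float(bound.text) < min_bytes:
            bound.text = f"{min_bytes:.6f}"
            changed.append(wu.findtext("name"))
    return ET.tostring(root, encoding="unicode"), changed


patched, changed = raise_disk_bound(SAMPLE, 1_000_000_000)
print(changed)
```

Only bounds below the requested minimum are touched, so workunits that already have a generous limit are left alone.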
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Thanks for pointing this out. I can't express enough how helpful you all are. This is definitely a problem, particularly from our last application update since we included a very big database file which has been removed in our current application package. I will update our default settings. Thanks again! David Kim |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
The new default setting is x100 the previous value now. Newly submitted jobs from our research group will have this change but it will take some time for the current jobs in the queue to flush out. |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This ran for over 8hrs on my 4hr limit then this happened. ================================ hyb_ac_bench_3rojD_10_SAVE_ALL_OUT_IGNORE_THE_REST_54834_171_1 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=492799306 BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev52077.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_hyb_ac_bench_3rojD_10_yfsong.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 14400 BOINC:: CPU time: 29087.1s, 14400s + 14400s[2012-11-16 9:59: 0:] :: BOINC WARNING! cannot get file size for default.out.gz: could not open file. Output exists: default.out.gz Size: -1 InternalDecoyCount: 0 (GZ) ----- 0 ----- Stream information inconsistent. 
Writing W_0000001 ====================================================== DONE :: 1 starting structures 29087.1 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== called boinc_finish SIGSEGV: segmentation violation Stack trace (20 frames): [0xafed447] [0xf77c3400] [0xa44e8ce] [0xa363c49] [0xa073900] [0xa084eb3] [0xa086655] [0x980230d] [0x9831f83] [0x9834e85] [0x892dfca] [0x866eaea] [0x97b619f] [0x97bde49] [0x9979352] [0x99d8ed5] [0x99d6705] [0x80547cc] [0xb07d7e8] [0x8048131] Exiting... </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>hyb_ac_bench_3rojD_10_SAVE_ALL_OUT_IGNORE_THE_REST_54834_171_1_0</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 25,983,050 RAC: 14,947 |
The new default setting is x100 the previous value now. Newly submitted jobs from our research group will have this change but it will take some time for the current jobs in the queue to flush out. Be careful with such aggressive settings! This may cause more problems than fixes: 1. This is not just limit, but space allocation instruction for BOINC too. Description from BOINC site (http://boinc.berkeley.edu/trac/wiki/JobIn#
So if you raise this limit x100 from the current 300 MB to 30 GB, most BOINC clients will not be able to run such tasks at all (because the typical limits, and the default limits for the entire BOINC client, are usually less than this value; on SSDs, 30 GB of free space may be a problem too). Other DC projects can afford to set a limit 100 times greater than actually required because they use very small amounts of disk space (unlike R@H), so even x100 or x500 is still a small absolute value. For example: POEM@Home uses only ~50-100 kilobytes of disk space per running WU (limit set to 100 MB); Einstein@Home uses ~300 KB (limit set to 100 MB). 2. This limit is also the watchdog for bad jobs (which are not rare in R@H). It prevents a situation where one task, stuck in a loop writing to disk, fills all of the BOINC disk space with garbage data and blocks normal work of the entire BOINC client. So my suggestion for this value in R@H's situation is ~1000 MB |
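The comparison in the post above can be put side by side in a short sketch. The limits and usage figures for POEM@Home and Einstein@Home are the ones quoted in the post; the 300 MB usage figure for R@H is an assumption (tasks were failing at the 300 MB limit, so real usage is at least that), and decimal units are used throughout.

```python
KB = 1
MB = 1000 * KB
GB = 1000 * MB


def limit_ratio(limit, usage):
    """How many times larger the per-WU disk limit is than the disk actually used."""
    return limit / usage


# (limit, typical usage) per running WU
projects = {
    "POEM@Home": (100 * MB, 100 * KB),
    "Einstein@Home": (100 * MB, 300 * KB),
    "R@H, x100 limit": (300 * MB * 100, 300 * MB),      # usage is an assumption
    "R@H, suggested limit": (1000 * MB, 300 * MB),      # usage is an assumption
}

for name, (limit, usage) in projects.items():
    print(f"{name}: limit {limit / MB:g} MB, about {limit_ratio(limit, usage):.0f}x usage")
```

The point of the sketch is that a large multiplier is harmless when the base usage is a few hundred kilobytes, while the same multiplier on a few hundred megabytes produces a limit larger than many clients' entire BOINC allowance.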
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Thanks for the suggestions! I updated it to 1 gig. I'll keep an eye out for how the clients/machines deal with this change. |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
1. This is not just limit, but space allocation instruction for BOINC too. Does not seem to work like that: I just got two of those 30 GB WUs while having 5.1 GB "free, available to BOINC". My entire system partition, where BOINC also stores its data, is just 27 GB. So the bold-marked part seems to be wrong. But with 1 GB we should be fine too. |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
1. This is not just limit, but space allocation instruction for BOINC too. let us know if these jobs get aborted by the client as we'd expect. |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 25,983,050 RAC: 14,947 |
This is how this parameter should work, according to the description in the official BOINC instructions. But I have not tested it personally. It is possible that, in practice, this part does not work, or that this function (disk space allocation) has been removed in newer versions of BOINC (and the devs just forgot to update the description in the manual). |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
let us know if these jobs get aborted by the client as we'd expect. Both running now together and as expected they are NOT aborted since the limit is not exceeded. BOINC v6.12.34. . |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
let us know if these jobs get aborted by the client as we'd expect. ... and both completed successfully. abt_2CWY_1_abinitio_SAVE_ALL_OUT_64370_88_0 rb_11_16_34706_65259_h002__tr_IGNORE_THE_REST_14_10_64417_2_0 BTW, maybe you could make this thread a sticky and "unsticky" all the old ones. . |
Bart Send message Joined: 8 Oct 11 Posts: 2 Credit: 476,125 RAC: 0 |
minirosetta_3.4.5/vista home premium/32-bit C:\Documents and Settings\All Users\Application Data\BOINC\projects\boinc.bakerlab.org_rosetta\minirosetta_3.45_windows_intelx86.exe C:\Documents and Settings\All Users\Application Data\BOINC\projects\boinc.bakerlab.org_rosetta\minirosetta_graphics_3.43_windows_intelx86.exe C:\Documents and Settings\All Users\Application Data\BOINC\slots\1\minirosetta_3.45_windows_intelx86.exe Kaspersky has blocked minirosetta because it attempted to access the user password area. Note 3.4.5 but also 3.4.3 graphics. Dunno if this matters. I have suspended minirosetta and will accept no new jobs until this gets resolved. |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
Have you tried excluding the entire C:\Documents and Settings\All Users\Application Data\BOINC directory from scanning? That should usually solve any issues with antivirus software. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
the rb_11_19_34348 etc task that is running on my computer is now on 10hrs computing with another hour to go which is in violation of my 6hr limit. It should upload later on today. |
Polian Send message Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
the rb_11_19_34348 etc task that is running on my computer is now on 10hrs computing with another hour to go which is in violation of my 6hr limit. It should upload later on today. I've noticed that this happens on jobs with, for example, this line in stderr_out: BOINC:: CPU time: 29087.1s, 14400s + 14400s[2012-11-16 9:59: 0:] :: BOINC In this case, the user's pref is 14400 seconds. It's been doubled. |
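A quick way to check for this pattern in a task's output is to pull the three numbers out of that line. This is a minimal sketch: the log line is the one quoted in this thread, while the regular expression, the field names, and the reading of the second `14400s` as extra time added on top of the run-time preference are assumptions, not documented minirosetta behaviour.

```python
import re

# Log line as quoted in this thread.
LINE = "BOINC:: CPU time: 29087.1s, 14400s + 14400s[2012-11-16 9:59: 0:] :: BOINC"

# Assumed layout: "CPU time: <used>s, <preference>s + <extra>s"
PATTERN = re.compile(r"CPU time:\s*([\d.]+)s,\s*(\d+)s\s*\+\s*(\d+)s")


def parse_cpu_line(line):
    """Return CPU seconds used, the run-time preference, the extra time and their sum."""
    match = PATTERN.search(line)
    if match is None:
        return None
    cpu = float(match.group(1))
    pref = int(match.group(2))
    extra = int(match.group(3))
    return {"cpu": cpu, "pref": pref, "extra": extra, "cutoff": pref + extra}


info = parse_cpu_line(LINE)
print(info["cutoff"], info["cpu"] > info["cutoff"])
print(round(info["cpu"] / 3600, 2), "hours of CPU time")
```

Read this way, the task ran to roughly twice the preferred run time before the watchdog line was printed, which matches the "over 8hrs on my 4hr limit" report earlier in the thread.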
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This one failed after 8 sec. hyb_am_bench_4F3Q_SAVE_ALL_OUT_IGNORE_THE_REST_64953_34_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=495910136 Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev52077.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_hyb_am_bench_4F3Q_yfsong.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. ERROR: can't open file: minirosetta_database//sampling/filtered.vall.dat.2006-05-05 ERROR:: Exit from: src/core/fragment/picking_old/vall/vall_io.cc line: 63 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> ]]> |
©2024 University of Washington
https://www.bakerlab.org