Message boards : Number crunching : Report stuck work units here
Author | Message |
---|---|
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 9 |
Where would I look for the "slots" directories? C:\Program Files\BOINC\slots\0 (or 1, 2, 3, 4...) I'm sure someone else will have this again soon if you can't grab it. |
Los Alcoholicos~La Muis Joined: 4 Nov 05 Posts: 34 Credit: 1,041,724 RAC: 0 |
1hz6A_topology_sample_106743_0 is now at 1% after 14:25:00 hours (on a P4 HT 3.0 GHz) stderr.txt # ===================================== # random seed: 504801 # ===================================== stdout.txt 2005-12-20 07:00:08 :: BOINC :: boinc_init() command executed: projects/boinc.bakerlab.org_rosetta/rosetta_4.80_windows_intelx86.exe aa 1hz6 A -abrelax_mode -relax_score_filter -filter1 -110 -filter2 -145 -stringent_relax -more_relax_cycles -output_chi_silent -vary_omega -sim_aneal -rand_envpair_res_wt -rand_SS_wt -farlx -ex1 -ex2 -silent -barcode_from_fragments -barcode_from_fragments_length 10 -ssblocks -barcode_mode 3 -barcode_file 1hz6.top7_lowenergy.cst -jitter_frag -jitter_variation gauss -output_silent_gz -nstruct 10 [STR OPT]Default value for [-paths] paths.txt. [T/F OPT]Default FALSE value for [-unix_paths] -------------------------------------------- WARNING:: paths.txt file not found!! Setting all paths to . Using default fragment file names: aa*****03_05.200_v1_3 aa*****03_05.200_v1_3 -------------------------------------------- [T/F OPT]Default FALSE value for [-version] - - - - [T/F OPT]New TRUE value for [-jitter_frag] [REAL OPT]Default value for [-jitter_amount] 2 [STR OPT]New value for [-jitter_variation] gauss. score0 done: (best, low) rms 0 0 22.1686611 --------------------------------------------------------- score1 done: (best, low) rms (best,low) 19.9913731 15.6340599 15.2607765 14.5612974 standard trials: 2000 accepts: 666 %: 33.3 ----------------------------------------------------- Alternate score2/score5... 
kk score2 score5 low_score n_low_accept rms rms_min low_rms 0 29.008 29.008 29.008 17 14.561 10.744 14.561 [REAL OPT]Default value for [-cpu_frac] 0.100000001 [REAL OPT]Default value for [-frame_rate] 10 [REAL OPT]Default value for [-cpu_frac] 0.100000001 [REAL OPT]Default value for [-frame_rate] 10 [REAL OPT]Default value for [-cpu_frac] 0.100000001 [REAL OPT]Default value for [-frame_rate] 10 I will give it another few hours (but I will make a copy of slot 1) before I abort it. [edit] Too late... it just errored out after 14:37:32 hours (Maximum cpu time exceeded) |
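For anyone who wants to grab a slot directory before aborting a stuck work unit, the copy can be scripted. This is only a rough sketch, not anything supplied by the project: `backup_slot` and its arguments are made-up names for illustration, and the BOINC folder location is an assumption you would need to adjust for your own install.

```python
import shutil
import time
from pathlib import Path


def backup_slot(boinc_dir, slot, dest_root):
    """Copy one BOINC slot directory to a timestamped folder and return the new path.

    boinc_dir, slot and dest_root are hypothetical parameters for this sketch;
    point boinc_dir at wherever BOINC is installed on your machine.
    """
    src = Path(boinc_dir) / "slots" / str(slot)
    if not src.is_dir():
        raise FileNotFoundError(f"no such slot directory: {src}")
    # A timestamp in the folder name keeps repeated backups from colliding.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root) / f"slot{slot}-{stamp}"
    # copytree creates dest (and any missing parents) and copies everything under src.
    shutil.copytree(src, dest)
    return dest
```

Suspending the project first is probably wise, so that the files are not changing while they are being copied.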
Mark Rush Joined: 6 Oct 05 Posts: 13 Credit: 52,225,739 RAC: 9,518 |
Bill: Last night my WU hit an "unrecoverable error" and so was trashed. Sorry about not getting the slots directory copied. Mark |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 9 |
I think the current application, that is having the "short" WU errors at the moment, has the fix to the "stuck at 1%" problem in it... Still, if anyone has this error now, whether from the same cause or a new one, I'm sure the staff would love whatever information anyone could get. Thanks everyone for what you've done to help out so far! |
Jack Schonbrun Joined: 1 Nov 05 Posts: 115 Credit: 5,954 RAC: 0 |
I think the current application, that is having the "short" WU errors at the moment, has the fix to the "stuck at 1%" problem in it... Yes, we would be especially interested in cases of stuck at 1% that occur with 4.81. I know that these might be hard to notice while wading through the various other problems we've been having. |
Laurenu2 Joined: 6 Nov 05 Posts: 57 Credit: 3,818,778 RAC: 0 |
I have been getting WU's that the time clock just stops and takes a reboot to get it going again. I have had to reboot over 100 systems in the past 2 days. My points per day are 1/2 what they normally are. I guess I should just shut down my network till you can solve this problem, as I have little time to babysit your client with the holidays here, or just change to another project. If You Want The Best You Must forget The Rest ---------------And Join Free-DC---------------- |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 9 |
I have been getting WU's that the time clock just stops and takes a reboot to get it going again I have had to reboot over 100 systems in the past 2 days Several issues here. "Time clock just stops" is a new problem, if it's really a problem. Of course, with zero information from you on this, even though you have had it occur "over 100 times", it is hard to give any information. This is your FIRST posting on the issue. When the clock stops, is the status of the result by any chance "preempted"? And please, explain why a _reboot_ would be necessary? Are you sure that the problem isn't the OPERATING SYSTEM locking up, maybe because you're way overclocked, and not anything to do with Rosetta? If the problem is NOT as you describe, if the problem is instead the one being discussed in this thread, then you have had over 100 examples of something the project is asking for help to solve, yet you have not given the project any assistance. Instead you prefer to complain about the WU _names_ (in another thread) and now blame the project for what sounds like a problem on your end, or a total misunderstanding of the way the system works. In general, as much as I'm sure the project appreciates your (considerable) computer power, if you are only in this for the "points per day" and not to help the project, and expect the project to cater to your whims and jump to solve your problems, while you are unwilling to give the project any help in solving these problems, my _personal_ opinion is that you WOULD be happier with another project. Somewhere that the science would be less important, and you could get all the credits you want. I would suggest SETI. If you are here to volunteer your CPU time to a worthy effort, and not just to earn credits, then you need to start asking questions instead of jumping to conclusions. We are all happy to help anyone with a problem. |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 9 |
Does anyone know which posting is "stretching" this thread? I see several that have long lines of text from stderr files, but none that I can say shouldn't have wrapped. If we can identify the posting, I can copy it and repost it and delete the original. If we can't identify it, I may create a new thread and start moving posts around until I can see which is the problem... |
Scribe Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=680#6927 the long command line....maybe not....shrug... |
Webmaster Yoda Joined: 17 Sep 05 Posts: 161 Credit: 162,253 RAC: 0 |
It's message 6479 and a few others that have the long command line wrapped in a <pre> element, which means the formatting will be preserved. Remove the <pre> and </pre> from those posts (or insert some line breaks) and they should wrap. *** Join BOINC@Australia today *** |
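If you want to keep the 'pre' formatting but stop a long command line from stretching the page, one option is to insert the line breaks yourself before pasting. A minimal sketch, assuming an 80-column limit is acceptable; `wrap_for_post` is a hypothetical helper, not part of BOINC or the forum software.

```python
import textwrap


def wrap_for_post(text, width=80):
    """Break over-long lines at spaces so a preformatted block does not stretch the page.

    Existing line breaks are kept. Words (for example a single long path) are
    never split, so a word longer than `width` stays on its own line.
    """
    wrapped = []
    for line in text.splitlines():
        pieces = textwrap.wrap(
            line,
            width=width,
            break_long_words=False,   # keep paths and flags in one piece
            break_on_hyphens=False,   # do not split "-flag_name" style options
        )
        # textwrap.wrap returns [] for a blank line; keep the blank line.
        wrapped.extend(pieces or [""])
    return "\n".join(wrapped)
```

Because the breaks are only ever inserted at spaces, a reader can rejoin the pieces with spaces to recover the original command line.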
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 9 |
The following information was originally entered by River~~ in Message 6477 - Posted 16 Dec 2005 22:20:26 UTC - Last modified: 16 Dec 2005 22:25:44 UTC. The original has been moved to thread 750. Below is the original information, with formatting changes ONLY. BBCode is pretty limited - and apparently the 'pre' tag forces no-line-wrap. Same box, result = 1n0u__topology_sample_128114_0 |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 9 |
Followup post, also from River~~, same wrap problem: Message 6479 - Posted 16 Dec 2005 22:57:44 UTC - Last modified: 16 Dec 2005 23:20:47 UTC Same box, result = 1n0u__topology_sample_128114_0 |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
if you are only in this for the "points per day" ... I would suggest SETI. CPDN is another good candidate based on my experienced CS/sec ... :) Before the last batch of optimized clients the CS/sec was nearly double that of other projects ... YMMV |
pieface Joined: 20 Sep 05 Posts: 17 Credit: 797,661 RAC: 0 |
I think I have one of those 'stuck' WU's as well. I have 'suspended' Rosetta for a bit, and took a full backup of the BOINC directory if you want it (or any part of it). Let me know if you want it aborted. Rosetta Version 481 [workunit: 1hz6a_abrelaxmode_test_20349] 1% complete CPU time: 6 hr 46 min 43 sec stage: Ab Initio Step: 2699 Accepted Rmsd: 14.14 Accepted energy: 29.42311 It's running on a P4 2 GHz machine, Win XP Home SP2, BM 5.2.13, sharing 50/50 with Einstein, and left in memory when swapped. Both the CPU time and time to completion increased every 5 secs or so. The 'step' hasn't changed since I noticed it was having a problem. |
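The symptom described here (CPU time keeps climbing while the step counter does not move) can be turned into a simple check if you jot down a few readings from the graphics window. This is just a sketch under assumptions: `looks_stuck`, the `(cpu_seconds, step)` sample format and the one-hour threshold are all invented for illustration and are not anything the Rosetta application provides.

```python
def looks_stuck(samples, min_cpu_seconds=3600):
    """Return True if the step counter has not changed for at least min_cpu_seconds of CPU time.

    samples is a chronological list of (cpu_seconds, step) readings taken by hand
    or by some other means; the format is an assumption of this sketch.
    """
    if len(samples) < 2:
        return False  # one reading cannot show a lack of progress
    last_cpu, last_step = samples[-1]
    # Walk backwards to find where the current run of identical steps began.
    run_start_cpu = last_cpu
    for cpu, step in reversed(samples):
        if step != last_step:
            break
        run_start_cpu = cpu
    return last_cpu - run_start_cpu >= min_cpu_seconds
```

For example, readings of `(1800, 2699)`, `(3600, 2699)` and `(7200, 2699)` cover 5400 seconds of CPU time at the same step, so the check reports the work unit as stuck with the default threshold.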
Laurenu2 Joined: 6 Nov 05 Posts: 57 Credit: 3,818,778 RAC: 0 |
Well, a reply like this one, accusing me of just doing it for the points, will do NOTHING but push me away. I have been getting WU's that the time clock just stops and takes a reboot to get it going again. I have had to reboot over 100 systems in the past 2 days. If this is what you want, just say the word and I can pull the plug. I do not overclock any of my nodes; the OS is Win ME, and the clock just stops. I am sorry, I am new to this project and do not know how to get you the info to get help. I am just a DUMB plumber, but that should be no reason to act in such a belittling way. If You Want The Best You Must forget The Rest ---------------And Join Free-DC---------------- |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 9 |
Well, a reply like this one, accusing me of just doing it for the points, will do NOTHING but push me away. First, realize that _I_ am not "project staff" - I'm a volunteer participant just like you are. That tag to the left under my name says "forum moderator", not "project" anything. However, I volunteer my time to help people who have a problem and ask for help on these boards. I do not overclock any of my nodes; the OS is Win ME, and the clock just stops. I am sorry, I am new to this project and do not know how to get you the info to get help. I am just a DUMB plumber, but that should be no reason to act in such a belittling way. Ok. We are now getting some information from you, namely that you're on ME and the clock just stops. Can you narrow down _when_ the clock just stops? Is it when projects are switched? Are you running multiple projects? Do you have the preference "leave applications in memory when preempted" set to "yes"? (If not, please do so.) Are you running the graphics when this happens? Running any other programs on the system? The more info you give us, the better. I have no problem helping you if you ask for help. But if you come in with the "I'll just shut down if you don't solve your problem" attitude, then you're going to get attitude right back. There is NO reason to reboot a system because of ANY Rosetta problem. So the first step in solving this is to stop rebooting and instead, describe what is happening, copy/paste any messages from the Messages tab, and give us some information so we can begin to solve the problem. |
Laurenu2 Joined: 6 Nov 05 Posts: 57 Credit: 3,818,778 RAC: 0 |
Well, a reply like this one, accusing me of just doing it for the points, will do NOTHING but push me away. Well, maybe you should take a look at your style of help. When people come here looking for help, or just expressing what they see as a problem, they may not express themselves in a clear or to-the-point manner. If this is a hard thing for you to handle perhaps you should stop giving help. I did not come here to get insulted or to be made a fool of by you, or to do damage to this project, just to express things that I am having a problem with. If You Want The Best You Must forget The Rest ---------------And Join Free-DC---------------- |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 9 |
if this is a hard thing for you to handle perhaps you should stop giving help I'll make a deal with you - I'll stop giving you help. There are plenty of others here that can do so if they choose. EDIT:: I just double-checked something. I know I said I'd stop helping, but... Windows ME is not supported by Rosetta. Seems it doesn't report CPU times back to the application correctly. |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
The following information was originally entered by River~~ ... yep, mea culpa! The bbcode [ pre ] translates directly to the html < pre > which preserves formatting. It can be important to know where the line breaks occur in a file, so as we were asked for lines to be posted from a file I used pre. It also stretches the page, so is less helpful if the thread turns into discussion rather than simply a place to 'upload' error files. Thanks for fixing it. |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 9 |
It can be important to know where the line breaks occur in a file, so as we were asked for lines to be posted from a file I used pre. That's why I moved them rather than _just_ copying and re-pasting. :-) (Well, that and I firmly believe that a moderator should moderate as little as possible...) |
©2024 University of Washington
https://www.bakerlab.org