Please abort WUs with

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 7406 - Posted: 23 Dec 2005, 20:09:49 UTC - in response to Message 7298.  

I just read a post that says windows ME is not supported by Rosetta. I have ME on my computer and Rosetta was doing fine with it until the 18th. Does it depend on the type of WU you receive as to working with ME?


As I understand it, Rosetta officially only *aims* to have its apps working on Win2k and later versions of Windows.

Even so, I run one ME box on this project without any OS-specific problems.

The latest app runs as well for me as the earlier ones did, once I exclude the huge numbers of bad jobs we've had recently and which would not have run on any OS.

However ME is officially unsupported - so in future an app may come along which won't run under ME, and then me & thee will have no right to moan at the project programmers because it is us that are 'out of spec' not them.

Have another project, or a Linux dual boot, ready for the day that that happens! I've got both lined up, me...

River~~
ID: 7406 · Rating: 0
O&O
Joined: 11 Dec 05
Posts: 25
Credit: 66,900
RAC: 0
Message 7470 - Posted: 24 Dec 2005, 2:15:34 UTC - in response to Message 6910.  

A) ...

On 24/12/2005 02:01: Started downloading aa1dis2_09_5.400_v1_3.gz (3.08MB)

Since then, I received: "rosetta@home|Temporarily failed download of aa1di2_09_05.200_v1_3.gz: error 500 X" ... about 24 times!

Nevertheless, ...
On 24/12/2005 03:27: Finished downloading the last piece of it which was WU: DEFAULT_1di2_205_78_5

Q1) I have "suspend" working on this WU while it is in the "Ready to run" state, should I still "abort"?
Q2) 25 times of "error 500", are they "normal" communication problems between your "server" and my PC?


B)....

24/12/2005 01:13:48|rosetta@home|Starting result 1hz6A_topology_sample_207_14685_8 using rosetta version 481
24/12/2005 01:13:49|rosetta@home|Starting result 1hz6A_topology_sample_207_15735_8 using rosetta version 481
24/12/2005 01:14:34|rosetta@home|Unrecoverable error for result 1hz6A_topology_sample_207_14685_8 ( - exit code -1073741819 (0xc0000005))
24/12/2005 01:14:34|rosetta@home|Unrecoverable error for result 1hz6A_topology_sample_207_15735_8 ( - exit code -1073741819 (0xc0000005))
24/12/2005 03:38:50|rosetta@home|Resuming result 1hz6A_topology_sample_207_10781_7 using rosetta version 481
24/12/2005 03:39:25|rosetta@home|Unrecoverable error for result 1hz6A_topology_sample_207_10781_7 ( - exit code -1073741819 (0xc0000005))
24/12/2005 04:40:11|rosetta@home|Starting result 1hz6A_topology_sample_207_11598_1 using rosetta version 481
24/12/2005 04:40:23|rosetta@home|Unrecoverable error for result 1ogw__topology_sample_207_9040_6 ( - exit code -1073741819 (0xc0000005))
24/12/2005 04:40:23|rosetta@home|Computation for result 1ogw__topology_sample_207_9040_6 finished
24/12/2005 04:40:24|rosetta@home|Starting result 1hz6A_topology_sample_207_9621_7 using rosetta version 481
24/12/2005 04:40:50|rosetta@home|Unrecoverable error for result 1hz6A_topology_sample_207_11598_1 ( - exit code -1073741819 (0xc0000005))
24/12/2005 04:40:50|rosetta@home|Computation for result 1hz6A_topology_sample_207_11598_1 finished
24/12/2005 04:40:51|rosetta@home|Starting result 1ogw__topology_sample_207_12440_5 using rosetta version 481
24/12/2005 04:41:04|rosetta@home|Unrecoverable error for result 1hz6A_topology_sample_207_9621_7 ( - exit code -1073741819 (0xc0000005))
24/12/2005 04:41:04|rosetta@home|Computation for result 1hz6A_topology_sample_207_9621_7 finished
24/12/2005 04:41:05|rosetta@home|Starting result 1ogw__topology_sample_207_9064_3 using rosetta version 481
24/12/2005 04:41:38|rosetta@home|Unrecoverable error for result 1ogw__topology_sample_207_12440_5 ( - exit code -1073741819 (0xc0000005))
24/12/2005 04:41:38|rosetta@home|Computation for result 1ogw__topology_sample_207_12440_5 finished
24/12/2005 04:41:38|rosetta@home|Starting result 1ogw__topology_sample_204_2061_5 using rosetta version 481
24/12/2005 04:41:52|rosetta@home|Unrecoverable error for result 1ogw__topology_sample_207_9064_3 ( - exit code -1073741819 (0xc0000005))
24/12/2005 04:41:53|rosetta@home|Computation for result 1ogw__topology_sample_207_9064_3 finished
24/12/2005 04:41:53|rosetta@home|Starting result 1ogw__topology_sample_207_14480_7 using rosetta version 481
24/12/2005 04:42:29|rosetta@home|Unrecoverable error for result 1ogw__topology_sample_204_2061_5 ( - exit code -1073741819 (0xc0000005))
24/12/2005 04:42:29|rosetta@home|Computation for result 1ogw__topology_sample_204_2061_5 finished
24/12/2005 04:42:29|rosetta@home|Starting result 1ogw__topology_sample_207_9063_5 using rosetta version 481
24/12/2005 04:42:41|rosetta@home|Unrecoverable error for result 1ogw__topology_sample_207_14480_7 ( - exit code -1073741819 (0xc0000005))
24/12/2005 04:42:41|rosetta@home|Computation for result 1ogw__topology_sample_207_14480_7 finished
24/12/2005 04:42:41|rosetta@home|Starting result 1hz6A_topology_sample_207_5164_6 using rosetta version 481
24/12/2005 04:43:15|rosetta@home|Unrecoverable error for result 1ogw__topology_sample_207_9063_5 ( - exit code -1073741819 (0xc0000005))
24/12/2005 04:43:15|rosetta@home|Computation for result 1ogw__topology_sample_207_9063_5 finished
24/12/2005 04:43:15|rosetta@home|Starting result 1hz6A_topology_sample_207_15831_9 using rosetta version 481
24/12/2005 04:43:22|rosetta@home|Unrecoverable error for result 1hz6A_topology_sample_207_5164_6 ( - exit code -1073741819 (0xc0000005))
24/12/2005 04:43:22|rosetta@home|Computation for result 1hz6A_topology_sample_207_5164_6 finished
24/12/2005 04:43:22|rosetta@home|Starting result 1ogw__topology_sample_207_16196_6 using rosetta version 481
24/12/2005 04:43:56|rosetta@home|Unrecoverable error for result 1hz6A_topology_sample_207_15831_9 ( - exit code -1073741819 (0xc0000005))
24/12/2005 04:43:56|rosetta@home|Computation for result 1hz6A_topology_sample_207_15831_9 finished
24/12/2005 04:43:56|rosetta@home|Starting result 1dtj__abrelax_rand_len10_jit02_omega_sim_23401_1 using rosetta version 481
24/12/2005 04:44:10|rosetta@home|Unrecoverable error for result 1ogw__topology_sample_207_16196_6 ( - exit code -1073741819 (0xc0000005))

Q3) Are they related to your announced "problems"?
Q4) In the future, should my PC expect more of such WUs (which took more than 2 hours to download), only to have them end with "Computational errors" in fractions of a second?

Thank you.

O&O (UTC +3, Dial-up)
ID: 7470 · Rating: 0
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,362
RAC: 9
Message 7471 - Posted: 24 Dec 2005, 2:28:08 UTC - in response to Message 7470.  
Last modified: 24 Dec 2005, 2:28:33 UTC

DEFAULT_1di2_205_78_5

Q1) I have "suspend" working on this WU while it is in the "Ready to run" state, should I still "abort"?


Yes... OR... given you are on dial-up, you _could_ let this result run, and then you will get credit for the time spent on it. It is "good" in terms of structure, it is only "bad" in that it runs extremely long, and eventually trips the "maximum CPU time" error. You will get 0 credit at first, then after the holidays, they will grant the credit manually.

Q2) 25 times of "error 500", are they "normal" communication problems between your "server" and my PC?


That does not sound normal at all. Error "500" is the "generic fallback" error that is reported when BOINC doesn't have a real error message. The servers have not been overloaded, from what I've seen.

24/12/2005 04:44:10|rosetta@home|Unrecoverable error for result 1ogw__topology_sample_207_16196_6 ( - exit code -1073741819 (0xc0000005))

Q3) Are they related to your announced "problems"?
Q4) In the future, should my PC expect more of such WUs (which took more than 2 hours to download), only to have them end with "Computational errors" in fractions of a second?


Yes, these are examples of the "short WUs" that error out quickly due to a random-number problem. They are supposed to be "almost gone" at this point, but we have been advising people on dial-up that they should Suspend Rosetta for a day or two, or at least until they can check these boards and verify that all the bad ones are gone. Given the length of time it took you to get these, I would personally run the DEFAULT_205 thing, _or_ suspend Rosetta and work on another project for a couple of days. There is no point in spending that much time downloading, only to return no valid results and get no credit for it.
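
For anyone puzzled by the two numbers in those log lines: they are the same value written two ways. The exit code -1073741819 is a signed 32-bit integer, and 0xc0000005 is the same bit pattern read as unsigned, which is the Windows status code for an access violation. A small sketch of the conversion follows. The regular expression is an assumption based only on the lines quoted in this thread, not on any documented BOINC log format, and `as_unsigned_hex` is just an illustrative helper name.

```python
import re

# Matches the "Unrecoverable error" lines quoted in this thread. The pattern is
# an assumption based only on those lines, not a documented BOINC log format.
ERROR_LINE = re.compile(
    r"Unrecoverable error for result (?P<name>\S+) \( - exit code (?P<code>-?\d+)"
)


def as_unsigned_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


line = (
    "24/12/2005 04:44:10|rosetta@home|Unrecoverable error for result "
    "1ogw__topology_sample_207_16196_6 ( - exit code -1073741819 (0xc0000005))"
)
match = ERROR_LINE.search(line)
name = match.group("name")
code = int(match.group("code"))
print(name, as_unsigned_hex(code))
```

The same helper can be run over a saved message log to count how many results ended with this particular code.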

ID: 7471 · Rating: 1
O&O
Joined: 11 Dec 05
Posts: 25
Credit: 66,900
RAC: 0
Message 7475 - Posted: 24 Dec 2005, 2:51:54 UTC - in response to Message 7471.  
Last modified: 24 Dec 2005, 3:04:25 UTC

Thank you BM for your swift response ... much appreciated.
One more question if you can answer ... please ...

Q5) Not mentioning the time it took me to download'em, I have in a "Ready to Run" status, 14 Default_xxxx_219_xxxx_x WUs, 9 Default_xxxx_218_xxxx_x and 1 Default_xxxx_221_xxxx_x.

Should I "abort"?

Edit: And this ... rather silly but I was wondering ...
Q6) I have one WU with the name ... BARCODE_FRAG_30_1n0u_221_42_0 ...,
is it related to the 3-dimensional shapes of proteins research to find cures for some major human diseases?

Regards,

O&O
ID: 7475 · Rating: 0
Webmaster Yoda
Joined: 17 Sep 05
Posts: 161
Credit: 162,253
RAC: 0
Message 7477 - Posted: 24 Dec 2005, 3:10:35 UTC - in response to Message 7475.  

Q5) Not mentioning the time it took me to download'em, I have in a "Ready to Run" status, 14 Default_xxxx_219_xxxx_x WUs, 9 Default_xxxx_218_xxxx_x and 1 Default_xxxx_221_xxxx_x.

Should I "abort"?


See the very first message in this thread:
'please ABORT any WUs whose names start with "DEFAULT_....._205_...." '

The ones you mention are not the 205 batch so don't need to be aborted.
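
If you have a long list of results and want to check it mechanically, here is a rough sketch of that rule. It assumes the names follow the layout DEFAULT_<protein>_<batch>_<job>_<resend>, which is only inferred from the names quoted in this thread (for example DEFAULT_1di2_205_78_5); `is_bad_default_wu` is a made-up helper, not part of BOINC or Rosetta.

```python
def is_bad_default_wu(name: str, bad_batch: str = "205") -> bool:
    """Return True if a WU name looks like a DEFAULT_..._205_... work unit.

    Assumed layout (inferred from names in this thread, not documented):
    DEFAULT_<protein>_<batch>_<job>_<resend>
    """
    # Split from the right so a protein id containing "_" does not confuse us.
    parts = name.rsplit("_", 3)
    if len(parts) != 4:
        return False
    prefix, batch, _job, _resend = parts
    return prefix.upper().startswith("DEFAULT_") and batch == bad_batch


queue = [
    "DEFAULT_1di2_205_78_5",
    "BARCODE_FRAG_30_1n0u_221_42_0",
    "1hz6A_topology_sample_207_14685_8",
]
for wu in queue:
    print(wu, "-> abort" if is_bad_default_wu(wu) else "-> keep")
```

Only the first name in that list is flagged; the BARCODE and topology_sample names are left alone, as are DEFAULT names from batches 218 and up.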


*** Join BOINC@Australia today ***
ID: 7477 · Rating: -1
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,362
RAC: 9
Message 7479 - Posted: 24 Dec 2005, 3:48:19 UTC - in response to Message 7475.  
Last modified: 24 Dec 2005, 3:49:04 UTC

Q5) Not mentioning the time it took me to download'em, I have in a "Ready to Run" status, 14 Default_xxxx_219_xxxx_x WUs, 9 Default_xxxx_218_xxxx_x and 1 Default_xxxx_221_xxxx_x.

Should I "abort"?

Edit: And this ... rather silly but I was wondering ...
Q6) I have one WU with the name ... BARCODE_FRAG_30_1n0u_221_42_0 ...,
is it related to the 3-dimensional shapes of proteins research to find cures for some major human diseases?


The "DEFAULT_xxx_218" (and up) WUs should be good, unless one happens to be a "short WU" (which is unlikely; I think the problem was fixed by batch 218).

The "Barcode" part, I have no idea about. The WU names are sometimes discussed in the Science forums, but I haven't seen that one.

ID: 7479 · Rating: 1
divyab
Joined: 20 Oct 05
Posts: 6
Credit: 0
RAC: 0
Message 7503 - Posted: 24 Dec 2005, 8:13:21 UTC - in response to Message 7475.  


Edit: And this ... rather silly but I was wondering ...
Q6) I have one WU with the name ... BARCODE_FRAG_30_1n0u_221_42_0 ...,
is it related to the 3-dimensional shapes of proteins research to find cures for some major human diseases?



(in the future, science questions like this will probably be more promptly addressed on one of the science threads...but since it seems like the WU's are stabilizing, i'll answer here....)

Barcode refers to a particular method we use when we try and accurately predict the protein's structure, as you guessed above. basically, we use this as a way to make sure that we are not missing some particular "features" when we are searching for the correct structure. a "barcode" might be for some particular feature (let's say, a kink in the chain), and has different "flavors" (kink at the beginning, kink in the middle, kink at the end, all 3, etc.). in the runs that say "barcode", we spread our search so that all the different flavors of certain features are evaluated before making our predictions.


ID: 7503 · Rating: 0
AKH54
Joined: 8 Dec 05
Posts: 4
Credit: 1,812,208
RAC: 0
Message 7508 - Posted: 24 Dec 2005, 10:23:53 UTC
Last modified: 24 Dec 2005, 10:24:16 UTC

Does this mean you have to abort all WUs starting with DEFAULT, or just the ones with 205?

I have been crunching for a few days now, and I have noticed I have 12 client errors. What does this mean? And is it normal to get so many errors? It might explain why my graph in the statistics tab is flatlined.

Also, some people have a little database of all their WUs for different BOINC projects in their replies.

How can I get the same for my WUs?

Many Thanks

Alan
ID: 7508 · Rating: 0
Charles Dennett
Joined: 27 Sep 05
Posts: 102
Credit: 2,081,660
RAC: 513
Message 7517 - Posted: 24 Dec 2005, 12:59:29 UTC - in response to Message 7508.  
Last modified: 24 Dec 2005, 13:07:01 UTC

Does this mean you have to abort all WUs starting with DEFAULT, or just the ones with 205?

I have been crunching for a few days now, and I have noticed I have 12 client errors. What does this mean? And is it normal to get so many errors? It might explain why my graph in the statistics tab is flatlined.

Also, some people have a little database of all their WUs for different BOINC projects in their replies.

How can I get the same for my WUs?

Many Thanks

Alan


Just abort the DEFAULT 205 workunits. I thought all those were purged from the system by now so you should not see any of those. The problem with those, if I recall, was that they would keep on running and running (for 100 times longer than normal I think) and would hit a cpu time limit that each workunit has built in and then abort themselves. Even if you let it run you'd get no credit.

As for others, yes, some of those will abort themselves after running for only a short while. Do not abort those. It will help clean them out. When a WU is reported as a failure as these are, the boinc servers send it back out up to 10 times before finally giving up. Letting them run and abort will help clear them out of the system. I've got a bunch in my queue that will abort when they reach the top of the queue later today.

(How do I know that, you ask? A workunit name ends in _X where X is some single digit. The first time a WU is sent out X=0. If it fails for whatever reason and needs to be sent out again, then the X is changed to a 1. If it fails again and is sent out a third time, X is set to 2. See the pattern? I've got several where X is 5 or 6 or 7 or 8. I know those will abort after running for only a few seconds. But, and this is important, just because X is greater than 0 does not necessarily mean a WU will abort. For example, if a WU is sent out and is not returned by the deadline, it is sent out again with X=1. It could be a perfectly good WU. Also if someone aborts a WU manually or resets a project, the WUs in question will need to be resent and therefore the value of X for these will be greater than 0. So, just let all these go and the system will do the right thing for you.)
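
The suffix rule described above can be sketched in a few lines. This is only an illustration: `resend_index` is a hypothetical helper name, and it assumes the resend counter is whatever follows the last underscore in the result name, as in the names quoted in this thread.

```python
def resend_index(result_name: str) -> int:
    """Return the trailing _X counter of a result name (0 = first time sent out).

    Assumes the counter is the text after the last underscore, as in the
    result names quoted in this thread.
    """
    head, _, tail = result_name.rpartition("_")
    if not head or not (tail.isascii() and tail.isdigit()):
        raise ValueError(f"no numeric resend suffix in {result_name!r}")
    return int(tail)


for name in (
    "BARCODE_FRAG_30_1n0u_221_42_0",
    "DEFAULT_2reb_205_39_3",
    "1hz6A_topology_sample_207_14685_8",
):
    print(name, "sent out before:", resend_index(name), "time(s)")
```

As noted above, a nonzero value only says the WU has been sent out before; a missed deadline, a manual abort or a project reset raises the counter just as a crash does, so it is a hint and not a verdict.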

Oh, I should say, if you are on a broadband connection you should have no problem. If, on the other hand, you are on dial-up, it may be best to simply suspend RAH and let BOINC process on some other project(s). With all the traffic involved with uploading and downloading files for WUs that only run a short time, dial-up would be very inefficient (and expensive if you pay for connection time and/or number of bytes transferred.) The admins for the project are off for the holidays but will work on these problems when they get back. They've been very responsive so far and I have no reason to doubt them.

As for the database of WU's some people have in their replies, there are several sites that collect the stats file from the various projects and make nice tables and graphs out of them. They also supply graphics for signatures. You end up adding a URL that points to one of these sites and specifies your particular user id. I get mine from boincstats.com. They tell you how to set it up for their site in their FAQ at http://www.boincstats.com/page/faq.php#3. I'm sure other sites have similar instructions. The URL for your signature is added to your forum preferences. Click on "your account" on the main page and then on "view or edit forum preferences".

Hope all this helps.

Charlie
-Charlie
ID: 7517 · Rating: 0
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,362
RAC: 9
Message 7542 - Posted: 24 Dec 2005, 19:19:24 UTC - in response to Message 7517.  

Just abort the DEFAULT 205 workunits. I thought all those were purged from the system by now so you should not see any of those. The problem with those, if I recall, was that they would keep on running and running (for 100 times longer than normal I think) and would hit a cpu time limit that each workunit has built in and then abort themselves. Even if you let it run you'd get no credit.


One very minor addition to the excellent information Charlie has provided: While the DEFAULT_xxxx_205 WUs will report "error" and "0 credit", whether aborted or allowed to run, the project staff has said that when they return from the holidays and all of these have been 'flushed through' the system, they will go back and AWARD credit for any time you have spent on these before aborting or failing.

ID: 7542 · Rating: 0
Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 7626 - Posted: 25 Dec 2005, 20:19:35 UTC
Last modified: 25 Dec 2005, 20:20:30 UTC

I just aborted one:

12/25/2005 9:14:31 PM|rosetta@home|Unrecoverable error for result DEFAULT_2reb_205_39_3 (aborted by user)
12/25/2005 9:14:31 PM||Rescheduling CPU: result op
12/25/2005 9:14:32 PM||Rescheduling CPU: process exited
12/25/2005 9:14:32 PM|rosetta@home|Computation for result DEFAULT_2reb_205_39_3 finished
12/25/2005 9:14:36 PM||Rescheduling CPU: result op


after 2 hours and 0.7% finished (Was watching Monsters Inc. on tv! :-D )


"I'm trying to maintain a shred of dignity in this world." - Me

ID: 7626 · Rating: 0
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,362
RAC: 9
Message 7637 - Posted: 25 Dec 2005, 22:57:12 UTC - in response to Message 7626.  

I just aborted one


Fuzzy, can you give me a link to that one? I'm writing up a bunch of stuff for David when he returns, and I had thought all the 205's had flushed out by now... I know there are still _some_ of the "short WUs" around, because I just had two of them today. (Much lower percentage than a couple of days ago, of course.)

ID: 7637 · Rating: 0
AMDave
Joined: 16 Dec 05
Posts: 35
Credit: 12,576,896
RAC: 0
Message 7639 - Posted: 26 Dec 2005, 0:37:08 UTC - in response to Message 7637.  

I just aborted one


Fuzzy, can you give me a link to that one? I'm writing up a bunch of stuff for David when he returns, and I had thought all the 205's had flushed out by now... I know there are still _some_ of the "short WUs" around, because I just had two of them today. (Much lower percentage than a couple of days ago, of course.)


Bill:
I just aborted one as well. It hasn't begun processing and remains in my queue. Is this a link you can use? https://boinc.bakerlab.org/rosetta/workunit.php?wuid=3760755
ID: 7639 · Rating: 0
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,362
RAC: 9
Message 7641 - Posted: 26 Dec 2005, 1:14:05 UTC - in response to Message 7639.  

I just aborted one as well. It hasn't begun processing and remains in my queue. Is this a link you can use? https://boinc.bakerlab.org/rosetta/workunit.php?wuid=3760755


Perfect. Bad news though. The first two people both let that one run completely, taking four days to get to you; you aborted it, and that still leaves 8 more people to do it before it's "flushed".

Large caches kill us on things like this. The guy who got it first has 48 results on his system. Nothing _wrong_ with that, it just sure slows down getting bad WUs flushed through quickly. If everybody else takes 4 days to get to this one, we'll be into February. So yes, the staff needs to "kill" these, can't rely on them being gone by the time they get back. That's the question I was trying to answer, it's just not the answer I hoped for. :-(
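
A back-of-the-envelope version of that estimate, for anyone who wants to plug in their own numbers. The inputs are assumptions taken from this post (8 hosts still to go, about 4 days each, counting from 26 Dec 2005), and `estimated_flush_date` is only an illustrative helper.

```python
from datetime import date, timedelta


def estimated_flush_date(today: date, remaining_hosts: int, days_per_host: int) -> date:
    """Rough date by which a bad WU has been through all remaining hosts.

    Assumes the WU is handed to one host at a time and each host sits on it
    for the same number of days before it errors out and is reissued.
    """
    return today + timedelta(days=remaining_hosts * days_per_host)


# Assumed inputs: 8 more hosts, about 4 days each, starting 26 Dec 2005.
print(estimated_flush_date(date(2005, 12, 26), remaining_hosts=8, days_per_host=4))
```

With those inputs the sketch lands in late January 2006, the same ballpark as "into February"; a host with a bigger cache pushes it later.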

ID: 7641 · Rating: 0
rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 7642 - Posted: 26 Dec 2005, 2:55:28 UTC - in response to Message 7641.  

If everybody else takes 4 days to get to this one, we'll be into February. So yes, the staff needs to "kill" these, can't rely on them being gone by the time they get back. That's the question I was trying to answer, it's just not the answer I hoped for. :-(

They are like zombies, they can't seem to be killed and they keep rising up.....(Sorry, my daughter gave me "The Zombie Survival Guide" for Christmas, so I have a bad case of zombies on the mind right now.... ;)

Regards,
Bob P.
ID: 7642 · Rating: 0
Webmaster Yoda
Joined: 17 Sep 05
Posts: 161
Credit: 162,253
RAC: 0
Message 7643 - Posted: 26 Dec 2005, 3:10:24 UTC - in response to Message 7642.  
Last modified: 26 Dec 2005, 3:11:23 UTC

If everybody else takes 4 days to get to this one, we'll be into February. So yes, the staff needs to "kill" these, can't rely on them being gone by the time they get back. That's the question I was trying to answer, it's just not the answer I hoped for. :-(


Yeah, the 10 errors allowed per WU is going to keep these (and the other bad WUs) circulating for some time.

If this can't be fixed in the scheduler, perhaps it can be fixed by deleting (or renaming) the directories on the project's server that these bad WUs are stored in.

It would result in download errors but it would reduce bandwidth usage, stop anyone crunching them and get them flushed out of the system quickly (assuming download errors add to the error count on the WU)
*** Join BOINC@Australia today ***
ID: 7643 · Rating: 0
Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 7648 - Posted: 26 Dec 2005, 7:46:26 UTC

Just an aside, I am, and have been, doing a number of work units with the graphics beta application ... :)
ID: 7648 · Rating: 0
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 7665 - Posted: 26 Dec 2005, 16:01:58 UTC - in response to Message 7641.  

... The first two people both let that one run completely, taking four days to get to you; you aborted it, and that still leaves 8 more people to do it before it's "flushed".

Large caches kill us on things like this. The guy who got it first has 48 results on his system. ... If everybody else takes 4 days to get to this one, we'll be into February. So yes, the staff needs to "kill" these, can't rely on them being gone by the time they get back.


In hindsight, with the project staff being away and with a high replication factor, better advice would be to suspend the WU for now (not the project, the individual result), and only abort it after the project tells us they have deleted the files from the server. Suspending would delay reissue; deleting the files would then prevent it.

In the event that there are still some out there, can I ask people to suspend for now, until the project people get back? People will have to make their own minds up whether to follow this or to go with Jack's request - after all, as a project scientist he does outrank me!

Also in hindsight, the fact that people were aborting these wholesale for a few hours explains why there were so many of the things around for a few hours.


That's the question I was trying to answer, it's just not the answer I hoped for. :-(


If you have your answer, do you still need reports of aborts, Bill? I aborted a couple on Christmas day - one that hadn't run and one that had clocked over 24hr before I noticed it! - I also clicked the wrong option on BOINCview and aborted several good WU from the cache on the same box :-( >doh<
ID: 7665 · Rating: 0
Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 7667 - Posted: 26 Dec 2005, 18:27:18 UTC - in response to Message 7637.  
Last modified: 26 Dec 2005, 18:37:23 UTC

I just aborted one


Fuzzy, can you give me a link to that one? I'm writing up a bunch of stuff for David when he returns, and I had thought all the 205's had flushed out by now... I know there are still _some_ of the "short WUs" around, because I just had two of them today. (Much lower percentage than a couple of days ago, of course.)


Yeah, sure! :-)

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=3760991

With the result: https://boinc.bakerlab.org/rosetta/result.php?resultid=5090070



River~~:
In hindsight, with the project staff being away and with a high replication factor, better advice would be to suspend the WU for now (not the project, the individual result), and only abort it after the project tells us they have deleted the files from the server. Suspending would delay reissue; deleting the files would then prevent it.


Very good idea! :-)


"I'm trying to maintain a shred of dignity in this world." - Me

ID: 7667 · Rating: 0
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,362
RAC: 9
Message 7677 - Posted: 26 Dec 2005, 20:37:40 UTC

I don't think any information is left to be gained on these... so no, _I_ certainly don't need to see anything. I don't see any reason the project would either, but I can't be 100% sure of that. I have a couple of examples now (thanks Fuzzy!) for my email to DK.

River, your suggestion is great; suspending the WU until the staff returns, rather than aborting it, would at least keep it from going to someone who hasn't read the boards, etc... I think Jack's "just abort them" was to prevent us from wasting our time crunching them. Suspending it accomplishes that, AND keeps someone else from wasting the time.

Now if I could only get one... :-(

ID: 7677 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org