Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
ukishun Joined: 28 Apr 11 Posts: 4 Credit: 18,756 RAC: 0 |
I mentioned a task that was giving me trouble here: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=5704 It eventually went away (I went to sleep, so I have no idea what happened to it). Now a similarly named WU, FOLD_N_DOCK_dagk_D2symm_SAVE_ALL_OUT_IGNORE_THE_REST_26520_4912 (https://boinc.bakerlab.org/rosetta/workunit.php?wuid=383890060), is giving me some trouble. When I suspend the task and restart the computer, the task restarts and goes back to 0%. Even after about 45 minutes of CPU runtime, when I check the properties, "CPU time at last checkpoint" is blank. I don't know if this is 'normal', since it's been mentioned that some WUs take a long time to complete, but it seems like a waste of computing time if it just restarts. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
...when I suspend the task and restart the computer, the task restarts and goes back to 0%. Even after about 45 minutes of CPU runtime, I check the properties and CPU time at last checkpoint is blank. I don't know if this is 'normal', since it's been mentioned that some WUs take a long time to complete, but it seems like a waste of computing time if it just restarts. Yes, normal. Some tasks checkpoint more frequently than others. If your machine were in an environment where no progress is made after five attempts at restarting a task, Rosetta automatically marks the task as completed and reports it back. So such tasks never just get stuck on a machine that can never complete them. In your case, it just needs to run longer before it is able to checkpoint and preserve the work completed up to that point. Rosetta Moderator: Mod.Sense |
SafeAggie Joined: 22 Oct 05 Posts: 3 Credit: 458,414 RAC: 0 |
Client Error/Compute Error: FOLD_N_DOCK_dagk_D2symm_SAVE_ALL_OUT_IGNORE_THE_REST_26520_2442_0 resultid=420544634 wuid=383772100 CPU Time: 3,129.82 seconds Validate Error: ProteinG_abinitio_SAVE_ALL_OUT_design_relax_g038_003_25638_198_1 resultid=420536503 wuid=381625616 CPU Time: 2,695.49 seconds Validate Error: ProteinG_abinitio_SAVE_ALL_OUT_design_relax_g060_007_26528_2_0 resultid=420535967 wuid=383764150 CPU Time: 2,754.88 seconds |
Jesse Viviano Joined: 14 Jan 10 Posts: 42 Credit: 2,700,472 RAC: 0 |
Please see my computer's list of work units. The two work units whose names start with "IF3_like_SAVE_ALL_OUT_relax_i091_26681_" get validate errors on my machine which is set to a 24 hour target work unit turnaround time, and one of the work units was resent to a machine which is set to a much shorter turnaround time. That machine's result validated fine. Could someone fix the validator or adjust the limit that finishes the work unit early to prevent validator problems for this series for those whose machines are set to a 24 hour turnaround time? |
googloo Joined: 15 Sep 06 Posts: 133 Credit: 22,712,214 RAC: 4,046 |
Maybe somebody will pay attention to this thread. Hey, project people - there's been no work since yesterday afternoon. |
CBSX01 Joined: 17 Dec 07 Posts: 11 Credit: 5,387,356 RAC: 0 |
Maybe somebody will pay attention to this thread. Hey, project people - there's been no work since yesterday afternoon. I know! I've got 3 PCs on my desk here at work and only 1 network cable (sorry, no personal switches or routers allowed). Got my preferences set to cache 3 days (for the long weekend) and hoping to fill them up with work to report on Tuesday. |
Plasmon_attack Joined: 2 May 10 Posts: 13 Credit: 15,451,384 RAC: 0 |
Yeah, this seems to happen every once in a while: the queue runs down. I'm tempted to ramp up my queues to 5 days or more, since it can take that long to get work filled back in. Given it's Friday, I suspect our computers are all getting a break over the weekend and our RACs will just have to eat it. |
edikl Joined: 16 Jun 10 Posts: 10 Credit: 186,187 RAC: 0 |
It is normal that sometimes things go wrong. But it would be nice to hear a word from project administrators, that we have a problem (why and when it is predicted to be fixed). We, users, like to know that we are treated seriously :) |
Telescope Adrian Joined: 14 Nov 06 Posts: 9 Credit: 1,906,378 RAC: 0 |
The answer is very simple: run work for other BOINC projects too, like SPINHENGE or WCG. |
CBSX01 Joined: 17 Dec 07 Posts: 11 Credit: 5,387,356 RAC: 0 |
It is normal that sometimes things go wrong. But it would be nice to hear a word from project administrators, that we have a problem (why and when it is predicted to be fixed). We, users, like to know that we are treated seriously :) Precisely. Even if it's "Just cracked the first cold one. Back on Tuesday". And in reply to Mr. Telescope: yes, these things have happened in the past and we all know the routine. I'm going to be attaching to POEM on all PCs, but it would be nice to know if it's going to be 10 minutes, hours or days... |
Samson Joined: 23 May 11 Posts: 8 Credit: 257,870 RAC: 0 |
I thought I made the right choice with Rosetta but now I'm not so sure. I want a project that I can devote all my CPU time to. Rosetta seems to be the best as far as practical medical advances are concerned. I went as far as to contact 2 mods/admins yesterday; I haven't heard a peep. I'd like to get Rosetta running again. An update or catastrophe notice would be nice. |
TPCBF Joined: 29 Nov 10 Posts: 111 Credit: 5,084,419 RAC: 1,974 |
I thought I made the right choice with Rosetta but now I'm not so sure. Don't hold your breath on getting any official info on what's wrong any time soon, especially with the long weekend coming up here in the U.S. of A... I don't put all my eggs in one nest anymore since the "breakdown" over the change of year. I split my CPU time between Rosetta and World Community Grid; so far they haven't both been down at the same time, and WCG is at least as prolific with their research as R@H... Ralf |
cnick6 Joined: 30 May 06 Posts: 29 Credit: 12,597,623 RAC: 0 |
I also do multiple. The funny thing is Seti@home is down too. Just can't win. |
Samson Joined: 23 May 11 Posts: 8 Credit: 257,870 RAC: 0 |
Work is starting to trickle in. Seems I got a few units about an hour ago. I could check the log but.... Anyway, some work is coming down the pipe. For me at least. |
rochester new york Joined: 2 Jul 06 Posts: 2842 Credit: 2,020,043 RAC: 0 |
Work is starting to trickle in. yeah the queue increased by a few thousand |
Shawn Volunteer moderator Project developer Project scientist Joined: 22 Jan 10 Posts: 17 Credit: 53,741 RAC: 0 |
Hey guys, I submitted a new job for MVH, which you can read about in the protein-protein interface thread if you're interested. This job is slightly different from the previous ones (it includes more stubs), so I wanted to do some extra checking to make sure that it wouldn't break anything. Hopefully, I'll get some jobs for Ebola targets later this week! |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
your validator or credit counting server must have crashed. i lost just over 100 pts average credit in 1 or 2 days!!!!! the results show nothing wrong with credits. so there must be something else. how can you lose 100 pts average credit in a couple of days???????? from about 525 to 425 starting on the line of may 27. today is the 29th |
TPCBF Joined: 29 Nov 10 Posts: 111 Credit: 5,084,419 RAC: 1,974 |
your validator or credit counting server must have crashed. Got a bunch of WUs stuck as "Pending credit" overnight now too. Well, never a dull moment with R@H... :-( Ralf |
Sid Celery Joined: 11 Feb 08 Posts: 2124 Credit: 41,224,342 RAC: 11,119 |
your validator or credit counting server must have crashed. They're not lost - just saved up. When they do get awarded you'll have an inflated rac, then it'll drop again when that lump drops out. IYSWIM... But there is an intermittent problem with validation somewhere. Some of my team are getting credits straight away, some after a delay (not long but undetermined), one for much of the day. No biggie, but a kick may be in order... |
TPCBF Joined: 29 Nov 10 Posts: 111 Credit: 5,084,419 RAC: 1,974 |
But there is an intermittent problem with validation somewhere. Some of my team are getting credits straight away, some after a delay (not long but undetermined), one for much of the day. No biggie, but a kick may be in order... I wouldn't call that an "intermittent" problem. A couple of WUs got validated during the day, but the list of "Pending credit" WUs just keeps getting longer... Hope that isn't like a balloon that gets slowly blown up until you get a big bang (again). Ralf |
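
The RAC drop discussed in the last few posts can be checked with a little arithmetic. This is only a minimal sketch: it assumes that BOINC's recent average credit decays exponentially with a half-life of one week (an assumed constant, not something stated in this thread) and that no credit at all was granted during the gap. The function name `decayed_rac` is hypothetical.

```python
def decayed_rac(rac: float, days_without_credit: float,
                half_life_days: float = 7.0) -> float:
    """Return the RAC after a stretch in which no credit was granted.

    Assumes pure exponential decay with the given half-life.
    The 7-day default is an assumption, not taken from this thread.
    """
    return rac * 0.5 ** (days_without_credit / half_life_days)


if __name__ == "__main__":
    # Greg_BE's numbers: about 525 RAC on May 27, checked again on May 29.
    print(round(decayed_rac(525.0, 2.0), 1))
```

Under those assumptions a RAC of 525 falls to roughly 430 after two days with nothing validated, which is in line with the drop to about 425 reported by Greg_BE. It also fits Sid Celery's point: the credit is pending rather than lost, so the average should jump once the backlog is granted and then decay again.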