Problems and Technical Issues with Rosetta@home

Dr Who Fan
Joined: 28 May 06
Posts: 79
Credit: 273,880
RAC: 361
Message 110057 - Posted: 23 Nov 2024, 22:30:26 UTC

Upload/Download SERVER(s) appear to be off-line again but the server status page is all green
11/23/2024 16:25:35 Internet access OK - project servers may be temporarily down.
Rosetta@home 11/23/2024 16:25:34 Backing off 01:14:05 on download of input_rb_11_23_642993_638479__t000__2_C1_robetta.zip
Rosetta@home 11/23/2024 16:25:34 Temporarily failed download of input_rb_11_23_642993_638479__t000__2_C1_robetta.zip: transient HTTP error
Rosetta@home 11/23/2024 16:25:34 Backing off 01:59:39 on download of flags_rb_11_23_642993_638479__t000__2_C1_robetta
Rosetta@home 11/23/2024 16:25:34 Temporarily failed download of flags_rb_11_23_642993_638479__t000__2_C1_robetta: transient HTTP error
Rosetta@home 11/23/2024 16:25:34 Backing off 02:07:26 on download of input_rb_11_23_642993_638479__t000__1_C1_robetta.zip
Rosetta@home 11/23/2024 16:25:34 Temporarily failed download of input_rb_11_23_642993_638479__t000__1_C1_robetta.zip: transient HTTP error
Rosetta@home 11/23/2024 16:25:34 Backing off 01:24:28 on download of flags_rb_11_23_642993_638479__t000__1_C1_robetta
Rosetta@home 11/23/2024 16:25:34 Temporarily failed download of flags_rb_11_23_642993_638479__t000__1_C1_robetta: transient HTTP error
11/23/2024 16:25:34 Project communication failed: attempting access to reference site
Rosetta@home 11/23/2024 16:25:33 Started download of input_rb_11_23_642993_638479__t000__2_C1_robetta.zip
Rosetta@home 11/23/2024 16:25:33 Started download of flags_rb_11_23_642993_638479__t000__2_C1_robetta
Rosetta@home 11/23/2024 16:25:33 Started download of input_rb_11_23_642993_638479__t000__1_C1_robetta.zip
Rosetta@home 11/23/2024 16:25:33 Started download of flags_rb_11_23_642993_638479__t000__1_C1_robetta
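For anyone who wants to tally these failures from their own Event Log rather than eyeball them, here is a rough Python sketch. It assumes the lines look like the excerpt above ("Backing off HH:MM:SS on download of <file>" and "Temporarily failed download of <file>: <reason>"); the function name `summarize_transfers` is made up for this example and is not part of BOINC.

```python
import re
from collections import defaultdict

# Assumed line shapes, copied from the Event Log excerpt in this post.
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on download of (\S+)")
FAILED_RE = re.compile(r"Temporarily failed download of (\S+): (.+)$")


def summarize_transfers(lines):
    """Return {filename: {"failures": int, "backoff_s": int}} for the given log lines.

    Hypothetical helper: counts failed downloads per file and records the
    most recently seen back-off interval in seconds.
    """
    summary = defaultdict(lambda: {"failures": 0, "backoff_s": 0})
    for line in lines:
        match = BACKOFF_RE.search(line)
        if match:
            hours, minutes, seconds, name = match.groups()
            summary[name]["backoff_s"] = (
                int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            )
            continue
        match = FAILED_RE.search(line)
        if match:
            summary[match.group(1)]["failures"] += 1
    return dict(summary)


# Two lines taken verbatim from the log above.
sample = [
    "Rosetta@home 11/23/2024 16:25:34 Backing off 01:14:05 on download of "
    "input_rb_11_23_642993_638479__t000__2_C1_robetta.zip",
    "Rosetta@home 11/23/2024 16:25:34 Temporarily failed download of "
    "input_rb_11_23_642993_638479__t000__2_C1_robetta.zip: transient HTTP error",
]
result = summarize_transfers(sample)
for name, info in result.items():
    print(name, info)
```

Feeding it a whole day's log should show whether the same few files keep failing or the errors are spread evenly, which may help when comparing notes with other users.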


Bryn Mawr
Joined: 26 Dec 18
Posts: 398
Credit: 12,294,748
RAC: 9,249
Message 110058 - Posted: 23 Nov 2024, 22:31:07 UTC - in response to Message 110055.  

ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictApplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
Unfortunately, one of the usual ones.


Yes, presumably a definition error for the molecule being tested.
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1722
Credit: 18,356,357
RAC: 25,250
Message 110059 - Posted: 23 Nov 2024, 23:32:06 UTC - in response to Message 110057.  

Upload/Download SERVER(s) appear to be off-line again but the server status page is all green
I'm not having any issues at all.
Grant
Darwin NT
Dr Who Fan
Joined: 28 May 06
Posts: 79
Credit: 273,880
RAC: 361
Message 110060 - Posted: 24 Nov 2024, 0:41:49 UTC - in response to Message 110059.  

Upload/Download SERVER(s) appear to be off-line again but the server status page is all green
I'm not having any issues at all.

Did a manual retry a few minutes ago and they downloaded successfully.
Sid Celery
Joined: 11 Feb 08
Posts: 2137
Credit: 41,518,559
RAC: 15,775
Message 110064 - Posted: 25 Nov 2024, 17:25:49 UTC - in response to Message 110060.  

Upload/Download SERVER(s) appear to be off-line again but the server status page is all green
I'm not having any issues at all.

Did a manual retry a few minutes ago and they downloaded successfully.

I didn't see it here at Rosetta, but for 7 or 10 days it was happening to everyone at WCG and each of 6 files per upload needed 5-10 tries on tasks that uploaded and downloaded 4-6 times as often.
If anything happened at Rosetta in that time it was lost among 40 files waiting to transfer to WCG at any one time.
Bryn Mawr
Joined: 26 Dec 18
Posts: 398
Credit: 12,294,748
RAC: 9,249
Message 110066 - Posted: 25 Nov 2024, 19:06:16 UTC - in response to Message 110064.  

I didn't see it here at Rosetta, but for 7 or 10 days it was happening to everyone at WCG and each of 6 files per upload needed 5-10 tries on tasks that uploaded and downloaded 4-6 times as often.
If anything happened at Rosetta in that time it was lost among 40 files waiting to transfer to WCG at any one time.


Currently experiencing transient HTTPS errors on probably half of the downloads and this has been going on for maybe 4 days. Some downloads have taken 15 retries to clear.
Sid Celery
Joined: 11 Feb 08
Posts: 2137
Credit: 41,518,559
RAC: 15,775
Message 110067 - Posted: 25 Nov 2024, 21:26:51 UTC - in response to Message 110066.  

I didn't see it here at Rosetta, but for 7 or 10 days it was happening to everyone at WCG and each of 6 files per upload needed 5-10 tries on tasks that uploaded and downloaded 4-6 times as often.
If anything happened at Rosetta in that time it was lost among 40 files waiting to transfer to WCG at any one time.


Currently experiencing transient HTTPS errors on probably half of the downloads and this has been going on for maybe 4 days. Some downloads have taken 15 retries to clear.

It's weird that I'm just as susceptible as anyone else to those errors coming from WCG, but don't see any here at Rosetta.
The only solution I know of is to keep retrying manually for as long as it takes.
mmonnin
Joined: 2 Jun 16
Posts: 61
Credit: 25,390,629
RAC: 47,239
Message 110068 - Posted: 25 Nov 2024, 21:35:39 UTC

I have to retry all the time to download tasks here at Rosetta, which is something new for Rosetta. Some retries work on the first attempt and others won't download after a dozen attempts. I've even aborted a task to download more work, and those new ones will download. It's typically the smaller files from Rosetta that need retries.
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1722
Credit: 18,356,357
RAC: 25,250
Message 110069 - Posted: 26 Nov 2024, 5:18:46 UTC

Still no signs of file transfer issues in my Event log; it sounds like there is some sort of network issue between ISPs.
Grant
Darwin NT
Bill Swisher
Joined: 10 Jun 13
Posts: 38
Credit: 34,878,935
RAC: 105,916
Message 110072 - Posted: 26 Nov 2024, 16:08:18 UTC - in response to Message 110069.  

transient http errors

As a snowbird I relocated earlier this month. Between the time I shut down one computer, packed this one in my checked baggage, and turned it on at the new location, WCG started giving me errors. LOTS of errors on multiple computers. Thinking it was because I had switched ISPs, I diddled around with it a lot before I did some real testing. First I fired up the VPN and used a place in Europe as my gateway: no change. Then I really got serious and made an SSH connection to the computers back where I live most of the time. They were clogged up too. After about a week and a half things settled down and traffic to WCG went back to normal. Then Rosetta hiccuped a few times. At the moment all seems to be OK.
JLDun
Joined: 31 May 08
Posts: 8
Credit: 73,164
RAC: 123
Message 110073 - Posted: 26 Nov 2024, 21:09:03 UTC

I was getting "transient" errors yesterday (on a phone, using Google Fiber for WiFi).
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1722
Credit: 18,356,357
RAC: 25,250
Message 110076 - Posted: 27 Nov 2024, 9:24:47 UTC
Last modified: 27 Nov 2024, 9:26:41 UTC

Server Status is showing all green, but there is a Validation backlog starting to build up again...


Edit: and I've lost 2 Tasks to a Validation error.
Grant
Darwin NT
Sid Celery
Joined: 11 Feb 08
Posts: 2137
Credit: 41,518,559
RAC: 15,775
Message 110078 - Posted: 27 Nov 2024, 15:19:46 UTC - in response to Message 110076.  

Server Status is showing all green, but there is a Validation backlog starting to build up again...

Edit- and I've lost 2 Tasks to a Validation error.

It didn't take too much longer.
boinc-process is now being reported as down again.
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1722
Credit: 18,356,357
RAC: 25,250
Message 110079 - Posted: 29 Nov 2024, 19:29:50 UTC

Another small batch of Rosetta Tasks (20,000); the boinc-process host is still dead.
Grant
Darwin NT
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1722
Credit: 18,356,357
RAC: 25,250
Message 110080 - Posted: 30 Nov 2024, 8:47:52 UTC

And boinc-process host lives again, backlog mostly cleared.
Till the next time.
Grant
Darwin NT
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1722
Credit: 18,356,357
RAC: 25,250
Message 110082 - Posted: 30 Nov 2024, 20:17:06 UTC
Last modified: 30 Nov 2024, 20:21:03 UTC

And another small batch of work released (25,000).
The current group of Tasks is using 800 MB to 1.2 GB of RAM each, so systems with very little RAM could be having problems.
Grant
Darwin NT
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1722
Credit: 18,356,357
RAC: 25,250
Message 110090 - Posted: 3 Dec 2024, 5:02:06 UTC

1.7 million Tasks ready to send- hopefully there won't be too many that die within seconds of starting.
Grant
Darwin NT
Sid Celery
Joined: 11 Feb 08
Posts: 2137
Credit: 41,518,559
RAC: 15,775
Message 110091 - Posted: 3 Dec 2024, 6:02:50 UTC - in response to Message 110090.  

1.7 million Tasks ready to send - hopefully there won't be too many that die within seconds of starting.

I wasn't expecting that.
The first downloads I had were 7-8hrs ago, so I'm clearing everything else down to make space for a full Rosetta cache
Jean-David Beyer
Joined: 2 Nov 05
Posts: 195
Credit: 6,613,600
RAC: 9,094
Message 110092 - Posted: 3 Dec 2024, 14:57:05 UTC - in response to Message 110090.  

The ones I get that fail -- even before starting -- are canceled by the server. They send a task to someone who has not finished and not failed. Then they send me one of the same thing. Then the first person completes the work unit. Then they cancel me. This is a rude process. They should not send me a task if they are still waiting for the first user to complete.

Workunit 1415251024
name 	rb_11_29_646130_639665__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_3010044_217
application 	Rosetta
created 	29 Nov 2024, 10:19:10 UTC
canonical result 	1590758128
granted credit 	339.97
minimum quorum 	1
initial replication 	1
max # of error/total/success tasks 	1, 2, 1
Task 	Computer 	Sent 	Time reported or deadline 	Status 	Run time (sec) 	CPU time (sec) 	Credit 	Application
1590758128 	3773674 	29 Nov 2024, 10:20:20 UTC 	2 Dec 2024, 11:52:45 UTC 	Completed and validated 	19,549.58 	19,469.61 	339.97 	Rosetta v4.20 windows_x86_64
1590852975 	5910575 	2 Dec 2024, 10:20:27 UTC 	2 Dec 2024, 12:15:32 UTC 	Cancelled by server 	0.00 	0.00 	--- 	Rosetta v4.20 x86_64-pc-linux-gnu
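
The timestamps in that workunit can be lined up with a few lines of Python. This is only a sketch: the 3-day deadline is an assumption inferred from the gap between the two "Sent" times, not something the workunit page states, and the variable names are made up for illustration.

```python
from datetime import datetime, timedelta

FMT = "%d %b %Y, %H:%M:%S"  # timestamps as shown on the workunit page (UTC)

first_sent = datetime.strptime("29 Nov 2024, 10:20:20", FMT)
first_reported = datetime.strptime("2 Dec 2024, 11:52:45", FMT)
second_sent = datetime.strptime("2 Dec 2024, 10:20:27", FMT)
second_cancelled = datetime.strptime("2 Dec 2024, 12:15:32", FMT)

# ASSUMPTION: a 3-day deadline, inferred from the data and not confirmed.
assumed_deadline = first_sent + timedelta(days=3)

resend_delay = second_sent - assumed_deadline    # how soon the copy went out
late_by = first_reported - assumed_deadline      # how late the first result was
cancel_lag = second_cancelled - first_reported   # time until the copy was cancelled

print("resent after assumed deadline by:", resend_delay)  # 0:00:07
print("first result late by:", late_by)                   # 1:32:25
print("copy cancelled after:", cancel_lag)                # 0:22:47
```

If the 3-day assumption holds, the second copy went out seconds after the first task missed its deadline, and the first result then arrived about an hour and a half late.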

robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,338,560
RAC: 2,994
Message 110093 - Posted: 3 Dec 2024, 16:24:09 UTC - in response to Message 110092.  
Last modified: 3 Dec 2024, 16:27:24 UTC

The ones I get that fail -- even before starting -- are canceled by the server. They send a task to someone who has not finished and not failed. Then they send me one of the same thing. Then the first person completes the work unit. Then they cancel me. This is a rude process. They should not send me a task if they are still waiting for the first user to complete.

Would you prefer that they cancel workunits after they start? They obviously want one copy to finish soon, and may not have information on whether the first one will ever finish.

I think, though, that they will let the second one finish and give it credit if it starts before the first one finishes.



©2024 University of Washington
https://www.bakerlab.org