Upload errors.

entigy

Joined: 2 Nov 05
Posts: 5
Credit: 990,830
RAC: 0
Message 87201 - Posted: 5 Sep 2017, 7:26:03 UTC

This.

05/09/2017 08:22:12 | Rosetta@home | Started upload of rb_09_04_77236_119997__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_514924_697_0_r181725287_0
05/09/2017 08:22:12 | Rosetta@home | Started upload of cdfdc_SOL_jumping_SAVE_ALL_OUT_514877_4989_0_r1586011663_0
05/09/2017 08:22:14 | Rosetta@home | [error] Error reported by file upload server: [rb_09_04_77236_119997__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_514924_697_0_r181725287_0] locked by file_upload_handler PID=255
05/09/2017 08:22:14 | Rosetta@home | [error] Error reported by file upload server: [cdfdc_SOL_jumping_SAVE_ALL_OUT_514877_4989_0_r1586011663_0] locked by file_upload_handler PID=255
05/09/2017 08:22:14 | Rosetta@home | Temporarily failed upload of rb_09_04_77236_119997__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_514924_697_0_r181725287_0: transient upload error
05/09/2017 08:22:14 | Rosetta@home | Backing off 05:09:59 on upload of rb_09_04_77236_119997__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_514924_697_0_r181725287_0
05/09/2017 08:22:14 | Rosetta@home | Temporarily failed upload of cdfdc_SOL_jumping_SAVE_ALL_OUT_514877_4989_0_r1586011663_0: transient upload error
05/09/2017 08:22:14 | Rosetta@home | Backing off 04:08:07 on upload of cdfdc_SOL_jumping_SAVE_ALL_OUT_514877_4989_0_r1586011663_0
05/09/2017 08:22:15 | Rosetta@home | Started upload of 95dbf_SOL_jumping_SAVE_ALL_OUT_514877_4988_0_r1180513244_0
05/09/2017 08:22:16 | Rosetta@home | [error] Error reported by file upload server: [95dbf_SOL_jumping_SAVE_ALL_OUT_514877_4988_0_r1180513244_0] locked by file_upload_handler PID=-1
05/09/2017 08:22:16 | Rosetta@home | Temporarily failed upload of 95dbf_SOL_jumping_SAVE_ALL_OUT_514877_4988_0_r1180513244_0: transient upload error
05/09/2017 08:22:16 | Rosetta@home | Backing off 00:16:02 on upload of 95dbf_SOL_jumping_SAVE_ALL_OUT_514877_4988_0_r1180513244_0
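For anyone comparing notes, the event-log lines above can be summarized with a small script. This is only a sketch: the function name `summarize` is made up for illustration, and the patterns are based on the message wording seen in this thread, so they may need adjusting for other client versions. It splits on " | " rather than parsing the timestamp, since the timestamp format differs between machines.

```python
import re
from datetime import timedelta

# Patterns based on the log wording quoted in this thread (an assumption,
# not an official description of the BOINC client's log format).
LOCKED = re.compile(
    r"\[error\] Error reported by file upload server: "
    r"\[(?P<name>[^\]]+)\] locked by file_upload_handler PID=(?P<pid>-?\d+)"
)
BACKOFF = re.compile(
    r"Backing off (?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2}) on upload of (?P<name>\S+)"
)


def summarize(log_text):
    """Map each file name to the reported lock PID and backoff delay."""
    result = {}
    for line in log_text.splitlines():
        # Each line looks like: timestamp | project | message
        parts = line.split(" | ", 2)
        if len(parts) != 3:
            continue
        message = parts[2]
        locked = LOCKED.search(message)
        if locked:
            entry = result.setdefault(locked["name"], {"pid": None, "backoff": None})
            entry["pid"] = int(locked["pid"])
            continue
        backoff = BACKOFF.search(message)
        if backoff:
            entry = result.setdefault(backoff["name"], {"pid": None, "backoff": None})
            entry["backoff"] = timedelta(
                hours=int(backoff["h"]),
                minutes=int(backoff["m"]),
                seconds=int(backoff["s"]),
            )
    return result


if __name__ == "__main__":
    import sys

    # Usage: paste the event log into a file and pipe it in.
    for name, info in summarize(sys.stdin.read()).items():
        print(name, info["pid"], info["backoff"])
```

Lines that do not match either pattern (for example "Started upload of ...") are skipped.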
googloo
Joined: 15 Sep 06
Posts: 133
Credit: 22,711,732
RAC: 4,070
Message 87204 - Posted: 5 Sep 2017, 12:04:32 UTC - in response to Message 87201.  

Same here.
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,613,375
RAC: 9,035
Message 87205 - Posted: 5 Sep 2017, 12:05:11 UTC - in response to Message 87201.  

[quote]rb_09_04_77236_119997__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_514924_697_0_r181725287_0: transient upload error[/quote]

+1
Warped

Joined: 15 Jan 06
Posts: 48
Credit: 1,788,185
RAC: 0
Message 87207 - Posted: 5 Sep 2017, 15:41:32 UTC

Same for me.

Tue 05 Sep 2017 17:22:55 SAST | Rosetta@home | Started upload of rb_09_03_77151_119988_ab_stage0_t000___robetta_IGNORE_THE_REST_05_09_514908_71_0_r1550815990_0
Tue 05 Sep 2017 17:22:59 SAST | Rosetta@home | [error] Error reported by file upload server: [rb_09_03_77151_119988_ab_stage0_t000___robetta_IGNORE_THE_REST_05_09_514908_71_0_r1550815990_0] locked by file_upload_handler PID=255
Tue 05 Sep 2017 17:22:59 SAST | Rosetta@home | Temporarily failed upload of rb_09_03_77151_119988_ab_stage0_t000___robetta_IGNORE_THE_REST_05_09_514908_71_0_r1550815990_0: transient upload error
Tue 05 Sep 2017 17:22:59 SAST | Rosetta@home | Backing off 04:48:33 on upload of rb_09_03_77151_119988_ab_stage0_t000___robetta_IGNORE_THE_REST_05_09_514908_71_0_r1550815990_0
JohnH

Joined: 25 Mar 13
Posts: 43
Credit: 2,319,355
RAC: 0
Message 87209 - Posted: 5 Sep 2017, 18:04:32 UTC
Last modified: 5 Sep 2017, 18:13:10 UTC

Me too
9/5/2017 6:55:09 PM | Rosetta@home | [error] Error reported by file upload server: [6ea159939ab8b54604c3f5fbeadf8c01_C2_docking_big_job_17_08_03_12_20_globalDocking_0_SAVE_ALL_OUT_510695_8_0_r1334300603_0] locked by file_upload_handler PID=255
9/5/2017 6:55:09 PM | Rosetta@home | Temporarily failed upload of 6ea159939ab8b54604c3f5fbeadf8c01_C2_docking_big_job_17_08_03_12_20_globalDocking_0_SAVE_ALL_OUT_510695_8_0_r1334300603_0: transient upload error
9/5/2017 6:55:09 PM | Rosetta@home | Backing off 05:19:22 on upload of 6ea159939ab8b54604c3f5fbeadf8c01_C2_docking_big_job_17_08_03_12_20_globalDocking_0_SAVE_ALL_OUT_510695_8_0_r1334300603_0
shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 87210 - Posted: 5 Sep 2017, 19:32:24 UTC - in response to Message 87201.  

I have four of them on two machines, making a total of 8 tasks (AKA units) stuck in the "Uploading" status on the Tasks tab. I didn't see how the second machine got there, but on the first machine there was definitely an interval when only three units were stuck, and the fourth was added later. On the machine I can see now, all four of them are from different projects, with completion times from 4 to 8 hours.

On the Transfers tab, the "Upload: retry in ..." times vary from 2 to 5 hours. Using the Retry Now button individually or collectively fails after about 2 seconds of "active" status.
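Retry delays that differ from file to file are what you would expect from a randomized exponential backoff. The sketch below is a generic illustration of that idea, not code from the BOINC client; the function name `next_backoff`, the 60-second minimum and the 6-hour cap are all assumptions chosen for the example.

```python
import random


def next_backoff(failures, min_delay=60.0, max_delay=6 * 3600.0, rng=random):
    """Return a randomized retry delay in seconds.

    `failures` is the number of consecutive failed attempts so far.
    The upper bound doubles with each failure until it reaches `max_delay`,
    and the actual delay is drawn uniformly between `min_delay` and that bound.
    """
    if failures < 1:
        raise ValueError("failures must be >= 1")
    # Cap the exponent so the intermediate value stays small.
    ceiling = min(max_delay, min_delay * 2 ** min(failures - 1, 32))
    if ceiling <= min_delay:
        return min_delay
    return rng.uniform(min_delay, ceiling)


if __name__ == "__main__":
    # Show how the delay for one stuck transfer might grow over repeated failures.
    for attempt in range(1, 11):
        seconds = next_backoff(attempt)
        print(f"failure {attempt}: retry in {seconds / 3600:.2f} h")
```

Under a scheme like this, pressing Retry Now only forces an immediate attempt. If the server still rejects the upload, the failure count goes up and a fresh, possibly longer, delay is drawn.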

Not a new problem, but I think this is the first time I've seen it since the major server upgrade a few weeks back.

There was at least one other peculiar behavior, but since the one I can recall right now involves the arbitrary and meaningless deadlines, I file it under "C'est la vie." At least the deadlines continue to appear arbitrary and without meaning from my perspective as a volunteer or donor... Their only significance is the demotivating feeling of well-intended contributions tossed in the bit bucket, which may happen to these frozen-in-Uploading units, too. Sometimes I feel like instead of saying "C'est la vie" I should be saying "Cela signifie la guerre!" (At least in this case I understand the circumstances which caused the deadlines to be missed, so I can basically dismiss the lost hours as a one-time failure affecting 2.5 machines.)

Ah. Just finished another task (AKA work unit) on this machine, and it went to the "Ready to report" status. Going to the Projects tab and clicking Update works as expected. The newly finished task disappears, and the four "Uploading" tasks remain unaffected. Clicking on Retry Now from the Transfers tab fails.
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 87211 - Posted: 5 Sep 2017, 20:21:55 UTC - in response to Message 87210.  

Now up to 5 "Uploading" tasks frozen on this machine with total work time over 30 hours. Doesn't appear to be an immediate threat of lost credit, since the earliest deadline is the 11th, but looking at the other machine... It has also increased to 5 making a total of 10 jammed units. Checked two other machines at hand, and they don't have any yet.
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,613,375
RAC: 9,035
Message 87212 - Posted: 5 Sep 2017, 20:44:52 UTC
Last modified: 5 Sep 2017, 20:45:03 UTC

Maybe a restart of server's daemons.....
Sid Celery

Joined: 11 Feb 08
Posts: 2124
Credit: 41,219,446
RAC: 10,842
Message 87217 - Posted: 6 Sep 2017, 1:33:12 UTC - in response to Message 87210.  

[quote]On the Transfers tab, the "Upload: retry in ..." times vary from 2 to 5 hours. Using the Retry Now button individually or collectively fails after about 2 seconds of "active" status.

Not a new problem, but I think this is the first time I've seen it since the major server upgrade a few weeks back.[/quote]

Same.

If I remember correctly, a solution was found last time on the server side - that shouldn't have solved anything, but it did. If someone searches for "transient" I think that solution will be found.
shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 87219 - Posted: 6 Sep 2017, 2:24:16 UTC - in response to Message 87217.  

Well, I can report that it has spread to two more computers.

Since I had changed my network configuration recently, I went ahead and tested one machine with an alternate routing. Much slower connection, but no apparent effects on the problem.

There seems to be some inconsistency in how quickly a "Retry Now" from the Transfer tab fails. On this machine, it goes back to the "Upload: retry in ..." status quite quickly, just a few seconds. Other machines remain in "Upload: active" status for a long time.

Quite annoying, but not surprising or anything... Which leads to the long history of project struggles. I do recall something about a network security configuration problem on the Baker Lab side. If this is similar, then it took them about a week to communicate the nature of the problem to the university's network people and get the fix, whatever it was.
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
JohnH

Joined: 25 Mar 13
Posts: 43
Credit: 2,319,355
RAC: 0
Message 87222 - Posted: 6 Sep 2017, 9:26:13 UTC
Last modified: 6 Sep 2017, 9:27:56 UTC

All seems to be working for me this morning.
Units finished and retry queued on Sep 5th were credited on the 6th.
One way arrow of time strikes again ... oh well.
Anybody know what happened/resolution?
Sid Celery

Joined: 11 Feb 08
Posts: 2124
Credit: 41,219,446
RAC: 10,842
Message 87224 - Posted: 6 Sep 2017, 9:56:00 UTC - in response to Message 87222.  

[quote]All seems to be working for me this morning.
Units finished and retry queued on Sep 5th were credited on the 6th.
One way arrow of time strikes again ... oh well.
Anybody know what happened/resolution?[/quote]

Mine cleared up at almost exactly the same time - 2 minutes prior to this post
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 87226 - Posted: 6 Sep 2017, 18:49:54 UTC

"Which leads to the long history of project struggles."

We've been up since late June with the new server front and back ends without any significant issues (knock on wood). This has been the first significant issue since and may or may not have been related to network and power instability here at the UW recently. The file locking logic in the upload handler started to fail for the majority of upload requests. We rebooted the web servers and filesystem but that didn't fix the issue. We had to modify the source code to comment out the file locking logic and rebuild the upload handler. This appears to have fixed the issue. The file locking logic is not necessary for our system and things appear to be back to normal.
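To illustrate the kind of logic involved, here is a minimal sketch of an upload handler that takes an exclusive lock on the target file before writing, with a switch to bypass the lock. It is not the BOINC upload handler's code: the names `try_lock`, `handle_upload` and `use_locking` are hypothetical, and the use of `flock` (Unix only) is an assumption made for the example.

```python
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns an open file descriptor on success, or None if the lock is
    already held through another open of the same file.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def handle_upload(path, data, use_locking=True):
    """Append `data` to `path`; return "ok" or a transient-error message."""
    if use_locking:
        fd = try_lock(path)
        if fd is None:
            # The client would treat this as a transient error and back off.
            return "transient error: file is locked by another handler"
    else:
        # Locking bypassed: open the file and write without checking.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.lseek(fd, 0, os.SEEK_END)
        os.write(fd, data)
    finally:
        os.close(fd)  # closing the descriptor also releases the flock
    return "ok"
```

In a design like this, if the locking call starts failing for reasons unrelated to real contention, every request ends up on the transient-error path and clients keep backing off, which is the symptom reported in this thread. Calling `handle_upload(path, data, use_locking=False)` corresponds to skipping the check altogether. That is only safe when two handlers never write the same file at once.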

On a positive note, I think our project has a long history of success, including research from our lab on protein design being a runner-up for Science magazine's Breakthrough of the Year in 2016, success in using co-evolution sequence data from metagenomes to determine new protein structures at a cost significantly lower than structural genomics initiatives, designing/modeling small cyclic peptides with non-canonical amino acids (much of which was modeled on mobile Android devices), and more.
JohnH

Joined: 25 Mar 13
Posts: 43
Credit: 2,319,355
RAC: 0
Message 87229 - Posted: 7 Sep 2017, 8:58:33 UTC - in response to Message 87226.  

Thanks for the update. :-)
shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 87233 - Posted: 8 Sep 2017, 1:18:01 UTC - in response to Message 87226.  

The quote sounds like one of my asides... Anyway, all seems back to normal and mostly I don't care much these days. I continue to bubble with imaginary constructive suggestions and continue to feel the world at large will do what it darn well pleases. Just saw another one implemented today about 30 years after I first wrote about it... (If my memory wasn't so darned selective I would be better at remembering all my erroneous ideas, too.)
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)




©2024 University of Washington
https://www.bakerlab.org