Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
ArcSedna Joined: 23 Oct 11 Posts: 16 Credit: 71,462,581 RAC: 87,530 |
I've been getting transient HTTP errors recently. According to my client log with the http_debug flag enabled, some of the download servers might have an SSL certificate problem. Rosetta@home 2024/12/04 09:16 [http] HTTP_OP::init_get(): https://boinc-files.bakerlab.org/rosetta/download/294/8a_hal_x_hal_8aa_4jp9719_d196_0001_1.flags |
Jean-David Beyer Joined: 2 Nov 05 Posts: 195 Credit: 6,613,600 RAC: 9,094 |
Would you prefer that they cancel workunits after they start? They obviously want one copy to finish soon, and may not have information on whether the first one will ever finish. I prefer they not send me the work unit at all if another instance of it is presumably running. Since they want only one result, they should wait until they decide the current unit has timed out (or failed) before sending me the new one. Then they save the network cost of sending me the work unit and, later, the cost of telling my BOINC client to cancel the one they sent me. |
Sid Celery Joined: 11 Feb 08 Posts: 2137 Credit: 41,518,559 RAC: 15,775 |
Would you prefer that they cancel workunits after they start? They obviously want one copy to finish soon, and may not have information on whether the first one will ever finish. The sole criterion is that the task has passed its deadline. If the previous host completes the task late, but before you've started it, the server will ask your system to abort the task it sent you. If you've already started the task, it won't be aborted while running, and you get this situation. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1722 Credit: 18,356,357 RAC: 25,250 |
It is a case of poor BOINC server configuration. Ideally it would be configured with a grace period after the deadline, so that Tasks that have missed the deadline but are still being processed can be returned before a Task is resent. There would still be some resends that are cancelled when the original finally comes in late, but far fewer than there are now, which would reduce the load on the servers. Also, a Task cancelled by the project really shouldn't be classed as an error. And the easiest & best way to avoid having this occur? Run with no cache at all. If you've got it and it is being processed, then it won't be cancelled by the server. The larger your cache is, the longer it takes to start processing work you have downloaded, and the more likely it is that Tasks will be cancelled by the Project. No cache, no cancelled Tasks; large cache, lots of cancelled Tasks. Your choice. Grant Darwin NT |
ArcSedna Joined: 23 Oct 11 Posts: 16 Credit: 71,462,581 RAC: 87,530 |
It seems that my ISP's DNS server resolves boinc-files.bakerlab.org to 128.95.160.135 or 128.95.160.134, which have expired SSL certs. I manually added an alternative IP to my local /etc/hosts, like 128.95.160.156 boinc-files.bakerlab.org, and now all downloads are working fine so far. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1722 Credit: 18,356,357 RAC: 25,250 |
Just checked my Event log, and downloads are instantly timing out. Tried ArcSedna's suggestion- Success! Thanks for that info. Edit- could be other issues occurring- Getting Ghost Tasks. One system has requested work, log says got 2 new Tasks, but no Tasks downloaded. Grant Darwin NT |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1722 Credit: 18,356,357 RAC: 25,250 |
And to add to the download issues, boinc-process has died again. Grant Darwin NT |
Dr Who Fan Joined: 28 May 06 Posts: 79 Credit: 273,880 RAC: 361 |
Download issues on 3 tasks on 3 devices; one task has been trying to download for over 12 hours now. Error log from one device showing the SSL EXPIRED CERTIFICATE MESSAGE: Rosetta@home 12/4/2024 05:29:27 [http] HTTP_OP::init_get(): https://boinc-files.bakerlab.org/rosetta/download/a4/flags_rb_12_04_647237_640782__t000__0_C1_robetta |
Bryn Mawr Joined: 26 Dec 18 Posts: 398 Credit: 12,294,748 RAC: 9,249 |
Many thanks, downloads fixed, tasks running :-) |
BobbyB Joined: 25 Apr 20 Posts: 2 Credit: 2,088,662 RAC: 27,165 |
12 tasks hung for 15 hours. Is it me or something else. I aborted them 10 minutes ago. Running headless. Example log for 1: 2024-12-03 19:04:54 | Rosetta@home | Started download of 8a_hal_v_hal_8aa_4jp8289_d247_0001_1.flags 2024-12-03 19:04:55 | | Internet access OK - project servers may be temporarily down. 2024-12-03 19:04:55 | Rosetta@home | Temporarily failed download of 8a_hal_v_hal_8aa_4jp8289_d247_0001_1.flags: transient HTTP error 2024-12-03 19:04:55 | Rosetta@home | Backing off 00:02:02 on download of 8a_hal_v_hal_8aa_4jp8289_d247_0001_1.flags ... ... 2024-12-04 10:22:51 | Rosetta@home | Temporarily failed download of 8a_hal_v_hal_8aa_4jp8289_d247_0001_1.flags: transient HTTP error 2024-12-04 10:22:51 | Rosetta@home | Backing off 01:46:39 on download of 8a_hal_v_hal_8aa_4jp8289_d247_0001_1.flags 2024-12-04 10:22:52 | | Internet access OK - project servers may be temporarily down. |
Jean-David Beyer Joined: 2 Nov 05 Posts: 195 Credit: 6,613,600 RAC: 9,094 |
12 tasks hung for 15 hours. Is it me or something else. I aborted them 10 minutes ago.. I get similar results to yours. Wed 04 Dec 2024 11:16:24 AM EST | Rosetta@home | Started download of 8a_hal_w_hal_8aa_4jp5150_d23_0001_1.zip Wed 04 Dec 2024 11:16:26 AM EST | | Project communication failed: attempting access to reference site Wed 04 Dec 2024 11:16:26 AM EST | Rosetta@home | Temporarily failed download of 8a_hal_w_hal_8aa_4jp5150_d23_0001_1.zip: transient HTTP error Wed 04 Dec 2024 11:16:26 AM EST | Rosetta@home | Backing off 03:28:48 on download of 8a_hal_w_hal_8aa_4jp5150_d23_0001_1.zip Wed 04 Dec 2024 11:16:28 AM EST | | Internet access OK - project servers may be temporarily down. But my machine continues to run currently running rosetta tasks and starts those I still have. I am not sure why your machine was hung. I seem to be able to upload results. They do seem to be having server problems... |
kotenok2000 Joined: 22 Feb 11 Posts: 271 Credit: 507,897 RAC: 496 |
Chrome accepts connection to https://boinc-files.bakerlab.org, but curl does not. |
kotenok2000 Joined: 22 Feb 11 Posts: 271 Credit: 507,897 RAC: 496 |
This has happened before. https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6893&postid=108041 |
kotenok2000 Joined: 22 Feb 11 Posts: 271 Credit: 507,897 RAC: 496 |
Now it works. |
Dr Who Fan Joined: 28 May 06 Posts: 79 Credit: 273,880 RAC: 361 |
Now have 3 tasks stuck in download with many retries, 2 of them on Android phones. Manual retries fail instantly. |
Dr Who Fan Joined: 28 May 06 Posts: 79 Credit: 273,880 RAC: 361 |
Now 4 tasks stuck on downloads all on Android phones. |
mmonnin Joined: 2 Jun 16 Posts: 61 Credit: 25,390,629 RAC: 47,239 |
I ran out of work on multiple PCs with a 1+ day queue set due to all of the failed downloads. I haven't been able to get tasks consistently since the 1.7m tasks went up. With that amount of work I could have gotten my 25m goal either Thurs or Friday. |
Jean-David Beyer Joined: 2 Nov 05 Posts: 195 Credit: 6,613,600 RAC: 9,094 |
It is not working for me. I have one task trying to download: two files. I checked and they will time out Friday afternoon, so they had better deliver soon enough before then for me to get them done. The web site says the download server is green, though a lot of others are red. |
BobbyB Joined: 25 Apr 20 Posts: 2 Credit: 2,088,662 RAC: 27,165 |
Still not working. Download stuck. 2024-12-04 16:01:39 | Rosetta@home | Temporarily failed download of input_rb_12_01_646399_639907__t000__0_C1_robetta.zip: transient HTTP error 2024-12-04 16:01:39 | Rosetta@home | Backing off 00:06:14 on download of input_rb_12_01_646399_639907__t000__0_C1_robetta.zip 2024-12-04 16:01:40 | | Internet access OK - project servers may be temporarily down. This would be a really good time for Rosetta to get their stuff working properly. Around Dec 6 or 7 World Community Grid will be offline for about 1 month. All those people and their machines will be looking for work. I have 76 cores which will be hungry. |
mmonnin Joined: 2 Jun 16 Posts: 61 Credit: 25,390,629 RAC: 47,239 |
The hosts file update worked for me in Win10 and Linux. Most clients needed a restart, as they thought there was a file stuck downloading (even though none was shown), so nothing else would download. The client restart cleared that up. |
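
For anyone who wants to check which of the download server's addresses is handing out the bad certificate before editing the hosts file, here is a minimal Python sketch. It is not an official tool from the project. The hostname and port come from the log lines quoted above; the function names (`resolve_ipv4`, `check_cert`, `hosts_line`, `report`) are made up for this example, and it assumes Python 3.7 or later. It asks the local resolver for the IPv4 addresses, then does a TLS handshake against each address separately while still validating the certificate for the real hostname, which is roughly what curl and the BOINC client do.

```python
import socket
import ssl

HOSTNAME = "boinc-files.bakerlab.org"  # download host seen in the logs above
PORT = 443                             # standard HTTPS port


def resolve_ipv4(hostname, port=PORT):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def check_cert(ip, hostname, port=PORT, timeout=10.0):
    """Handshake with one specific IP, validating the cert against hostname."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            # server_hostname sets SNI and the name the certificate must match.
            with context.wrap_socket(sock, server_hostname=hostname) as tls:
                not_after = tls.getpeercert()["notAfter"]
                return f"OK, certificate valid until {not_after}"
    except ssl.SSLCertVerificationError as err:
        # Raised when the certificate is expired or otherwise fails validation.
        return f"CERTIFICATE PROBLEM: {err.verify_message}"
    except OSError as err:
        # Covers refused connections, timeouts, and other TLS/socket errors.
        return f"CONNECTION PROBLEM: {err}"


def hosts_line(ip, hostname):
    """Format a hosts-file entry (hypothetical helper, for illustration only)."""
    return f"{ip} {hostname}"


def report(hostname=HOSTNAME):
    """Print one status line per resolved address."""
    for ip in resolve_ipv4(hostname):
        print(ip, "->", check_cert(ip, hostname))
```

Calling `report()` prints one line per address your resolver returns. An address that is not returned by your DNS (such as the alternative one mentioned earlier in the thread) can be tested directly with `check_cert("128.95.160.156", HOSTNAME)`. If an address reports OK, `hosts_line(ip, HOSTNAME)` gives the line to add to the hosts file. Remember to remove that entry once the servers are fixed, since a pinned IP will break downloads if the project later moves the host.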
©2024 University of Washington
https://www.bakerlab.org