Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 81188 - Posted: 15 Feb 2017, 17:44:40 UTC - in response to Message 81187.  

I have a stuck workunit: it's spent several hours at least stuck on Model 6 Step 7205 (Fast Relax). It's an all-sheet hexamer. Is it worth letting it continue?

170214.3._fold_and_dock_SAVE_ALL_OUT_468868_40_0

https://boinc.bakerlab.org/rosetta/result.php?resultid=901485891


Yes, I'd let it continue.
ID: 81188 · Rating: 0
svincent

Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 81189 - Posted: 15 Feb 2017, 19:00:18 UTC - in response to Message 81188.  

I have a stuck workunit: it's spent several hours at least stuck on Model 6 Step 7205 (Fast Relax). It's an all-sheet hexamer. Is it worth letting it continue?

170214.3._fold_and_dock_SAVE_ALL_OUT_468868_40_0

https://boinc.bakerlab.org/rosetta/result.php?resultid=901485891


Yes, I'd let it continue.


Good advice: it successfully completed and validated. tx.
ID: 81189 · Rating: 0
mqk

Joined: 5 Jul 16
Posts: 1
Credit: 4,241,297
RAC: 0
Message 81233 - Posted: 25 Feb 2017, 15:27:02 UTC

Hello,

I have BOINC 7.4.53 on my Android phone. I have received
a lot of computation errors (more than successful results).
For example, see https://boinc.bakerlab.org/rosetta/result.php?resultid=902985860.
In that task's details you can find

Invalid address 0xf6fd3e40 passed to free: value not allocated

in the <stderr_txt> section.
For other tasks it is similar; only the memory address differs (0xf71dde40, 0xf722be40 etc.).

Thank you for your help
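
For anyone collecting these reports across many tasks, here is a minimal Python sketch that pulls the reported addresses out of a task's stderr text and checks whether the message was logged after `called boinc_finish(0)`, as it is in the output of this task. The function names are hypothetical, and the sample text is just three lines copied from this task; nothing here diagnoses the cause of the crash.

```python
import re

# Pattern for the allocator message quoted in this post.
INVALID_FREE = re.compile(r"Invalid address (0x[0-9a-fA-F]+) passed to free")


def invalid_free_addresses(stderr_text):
    """Return every address reported in an 'Invalid address ... passed to free' line."""
    return INVALID_FREE.findall(stderr_text)


def finished_before_crash(stderr_text):
    """True if boinc_finish(0) was logged before the first invalid-free message."""
    finish = stderr_text.find("called boinc_finish(0)")
    crash = INVALID_FREE.search(stderr_text)
    return finish != -1 and crash is not None and finish < crash.start()


# Three lines taken from the stderr output of task 902985860.
sample = """\
BOINC :: BOINC support services shutting down cleanly ...
22:34:49 (1060): called boinc_finish(0)
Invalid address 0xf722be40 passed to free: value not allocated
"""

print(invalid_free_addresses(sample))
print(finished_before_crash(sample))
```

Running the two functions over the stderr text of each failed task would show whether every failure follows the same pattern or only some of them do.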

=======================================

All stderr out of task 902985860 (name 12res_c4m_pred1_c.6.9_0009_SAVE_ALL_OUT_469166_306_1, workunit 814737734):

<core_client_version>7.4.53</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
[2017- 2-24 21:51:42:] :: BOINC:: Initializing ... ok.
[2017- 2-24 21:51:42:] :: BOINC :: boinc_init()
BOINC:: Worker initialized successfully.
command: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_android_3.77_arm-android-linux-gnu @res2_binoculars_binoculars_TRP2.flags -nstruct 10000 -cpu_run_time 10800 -boinc:max_nstruct 600 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2005177
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_0cb8dc1_A.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/res2_binoculars_binoculars_TRP2.zip
Setting database description ...
Setting up checkpointing ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 3600
reached end of minirosetta::main()
======================================================
DONE :: 2 starting structures 2497.49 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================
BOINC :: WS_max 7.49036e+07

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
22:34:49 (1060): called boinc_finish(0)
Invalid address 0xf722be40 passed to free: value not allocated

</stderr_txt>
]]>
ID: 81233 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 81243 - Posted: 27 Feb 2017, 20:00:13 UTC
Last modified: 27 Feb 2017, 22:19:14 UTC

Hi mqk,

Can you email me the type of device you're using, with hardware specs such as memory? Anything that can help us diagnose the issue is useful. We do see instability on a number of devices but do not have access to test these devices locally at the moment.

dekim at uw dot edu

thanks,

David K
ID: 81243 · Rating: 0
WetherM

Joined: 28 Aug 12
Posts: 2
Credit: 200,913
RAC: 0
Message 81245 - Posted: 28 Feb 2017, 1:13:52 UTC

I have not received any WU from Rosetta for months. I have removed and added the project back. I continually get the message "Communication Deferred".

I have set cache to 0 days +1.5 days as suggested above in this thread. I am running BOINC mgr 7.6.33 on a late 2011 MacBook Pro 2.2GHz Intel Core i7, OS X 10.11.6.

This device has been functioning for years, Seti since 2004, Rosetta since 2012. Seti still functions and receives work. Rosetta no longer will connect and get work. I have suspended Seti, still no work for Rosetta and no connection, continuous "Communication Deferred" message with a countdown timer.

Any ideas are helpful. I have made my way through this thread trying all suggestions but to no avail.

Thanks,
ID: 81245 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 81246 - Posted: 28 Feb 2017, 1:24:07 UTC - in response to Message 81245.  
Last modified: 28 Feb 2017, 1:27:21 UTC

I have not received any WU from Rosetta for months. I have removed and added the project back. I continually get the message "Communication Deferred".

I have set cache to 0 days +1.5 days as suggested above in this thread. I am running BOINC mgr 7.6.33 on a late 2011 MacBook Pro 2.2GHz Intel Core i7, OS X 10.11.6.

This device has been functioning for years, Seti since 2004, Rosetta since 2012. Seti still functions and receives work. Rosetta no longer will connect and get work. I have suspended Seti, still no work for Rosetta and no connection, continuous "Communication Deferred" message with a countdown timer.

Any ideas are helpful. I have made my way through this thread trying all suggestions but to no avail.

Thanks,


Can you open the BOINC Manager, choose Tools -> Event Log... from the menu, and then copy the contents and email them to me? dekim at uw dot edu


I just installed the boinc client on a 10.11.6 iMac (late 2011) 2.5 GHz Intel Core i5, and within a few minutes it started running a R@h task. I'm not sure what is going on with your particular computer but maybe the event log might shed some useful info.
ID: 81246 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 9,368
Message 81248 - Posted: 1 Mar 2017, 2:57:16 UTC - in response to Message 81243.  

Hi mqk,
Can you email me the type of device you're using with the hardware specs such as memory? Anything that can help us diagnose the issue. We do see instability in a number of devices but do not have access to test these devices locally at the moment.

dekim at uw dot edu

thanks,

David K

Same here using a Samsung Galaxy S6
Hardware specifics available through the task details of each WU
ID: 81248 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 81249 - Posted: 1 Mar 2017, 4:36:20 UTC - in response to Message 81248.  

Hi mqk,
Can you email me the type of device you're using with the hardware specs such as memory? Anything that can help us diagnose the issue. We do see instability in a number of devices but do not have access to test these devices locally at the moment.

dekim at uw dot edu

thanks,

David K

Same here using a Samsung Galaxy S6
Hardware specifics available through the task details of each WU


Thanks!

If anyone else is having similar issues, please let us know the device type.

ID: 81249 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 9,368
Message 81265 - Posted: 7 Mar 2017, 0:42:05 UTC

Intermittent problems with the website atm?
06/03/2017 23:16:51 | rosetta@home | Sending scheduler request: Requested by user.
06/03/2017 23:16:51 | rosetta@home | Requesting new tasks for CPU
06/03/2017 23:16:53 | rosetta@home | Scheduler request failed: Server returned nothing (no headers, no data)
06/03/2017 23:17:04 | | Project communication failed: attempting access to reference site
06/03/2017 23:17:07 | | Internet access OK - project servers may be temporarily down.
07/03/2017 00:10:19 | rosetta@home | Computation for task tj_2_13_junc_X_DHR78_DHR47_l3_t1_t3_0_v4c_fragments_abinitio_SAVE_ALL_OUT_468179_119_0 finished
07/03/2017 00:10:19 | rosetta@home | [dcf] DCF: 1.468696->1.467995, raw_ratio 1.461685, adj_ratio 0.995227
07/03/2017 00:10:24 | rosetta@home | Starting task NCAA9_BISHIS_nohet_11_fragments_fold_SAVE_ALL_OUT_472210_104_0
07/03/2017 00:10:24 | rosetta@home | [cpu_sched] Starting task NCAA9_BISHIS_nohet_11_fragments_fold_SAVE_ALL_OUT_472210_104_0 using minirosetta version 373 in slot 1
07/03/2017 00:10:25 | rosetta@home | Started upload of tj_2_13_junc_X_DHR78_DHR47_l3_t1_t3_0_v4c_fragments_abinitio_SAVE_ALL_OUT_468179_119_0_0
07/03/2017 00:10:27 | rosetta@home | Temporarily failed upload of tj_2_13_junc_X_DHR78_DHR47_l3_t1_t3_0_v4c_fragments_abinitio_SAVE_ALL_OUT_468179_119_0_0: transient HTTP error
07/03/2017 00:10:27 | rosetta@home | Backing off 00:02:18 on upload of tj_2_13_junc_X_DHR78_DHR47_l3_t1_t3_0_v4c_fragments_abinitio_SAVE_ALL_OUT_468179_119_0_0
07/03/2017 00:10:38 | | Project communication failed: attempting access to reference site
07/03/2017 00:10:41 | | Internet access OK - project servers may be temporarily down.
07/03/2017 00:12:47 | rosetta@home | Started upload of tj_2_13_junc_X_DHR78_DHR47_l3_t1_t3_0_v4c_fragments_abinitio_SAVE_ALL_OUT_468179_119_0_0
07/03/2017 00:12:55 | rosetta@home | Finished upload of tj_2_13_junc_X_DHR78_DHR47_l3_t1_t3_0_v4c_fragments_abinitio_SAVE_ALL_OUT_468179_119_0_0
07/03/2017 00:12:58 | rosetta@home | Sending scheduler request: To report completed tasks.
07/03/2017 00:12:58 | rosetta@home | Reporting 1 completed tasks
07/03/2017 00:12:58 | rosetta@home | Requesting new tasks for CPU
07/03/2017 00:12:59 | rosetta@home | Scheduler request completed: got 1 new tasks
07/03/2017 00:13:01 | rosetta@home | Started download of jr8_0091__data.zip
07/03/2017 00:13:03 | rosetta@home | Incomplete read of 986.000000 < 5KB for jr8_0091__data.zip - truncating
07/03/2017 00:13:03 | rosetta@home | Finished download of jr8_0091__data.zip
07/03/2017 00:13:03 | rosetta@home | [error] File jr8_0091__data.zip has wrong size: expected 3292770, got 0
07/03/2017 00:13:03 | rosetta@home | [error] Checksum or signature error for jr8_0091__data.zip
07/03/2017 00:17:05 | rosetta@home | Sending scheduler request: To report completed tasks.
07/03/2017 00:17:05 | rosetta@home | Reporting 1 completed tasks
07/03/2017 00:17:05 | rosetta@home | Requesting new tasks for CPU
07/03/2017 00:17:06 | rosetta@home | Scheduler request completed: got 1 new tasks
07/03/2017 00:17:08 | rosetta@home | Started download of fs_2_13_junc_X_DHR79_DHR47_l4_h22_l2_t1_t3_0_v6c_fragments_fold_data.zip
07/03/2017 00:17:18 | rosetta@home | Finished download of fs_2_13_junc_X_DHR79_DHR47_l4_h22_l2_t1_t3_0_v6c_fragments_fold_data.zip
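
To pick the trouble spots out of a long event log like the one above, a small Python filter can help. This is only a sketch: it assumes the `timestamp | project | message` layout seen in this excerpt, and the list of "problem" substrings is my own guess based on these lines, not an official BOINC classification.

```python
# Substrings treated as signs of trouble. This list is an assumption drawn
# from the excerpt above, not an official BOINC classification.
PROBLEM_MARKERS = ("failed", "[error]", "Incomplete read")


def parse_line(line):
    """Split 'timestamp | project | message' into a 3-tuple, or return None."""
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None
    return tuple(parts)


def problem_lines(log_text):
    """Return parsed entries whose message contains any problem marker."""
    entries = []
    for line in log_text.splitlines():
        parsed = parse_line(line)
        if parsed and any(marker in parsed[2] for marker in PROBLEM_MARKERS):
            entries.append(parsed)
    return entries


# Five lines copied from the log above.
sample_log = """\
06/03/2017 23:16:51 | rosetta@home | Requesting new tasks for CPU
06/03/2017 23:16:53 | rosetta@home | Scheduler request failed: Server returned nothing (no headers, no data)
06/03/2017 23:17:04 | | Project communication failed: attempting access to reference site
07/03/2017 00:13:03 | rosetta@home | [error] File jr8_0091__data.zip has wrong size: expected 3292770, got 0
07/03/2017 00:17:18 | rosetta@home | Finished download of fs_2_13_junc_X_DHR79_DHR47_l4_h22_l2_t1_t3_0_v6c_fragments_fold_data.zip
"""

for timestamp, project, message in problem_lines(sample_log):
    print(timestamp, project or "(client)", message)
```

Lines with an empty project column (the client's own messages) are kept, since those are the ones reporting that the project servers may be down.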

ID: 81265 · Rating: 0
BlisteringSheep

Joined: 15 Sep 06
Posts: 5
Credit: 26,529,678
RAC: 3,979
Message 81268 - Posted: 8 Mar 2017, 21:56:12 UTC

bakerlab.org hosts are not resolvable.

None of the bakerlab.org hosts (boinc, ralph, srv[1-5], mail) are globally resolvable. I haven't been able to find any public DNS server that will resolve these (I was able to connect by writing static entries in my local hosts file).
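
For reference, a hosts-file entry is just an IP address followed by the hostname, one per line (`/etc/hosts` on Linux and OS X, `C:\Windows\System32\drivers\etc\hosts` on Windows). The Python sketch below only formats such lines. The addresses in it are PLACEHOLDERS from the 192.0.2.0/24 documentation range, not the real bakerlab.org addresses, which have to come from a resolver that still answers; the function name is hypothetical.

```python
def hosts_entries(mapping):
    """Render {hostname: ip} as hosts-file lines of the form 'ip<TAB>hostname'."""
    return "\n".join(f"{ip}\t{name}" for name, ip in sorted(mapping.items()))


# PLACEHOLDER addresses from the 192.0.2.0/24 documentation range (RFC 5737).
# The real addresses are not given in this thread.
example = {
    "boinc.bakerlab.org": "192.0.2.10",
    "ralph.bakerlab.org": "192.0.2.11",
}

print(hosts_entries(example))
```

Static entries like these should be removed again once normal DNS resolution is back, otherwise they will silently go stale if the servers ever move.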
ID: 81268 · Rating: 0
BlisteringSheep

Joined: 15 Sep 06
Posts: 5
Credit: 26,529,678
RAC: 3,979
Message 81269 - Posted: 8 Mar 2017, 22:23:03 UTC - in response to Message 81268.  

bakerlab.org hosts are not resolvable.

None of the bakerlab.org hosts (boinc, ralph, srv[1-5], mail) are globally resolvable. I haven't been able to find any public DNS server that will resolve these (I was able to connect by writing static entries in my local hosts file).


Update: from this list of providers, only Comodo Secure DNS (8.26.56.26), DNS Advantage (156.154.70.1), Norton ConnectSafe (199.85.126.10) and Alternate DNS (198.101.242.72) were able to resolve.

Level3 (209.244.0.3)
Verisign (64.6.64.6)
Google (8.8.8.8)
DNS.WATCH (84.200.69.80)
Comodo Secure DNS (8.26.56.26)
OpenDNS Home (208.67.222.222)
DNS Advantage (156.154.70.1)
Norton ConnectSafe (199.85.126.10)
GreenTeamDNS (81.218.119.11)
SafeDNS (195.46.39.39)
OpenNIC (96.90.175.167)
SmartViper (208.76.50.50)
Dyn (216.146.35.35)
FreeDNS (37.235.1.174)
Alternate DNS (198.101.242.72)
Yandex.DNS (77.88.8.8)
UncensoredDNS (91.239.100.100)
Hurricane Electric (74.82.42.42)
puntCAT (109.69.8.51)
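
A survey like this can be repeated with a short script. The Python sketch below, using only the standard library, builds a plain DNS A-record query (RFC 1035 wire format) and reads the response code from the reply header, so each resolver can be asked directly by IP address. It is a rough sketch rather than a full DNS client: it does not decode the answer records, does not retry over TCP, and the function names are my own.

```python
import socket
import struct

# Response codes from RFC 1035 that matter for this kind of check.
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(hostname, query_id=0x1234, recursion_desired=True):
    """Build a minimal DNS query packet for an A record."""
    flags = 0x0100 if recursion_desired else 0x0000
    # Header: ID, flags, 1 question, 0 answer/authority/additional records.
    header = struct.pack(">HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN


def parse_status(response):
    """Return (rcode_name, answer_count) from a DNS response header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack(">HHHHHH", response[:12])
    rcode = flags & 0x000F
    return RCODES.get(rcode, f"RCODE{rcode}"), ancount


def query_resolver(server_ip, hostname, timeout=3.0):
    """Send one UDP query to server_ip on port 53 and summarize the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(hostname), (server_ip, 53))
            data, _ = sock.recvfrom(512)
        except OSError as exc:  # also covers timeouts
            return f"no reply ({exc})"
    if len(data) < 12:
        return "malformed reply"
    status, answers = parse_status(data)
    return f"{status}, {answers} answer(s)"


# Offline demonstration: the query bytes, and a hand-built NXDOMAIN header.
print(build_query("boinc.bakerlab.org").hex())
print(parse_status(struct.pack(">HHHHHH", 0x1234, 0x8183, 1, 0, 0, 0)))
```

With network access, calling `query_resolver("8.8.8.8", "boinc.bakerlab.org")` for each address in the list gives a one-line status per resolver, which makes it easy to see which ones return NXDOMAIN and which return an answer.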
ID: 81269 · Rating: 0
BlisteringSheep

Joined: 15 Sep 06
Posts: 5
Credit: 26,529,678
RAC: 3,979
Message 81272 - Posted: 9 Mar 2017, 3:15:19 UTC - in response to Message 81269.  

bakerlab.org hosts are not resolvable.

None of the bakerlab.org hosts (boinc, ralph, srv[1-5], mail) are globally resolvable. I haven't been able to find any public DNS server that will resolve these (I was able to connect by writing static entries in my local hosts file).


Update: from this list of providers, only Comodo Secure DNS (8.26.56.26), DNS Advantage (156.154.70.1), Norton ConnectSafe (199.85.126.10) and Alternate DNS (198.101.242.72) were able to resolve.

Level3 (209.244.0.3)
Verisign (64.6.64.6)
Google (8.8.8.8)
DNS.WATCH (84.200.69.80)
Comodo Secure DNS (8.26.56.26)
OpenDNS Home (208.67.222.222)
DNS Advantage (156.154.70.1)
Norton ConnectSafe (199.85.126.10)
GreenTeamDNS (81.218.119.11)
SafeDNS (195.46.39.39)
OpenNIC (96.90.175.167)
SmartViper (208.76.50.50)
Dyn (216.146.35.35)
FreeDNS (37.235.1.174)
Alternate DNS (198.101.242.72)
Yandex.DNS (77.88.8.8)
UncensoredDNS (91.239.100.100)
Hurricane Electric (74.82.42.42)
puntCAT (109.69.8.51)


To help you debug the problem, here is the output from nslookup when using the authoritative nameservers (as reported by InterNIC https://reports.internic.net/cgi/whois?whois_nic=bakerlab.org&type=domain) ns1.bakerlab.org and ns5.bakerlab.org in /etc/resolv.conf:

;; Got recursion not available from 128.95.160.253, trying next server
;; Got recursion not available from 140.142.20.94, trying next server
Server: 8.8.8.8
Address: 8.8.8.8#53

** server can't find boinc.bakerlab.org: NXDOMAIN



ID: 81272 · Rating: 0
Dj

Joined: 6 Jan 17
Posts: 1
Credit: 6,257,818
RAC: 0
Message 81277 - Posted: 9 Mar 2017, 14:37:12 UTC - in response to Message 81268.  

bakerlab.org hosts are not resolvable.

None of the bakerlab.org hosts (boinc, ralph, srv[1-5], mail) are globally resolvable. I haven't been able to find any public DNS server that will resolve these (I was able to connect by writing static entries in my local hosts file).


I tried the same on my Linux hosts, but even with nsswitch.conf set to resolve from files first, they still refused to find the local entries for the servers we need, for reasons I didn't bother to take a deep dive into.

What I did find is that the whois-listed nameservers are reachable, and they do resolve the domain information, but have limited (or nonexistent) recursion, so on many of my systems that don't necessarily need to be resolving names of things other than R@H servers, I just set the domain's listed servers for those machines, and they continue to download/upload work.

As to why something seems to be broken that is preventing the root nameservers from finding how to pass us along to bakerlab's servers, IDK... everything looks like it should be working, but there's a breakdown right inside there...
ID: 81277 · Rating: 0
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 81282 - Posted: 10 Mar 2017, 15:16:43 UTC - in response to Message 81277.  
Last modified: 10 Mar 2017, 15:46:54 UTC

bakerlab.org hosts are not resolvable.

None of the bakerlab.org hosts (boinc, ralph, srv[1-5], mail) are globally resolvable. I haven't been able to find any public DNS server that will resolve these (I was able to connect by writing static entries in my local hosts file).


This is what worked for me on Windows and Ubuntu:
Hosts file

Of course, to see that here you have to be able to get to this website anyway, which is a bit of a Catch-22. Good luck.

(And I am throwing another machine on it, to do what ICANN.)
ID: 81282 · Rating: 0
Warped

Joined: 15 Jan 06
Posts: 48
Credit: 1,788,185
RAC: 0
Message 81283 - Posted: 10 Mar 2017, 15:58:15 UTC
Last modified: 10 Mar 2017, 15:58:37 UTC

ID: 81283 · Rating: 0
neil

Joined: 22 Dec 06
Posts: 3
Credit: 18,162,630
RAC: 139
Message 81284 - Posted: 10 Mar 2017, 16:39:18 UTC

Yay!!
ID: 81284 · Rating: 0
Keith E. Laidig
Volunteer moderator
Project developer

Joined: 1 Jul 05
Posts: 154
Credit: 117,189,961
RAC: 0
Message 81285 - Posted: 10 Mar 2017, 16:39:28 UTC

Folks! Thanks for your patience and support during this outage. We appreciate the support of the BOINC community! You are an impressive crew! -KEL

ID: 81285 · Rating: 0
Dr Who Fan

Joined: 28 May 06
Posts: 70
Credit: 267,906
RAC: 335
Message 81289 - Posted: 10 Mar 2017, 17:51:17 UTC - in response to Message 81285.  

Folks! Thanks for your patience and support during this outage. We appreciate the support of the BOINC community! You are an impressive crew! -KEL


Nice to see that things finally got resolved.

ID: 81289 · Rating: 0
shanen

Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 81293 - Posted: 10 Mar 2017, 21:18:51 UTC - in response to Message 81289.  

Folks! Thanks for your patience and support during this outage. We appreciate the support of the BOINC community! You are an impressive crew! -KEL


Nice to see that things finally got resolved.


Ditto, and now let me go blame the spammers on "Cafe Rosetta".
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
ID: 81293 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 9,368
Message 81298 - Posted: 11 Mar 2017, 1:23:48 UTC

I'm going to admit that it crossed my mind bakerlab may've decided that they didn't need all the grief they've been getting from moaning nerds like me and decided to pull the whole project...

...so I'm quite relieved it turned out to be someone else's fault.

Welcome back all. And try not to do that again <heart attack mode off>
ID: 81298 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org