Problems and Technical Issues with Rosetta@home

Author	Message
Miklos M Joined: 8 Dec 13 Posts: 29 Credit: 5,277,251 RAC: 0	Message 77280 - Posted: 4 Aug 2014, 22:38:03 UTC - in response to Message 77265.
Quoted: Can you post a log? My backlog of uploads has cleared, but I am still getting a lot of Computation Errors. I have 32 shown in just a few minutes and the list is growing.
Computation errors here too.

Murasaki Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0	Message 77281 - Posted: 4 Aug 2014, 23:34:41 UTC - in response to Message 77280.
Miklos M wrote: Computation errors here too.
Your errors are mainly listed as tasks aborted by the user. Did you notice any unusual behaviour? Of the ones I checked, the reassigned tasks are either still in progress or have been completed successfully by the next user. I noticed one error for a pd1 graftsheet task that had been reassigned to you, but that batch of tasks was failing for almost everyone.

krypton (Volunteer moderator, Project developer, Project scientist) Joined: 16 Nov 11 Posts: 108 Credit: 2,164,309 RAC: 0	Message 77282 - Posted: 5 Aug 2014, 0:07:51 UTC
We increased the number of allowed concurrent users, but max_connections (for the MySQL database) remains at 800. Looking at the processlist, I noticed that most of the connections are in "Sleep" status, waiting out the default 8-hour wait_timeout before being killed. I set wait_timeout to 30 minutes (which still seems rather high, but may be required for BOINC Manager?). If anyone sees any database errors in the BOINC Manager logs, please alert me!
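
A minimal sketch of the check krypton describes, assuming the rows of MySQL's SHOW PROCESSLIST have already been fetched into a list of dicts. The function name `summarize_processlist` and the sample rows are hypothetical and are not taken from the project's servers; only the `Command` and `Time` column names, the 8-hour default wait_timeout (28800 seconds), and the 30-minute value come from MySQL and the post above.

```python
from collections import Counter

DEFAULT_WAIT_TIMEOUT = 8 * 60 * 60  # MySQL default wait_timeout: 28800 seconds
NEW_WAIT_TIMEOUT = 30 * 60          # the 30 minutes mentioned in the post

# For reference, one way to apply such a change in SQL (an assumption; the
# post does not say how the setting was actually changed):
#   SET GLOBAL wait_timeout = 1800;


def summarize_processlist(rows, wait_timeout):
    """Count connections per Command and the sleepers idle >= wait_timeout.

    Each row is a dict with at least the SHOW PROCESSLIST columns
    "Command" (e.g. "Sleep", "Query") and "Time" (seconds in that state).
    """
    by_command = Counter(row["Command"] for row in rows)
    reclaimable = sum(
        1
        for row in rows
        if row["Command"] == "Sleep" and row["Time"] >= wait_timeout
    )
    return by_command, reclaimable


# Hypothetical sample rows, for illustration only.
SAMPLE_ROWS = [
    {"Id": 1, "Command": "Sleep", "Time": 120},
    {"Id": 2, "Command": "Sleep", "Time": 2400},
    {"Id": 3, "Command": "Query", "Time": 1},
    {"Id": 4, "Command": "Sleep", "Time": 20000},
]

if __name__ == "__main__":
    for timeout in (DEFAULT_WAIT_TIMEOUT, NEW_WAIT_TIMEOUT):
        counts, reclaimable = summarize_processlist(SAMPLE_ROWS, timeout)
        print(timeout, dict(counts), reclaimable)
```

With the shorter timeout, idle connections are closed sooner, which frees slots under the max_connections limit for clients that are actually doing work.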

googloo Joined: 15 Sep 06 Posts: 137 Credit: 24,022,414 RAC: 2	Message 77283 - Posted: 5 Aug 2014, 0:29:37 UTC
8/4/2014 7:50:26 PM | rosetta@home | Started upload of rb_05_25_47349_92839__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_167016_2278_0_0
8/4/2014 7:50:49 PM | rosetta@home | Temporarily failed upload of rb_05_25_47349_92839__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_167016_2278_0_0: connect() failed
8/4/2014 7:50:49 PM | rosetta@home | Backing off 01:25:31 on upload of rb_05_25_47349_92839__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_167016_2278_0_0
8/4/2014 7:50:52 PM | | Project communication failed: attempting access to reference site
8/4/2014 7:50:54 PM | | Internet access OK - project servers may be temporarily down.
8/4/2014 7:57:51 PM | rosetta@home | Sending scheduler request: To fetch work.
8/4/2014 7:57:51 PM | rosetta@home | Requesting new tasks for CPU and NVIDIA
8/4/2014 7:58:14 PM | rosetta@home | Scheduler request failed: Couldn't connect to server
8/4/2014 7:58:17 PM | | Project communication failed: attempting access to reference site
8/4/2014 7:58:19 PM | | Internet access OK - project servers may be temporarily down.
8/4/2014 7:59:59 PM | rosetta@home | Sending scheduler request: To fetch work.
8/4/2014 7:59:59 PM | rosetta@home | Requesting new tasks for CPU and NVIDIA
8/4/2014 8:00:21 PM | rosetta@home | Scheduler request failed: Couldn't connect to server
8/4/2014 8:00:24 PM | | Project communication failed: attempting access to reference site
8/4/2014 8:00:25 PM | | Internet access OK - project servers may be temporarily down.
8/4/2014 8:02:41 PM | rosetta@home | Sending scheduler request: To fetch work.
8/4/2014 8:02:41 PM | rosetta@home | Requesting new tasks for CPU and NVIDIA
8/4/2014 8:03:04 PM | rosetta@home | Scheduler request completed: got 0 new tasks
8/4/2014 8:03:04 PM | rosetta@home | Server can't open database
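
For anyone wanting to scan their own event log for the kind of database or connection errors krypton asked about, here is a rough sketch that tallies the message types seen in googloo's log. It assumes the three-field "timestamp | project | message" layout shown above (with or without a backslash before each pipe); the category names and the functions `classify` and `summarize` are made up for illustration and are not part of BOINC.

```python
import re
from collections import Counter

# Fields are separated by "|" (optionally written as "\|"), with
# surrounding whitespace. The project field may be empty.
FIELD_SEP = re.compile(r"\s*\\?\|\s*")


def classify(message):
    """Map a log message to a coarse, illustrative category."""
    if "Temporarily failed upload" in message:
        return "upload_failed"
    if message.startswith("Backing off"):
        return "backoff"
    if message.startswith("Scheduler request failed"):
        return "scheduler_failed"
    if "Server can't open database" in message:
        return "database_error"
    return "other"


def summarize(lines):
    """Count message categories across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        parts = FIELD_SEP.split(line.strip(), maxsplit=2)
        if len(parts) != 3:
            continue  # not a "timestamp | project | message" line
        _timestamp, _project, message = parts
        counts[classify(message)] += 1
    return counts


# A few lines copied from the log posted above.
SAMPLE = [
    "8/4/2014 7:50:49 PM | rosetta@home | Temporarily failed upload of "
    "rb_05_25_47349_92839__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_167016_2278_0_0: "
    "connect() failed",
    "8/4/2014 7:50:49 PM | rosetta@home | Backing off 01:25:31 on upload of "
    "rb_05_25_47349_92839__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_167016_2278_0_0",
    "8/4/2014 7:50:54 PM | | Internet access OK - project servers may be temporarily down.",
    "8/4/2014 7:58:14 PM | rosetta@home | Scheduler request failed: Couldn't connect to server",
    "8/4/2014 8:03:04 PM | rosetta@home | Server can't open database",
]

if __name__ == "__main__":
    for category, count in summarize(SAMPLE).items():
        print(category, count)
```

A non-zero "database_error" count is the kind of thing worth reporting back in this thread.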

Mod.Sense (Volunteer moderator) Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 77284 - Posted: 5 Aug 2014, 0:48:12 UTC
With 30,000 ADDITIONAL new users (after 15,000 the day before), I've been trying to leave the server alone. But I just ran a scheduler request for more work and got this:
8/4/2014 7:46:25 PM | rosetta@home | Scheduler request failed: Failure when receiving data from the peer
Rosetta Moderator: Mod.Sense

googloo Joined: 15 Sep 06 Posts: 137 Credit: 24,022,414 RAC: 2	Message 77285 - Posted: 5 Aug 2014, 1:27:53 UTC Last modified: 5 Aug 2014, 1:28:24 UTC
All is well at my end, for now. Finally finished an upload and got new tasks.

krypton (Volunteer moderator, Project developer, Project scientist) Joined: 16 Nov 11 Posts: 108 Credit: 2,164,309 RAC: 0	Message 77286 - Posted: 5 Aug 2014, 1:30:17 UTC - in response to Message 77285.
Thanks googloo and Mod.Sense!
googloo wrote: All is well at my end, for now. Finally finished an upload and got new tasks.

Daedalus Joined: 1 Aug 08 Posts: 39 Credit: 10,153,940 RAC: 0	Message 77287 - Posted: 5 Aug 2014, 14:47:09 UTC
Since 3 August I have not had any problems.

Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,662,635 RAC: 0	Message 77288 - Posted: 5 Aug 2014, 15:36:25 UTC
All working well over here now. I did two things to hopefully help.
1) I set the 'Target CPU run time' in the online Rosetta Preferences page to 6 hours (double the default of 3 hours), which should mean my machines will bug the server for work less often.
2) I also set my local clients to cache a bit more work in case things go crazy again.
I encourage others to take similar actions to help ration server resources.

Mod.Sense (Volunteer moderator) Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 77289 - Posted: 5 Aug 2014, 19:11:44 UTC - in response to Message 77288.
Timo wrote: All working well over here now. I did two things to hopefully help. 1) I set the 'Target CPU run time' in the online Rosetta Preferences page to 6 hours (double the default of 3 hours) which should mean my machines will bug the server for work less often. 2) I also set my local clients to cache a bit more work in case things go crazy again. I encourage others to take similar actions to help ration server resources.
I would encourage it as well. Just be aware that BOINC Manager will take some time to get used to the new target runtime. I generally suggest that you start with only a small cache of work, and bump the target runtime only a notch or two each day. I'd also suggest going to 12 hours or beyond if it suits the way you use your machine. A gradual change in runtime preference helps avoid BOINC Manager downloading more work than you can complete.
The change WILL affect tasks that you've already downloaded, once the client completes an update to the project with the new preference setting. Hence the suggestion to start when your cache of existing work is low. Do not increase your number of days of work to request until AFTER you have run at your final runtime setting for a day or so.
Rosetta Moderator: Mod.Sense
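
To put rough numbers on why a longer target runtime eases the load, here is a back-of-the-envelope sketch. It assumes each task runs for about the target CPU run time and that every core crunches Rosetta the whole time the machine is on; real task lengths vary, and the function name `tasks_per_day` is just illustrative.

```python
def tasks_per_day(cores, target_hours, hours_on_per_day=24.0):
    """Approximate tasks a host finishes (and must fetch and report) per day.

    Assumption: each task takes roughly target_hours of CPU time and all
    cores are busy with Rosetta for hours_on_per_day.
    """
    if target_hours <= 0:
        raise ValueError("target_hours must be positive")
    return cores * hours_on_per_day / target_hours


if __name__ == "__main__":
    # Compare the 3-hour default with the 6- and 12-hour settings
    # mentioned in the posts above, for a hypothetical 4-core machine.
    for hours in (3, 6, 12):
        print(hours, tasks_per_day(cores=4, target_hours=hours))
```

Under these assumptions, doubling the target runtime halves the number of tasks a host has to download, upload, and report each day, which is the saving on the server side that Timo is pointing at.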

googloo Joined: 15 Sep 06 Posts: 137 Credit: 24,022,414 RAC: 2	Message 77294 - Posted: 6 Aug 2014, 18:56:41 UTC
Not getting work, even though BOINC Manager requests tasks. Server status shows only 32 ready to send.

Mod.Sense (Volunteer moderator) Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 77296 - Posted: 6 Aug 2014, 22:06:45 UTC
Right, the available work seems to be getting consumed about as quickly as it is being generated. The project is still adjusting to all of the new hosts that have come at once. Which is a great problem to have! But I've seen on the server status page that the actual number of tasks ready to send has been swinging rapidly as new work is generated and then assigned to hungry hosts. BOINC Manager will retry its requests for work and pull some down when work units are available.
Rosetta Moderator: Mod.Sense

googloo Joined: 15 Sep 06 Posts: 137 Credit: 24,022,414 RAC: 2	Message 77297 - Posted: 7 Aug 2014, 0:47:01 UTC - in response to Message 77296.
Mod.Sense wrote: Right the available work seems to be getting consumed about as quickly as it is being generated. The project is still adjusting to all of the new hosts that have all come at once. Which is a great problem to have! But I've seen on the server status page the actual number of tasks ready to send has been swinging rapidly as new work is generated, and then assigned to hungry hosts. The BOINC Manager will do retries for work and pull some down when work units are available.
Yes, it's been alternating between "no work sent" and actually getting new tasks.

Sid Celery Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 5	Message 77298 - Posted: 7 Aug 2014, 2:13:09 UTC
I've been out of the country for 3 days and things seem to have sorted themselves out in that time (a watched pot never boils), save for some connectivity issues at my end, now resolved. I've just snuck a few tasks for 3 of my 4 machines. The last one should pinch some tomorrow, then I'm back to normal. I doubt WCG will see many further calls for work over the next month with the priorities I've got set.

TPCBF Joined: 29 Nov 10 Posts: 111 Credit: 6,046,267 RAC: 1	Message 77311 - Posted: 9 Aug 2014, 22:30:35 UTC
Just keep getting "no work sent"... :-(

krypton (Volunteer moderator, Project developer, Project scientist) Joined: 16 Nov 11 Posts: 108 Credit: 2,164,309 RAC: 0	Message 77312 - Posted: 9 Aug 2014, 23:46:04 UTC - in response to Message 77311. Last modified: 9 Aug 2014, 23:50:51 UTC
I've noticed the same message on my machine. Looking into it. But it's the same pattern as googloo reported. It seems to alternate...
TPCBF wrote: Just keep getting "no work sent"... :-(

Sid Celery Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 5	Message 77314 - Posted: 10 Aug 2014, 4:20:47 UTC
Arghh!! Just as I get home to clear a network issue holding up the upload of 48 tasks, the scheduler's taken offline and I'm full up with 55 WCG tasks instead to plough through. Not having any luck right now :(

shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0	Message 77315 - Posted: 10 Aug 2014, 4:37:44 UTC Last modified: 10 Aug 2014, 4:39:23 UTC
Just checking to see if there is any useful information about the latest problems. Didn't expect to find any, but I could have been surprised. Perhaps they could just reissue old tasks for double-checking? Overall it seems to be another example of the supply of people who want to be helpful being larger than the supply of helpful things for them to do. I feel like starting a list...

Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 0	Message 77316 - Posted: 10 Aug 2014, 5:11:14 UTC
Now what is the problem? Everything is disabled! And again, no news.

David E K (Volunteer moderator, Project administrator, Project developer, Project scientist) Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 77317 - Posted: 10 Aug 2014, 5:18:08 UTC
I fired up more make_work daemons to hopefully catch up with the work demand. We have plenty of work queued up, but our daemons were having trouble keeping up. Hopefully the updates I just made will help. Sorry, but I had to stop the servers and restart, so there was a short bit of downtime, so short I didn't bother posting anything.