Problems and Technical Issues with Rosetta@home

BarryAZ

Joined: 27 Dec 05
Posts: 153
Credit: 30,843,285
RAC: 0
Message 81105 - Posted: 30 Jan 2017, 16:36:49 UTC - in response to Message 81104.  

As should be quite apparent to the truly active participants of this project, communicating with active participants by the project is a VERY low priority in the Rosetta scheme of things.

That means that issues perhaps viewed as insignificant by the project folks (or perhaps issues that they are simply not aware of) only get passing response.

I believe it is an informed choice made by the project to not allocate time and resources to the 'care and feeding' of the active user community.

Users get to prioritize as well. As a long-time participant (going back over ten years), there have been times when I maintained a daily completed work traffic generating 30 to 40 thousand credits. These days, it is more like 5 thousand credits as I shifted MY priorities to the WorldGrid project.

We all make choices.

And another blank day for stats

ID: 81105 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Sid Celery

Joined: 11 Feb 08
Posts: 2125
Credit: 41,228,659
RAC: 9,701
Message 81106 - Posted: 31 Jan 2017, 3:26:27 UTC - in response to Message 81105.  

As should be quite apparent to the truly active participants of this project, communicating with active participants by the project is a VERY low priority in the Rosetta scheme of things.

That means that issues perhaps viewed as insignificant by the project folks (or perhaps issues that they are simply not aware of) only get passing response.

I believe it is an informed choice made by the project to not allocate time and resources to the 'care and feeding' of the active user community.

Users get to prioritize as well. As a long-time participant (going back over ten years), there have been times when I maintained a daily completed work traffic generating 30 to 40 thousand credits. These days, it is more like 5 thousand credits as I shifted MY priorities to the WorldGrid project.

We all make choices.
And another blank day for stats

Aside from that, I've wondered whether the project is subject to the communication restrictions instructed from above. Though that kind of subject may be more appropriately discussed in Café Rosetta
ID: 81106 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
dcdc

Joined: 3 Nov 05
Posts: 1831
Credit: 119,627,225
RAC: 10,243
Message 81108 - Posted: 1 Feb 2017, 10:03:37 UTC - in response to Message 81106.  

I think it's probably more likely that there's no funding for someone dedicated to the role, and everyone else has other priorities so it falls to no-one. There could obviously be lots more compute power available here but it might be that there is sufficient as-is so it works for getting the science done, regardless of how frustrating it is for users.

I just found my computer sat idle with the 24 hour back-off bug. I'll add a second project, but because the server code here is so old I don't believe I can add a project as a backup - only as a low % so I'll do that.

Maybe it would be useful if we maintained a sticky thread (Mod.Sense!) where we list the priorities from our point of view, so the team can see what we think needs fixing. I.e. under URGENT, we'd have the 24hr bug, or make work, and then under the next heading (Less urgent?) we'd have the server upgrade, maybe with a link to the discussion of it. If anything breaks, stick it at the top.

Might that help?

D

ID: 81108 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 81110 - Posted: 1 Feb 2017, 14:09:57 UTC

But I don't quite understand why our moderator can't just email someone at UW when exceptional problems arise. Aren't they on speaking terms?

If they don't need more work done, so be it. But they should tell us I believe.
ID: 81110 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
BarryAZ

Joined: 27 Dec 05
Posts: 153
Credit: 30,843,285
RAC: 0
Message 81112 - Posted: 1 Feb 2017, 15:49:12 UTC - in response to Message 81110.  

That raises an interesting question. Maybe a driving reason for the project to be paying essentially NO attention to the care and feeding of its most active participants is an internal decision that they already have too much work to process internally.

That, by actively ignoring the user community they are hoping to reduce the number of work units processed.

I can confirm that approach has worked just fine for me....



But I don't quite understand why our moderator can't just email someone at UW when exceptional problems arise. Aren't they on speaking terms?

If they don't need more work done, so be it. But they should tell us I believe.


ID: 81112 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 81113 - Posted: 1 Feb 2017, 15:56:48 UTC - in response to Message 81110.  
Last modified: 1 Feb 2017, 15:59:26 UTC

But I don't quite understand why our moderator can't just email someone at UW when exceptional problems arise. Aren't they on speaking terms?

If they don't need more work done, so be it. But they should tell us I believe.


Actually, over the weekend, I did send an EMail to DK and Dr. Baker, with links to these msg boards, summarizing some suggested "todos" that would eliminate some of the annoyances I believe can be easily addressed.

I cited:

  • 24hr backoff,
  • short (2 day) deadlines,
  • the mention of "Android" in the message when your scheduler request does not return work,
  • and the suggestion that the logic that decides whether or not to run the next model take the deadline into account in addition to the runtime preference.



We know the server hardware upgrade is coming, and the BOINC server code upgrade is on their list.


Rosetta Moderator: Mod.Sense
ID: 81113 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
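The last suggestion in the list above can be pictured with a small sketch. This is not taken from the Rosetta application: the `TaskState` struct, its field names, and the `safety_margin_s` parameter are all hypothetical, and the sketch assumes the application already tracks an average time per completed model. It only shows the shape of a check that looks at both the runtime preference and the deadline.

```cpp
// Hypothetical sketch: none of these names come from the Rosetta or BOINC sources.
struct TaskState {
    double elapsed_s;            // CPU time already spent on this task
    double avg_model_s;          // average time per completed model so far
    double runtime_pref_s;       // the user's target runtime preference
    double seconds_to_deadline;  // wall-clock time left before the report deadline
};

// Start another model only if it is expected to fit inside BOTH limits:
//  1. the runtime preference (the check assumed to exist already), and
//  2. the time remaining before the deadline, minus a margin for upload/report.
constexpr bool should_start_next_model(const TaskState& t, double safety_margin_s) {
    return (t.elapsed_s + t.avg_model_s <= t.runtime_pref_s)
        && (t.avg_model_s + safety_margin_s <= t.seconds_to_deadline);
}
```

With a 10 minute average model time and a 30 minute margin, a task that has 8 hours of runtime preference left but only 2000 seconds to its deadline would stop rather than start another model, since 600 + 1800 exceeds 2000.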
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 81114 - Posted: 1 Feb 2017, 17:01:52 UTC - in response to Message 81113.  

Actually, over the weekend, I did send an EMail to DK and Dr. Baker, with links to these msg boards, summarizing some suggested "todos" that would eliminate some of the annoyances I believe can be easily addressed.

Thanks very much. It will be interesting to see their response. I sometimes think like BarryAZ that it is just an indirect way of managing their workload. It works for a while, but I don't think the long-term prospects are good.

ID: 81114 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 81116 - Posted: 1 Feb 2017, 19:26:12 UTC

In response to Mod.Sense's feedback/recommendations, I increased the short 2 day deadline to 3 days. I don't think I can increase the deadline longer for these high priority jobs. The deadlines are short for an important reason since there are time constraints for these jobs (weekly CAMEO benchmarks for Robetta). I also increased the standard deadline from 5-7 days to 2 weeks which should help.

If anyone knows how to change the 24 hour backoff, please chime in. I'm not sure if it's server or client logic and configurable.

Also, if anyone knows how to fix the "Android" alert, please let us know. I'll of course look into this also.

The last point, I think, can be coded into our application, and I'll put it on the list of things to do for the next update.

Also, if you are not getting work, it is most likely because there isn't any to issue at the time. Our demand comes in waves as projects progress. However, our public structure prediction server, Robetta, usually provides continual work. It was down for a few days last week for updates though.
ID: 81116 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
BarryAZ

Joined: 27 Dec 05
Posts: 153
Credit: 30,843,285
RAC: 0
Message 81123 - Posted: 2 Feb 2017, 7:40:23 UTC - in response to Message 81116.  

David, thanks much for jumping in here -- the air had been getting rather thin.

My sense is that the 24 hour backoff is a server-specific function. In the multiple other projects I work with, there is a progressive backoff, typically starting at either 5 minutes or 1 hour and growing with repeated non-responses up to as much as 3 to 5 hours, then recycling to a 1 hour backoff. It is only with Rosetta that I have seen a flat 24 hours.

As to the android no work report -- again that is likely a project specific configuration. Other projects provide a 'no work for your applications' message but with Rosetta it seems specific to android work -- I would think that could be configured out.

I don't do code though... so it's all speculative.

The other issue -- for which your post is seriously appreciated, is the sense of the active user community being a bit 'unloved' by a lack of periodic responses from folks such as yourself.

I'm sure you have more work than time, but even a weekly "we're here and watching" message might reduce that sense.

Thanks again for your message.


In response to Mod.Sense's feedback/recommendations, I increased the short 2 day deadline to 3 days. I don't think I can increase the deadline longer for these high priority jobs. The deadlines are short for an important reason since there are time constraints for these jobs (weekly CAMEO benchmarks for Robetta). I also increased the standard deadline from 5-7 days to 2 weeks which should help.

If anyone knows how to change the 24 hour backoff, please chime in. I'm not sure if it's server or client logic and configurable.

Also, if anyone knows how to fix the "Android" alert, please let us know. I'll of course look into this also.

The last point, I think, can be coded into our application, and I'll put it on the list of things to do for the next update.

Also, if you are not getting work, it is most likely because there isn't any to issue at the time. Our demand comes in waves as projects progress. However, our public structure prediction server, Robetta, usually provides continual work. It was down for a few days last week for updates though.

ID: 81123 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
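The progressive backoff described in this post (a short first wait that grows with each consecutive empty reply, up to a ceiling of a few hours) is commonly implemented as a capped exponential backoff. The sketch below is a generic illustration of that idea, not the BOINC client's actual algorithm: the function names, the doubling factor, and the example values of 5 minutes and 4 hours are assumptions, and it leaves out both the randomization real clients often add and the "recycling" back to 1 hour mentioned above.

```cpp
// Hypothetical sketch of a capped exponential backoff; not BOINC source code.
constexpr double min_d(double a, double b) { return a < b ? a : b; }

// failures: number of consecutive requests that returned no work.
// base_s:   wait after the first failure, in seconds.
// cap_s:    longest wait ever allowed, in seconds.
// The wait doubles with each further failure until it reaches cap_s.
constexpr double backoff_seconds(int failures, double base_s, double cap_s) {
    return failures <= 0 ? 0.0
         : failures == 1 ? min_d(base_s, cap_s)
         : min_d(2.0 * backoff_seconds(failures - 1, base_s, cap_s), cap_s);
}
```

Starting from 5 minutes with a 4 hour cap, the waits would run 300, 600, 1200, 2400, 4800, 9600 seconds and then stay at 14400 seconds, so a host retries many times before it ever waits as long as a flat 24 hours.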
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 81129 - Posted: 2 Feb 2017, 19:37:04 UTC

I found where the relevant parameters are set in the scheduling code. I'm open to suggestions and feedback for more preferable values as long as it doesn't cause too much load on our servers.


// various delay params.
// Any of these could be moved into SCHED_CONFIG, if projects need control.

#define DELAY_MISSING_KEY           3600
    // account key missing or invalid
#define DELAY_UNACCEPTABLE_OS       3600*24
    // Darwin 5.x or 6.x (E@h only)
#define DELAY_BAD_CLIENT_VERSION    3600*24
    // client version < config.min_core_client_version
#define DELAY_NO_WORK_SKIP          0
    // no work, config.nowork_skip is set
    // Rely on the client's exponential backoff in this case
#define DELAY_PLATFORM_UNSUPPORTED  3600*24
    // platform not in our DB
#define DELAY_DISK_SPACE            3600
    // too little disk space or prefs (locality scheduling)
#define DELAY_DELETE_FILE           3600*4
    // wait for client to delete a file (locality scheduling)
#define DELAY_ANONYMOUS             3600*4
    // anonymous platform client doesn't have version
#define DELAY_NO_WORK_TEMP          0
    // client asked for work but we didn't send any,
    // because of a reason that could be fixed by user
    // (e.g. prefs, or run BOINC more)
    // Rely on the client's exponential backoff in this case
#define DELAY_NO_WORK_PERM          3600*24
    // client asked for work but we didn't send any,
    // because of a reason not easily changed
    // (like wrong kind of computer)
#define DELAY_NO_WORK_CACHE         0
    // client asked for work but we didn't send any,
    // because user had too many results in cache.
    // Rely on client's exponential backoff
#define DELAY_MAX                   (2*86400)
    // maximum delay request
ID: 81129 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
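To see why a single constant in the listing above could produce the 24 hour wait users report, it may help to sketch how a server-requested delay and the client's own backoff might combine. This is an illustration under stated assumptions, not the scheduler's real logic: it assumes the server caps its request at `DELAY_MAX` and that the client waits for whichever is longer, the server's delay or its own backoff. The function names and the `k`-prefixed constants are hypothetical; only the numeric values come from the listing.

```cpp
// Hypothetical sketch, not BOINC source code.
// Constants mirror three values from the listing quoted above.
constexpr double kDelayNoWorkTemp = 0.0;          // DELAY_NO_WORK_TEMP
constexpr double kDelayNoWorkPerm = 3600.0 * 24;  // DELAY_NO_WORK_PERM
constexpr double kDelayMax        = 2.0 * 86400;  // DELAY_MAX

// Assumption: the server never asks for more than DELAY_MAX.
constexpr double cap_delay(double requested_s) {
    return requested_s > kDelayMax ? kDelayMax : requested_s;
}

// Assumption: the client waits for whichever is longer,
// the server's requested delay or its own current backoff.
constexpr double effective_wait(double server_delay_s, double client_backoff_s) {
    return cap_delay(server_delay_s) > client_backoff_s
               ? cap_delay(server_delay_s)
               : client_backoff_s;
}
```

Under these assumptions, a reply carrying the `DELAY_NO_WORK_PERM` value with a 5 minute client backoff gives `effective_wait(kDelayNoWorkPerm, 300.0)`, which is 86400 seconds. A zero delay such as `DELAY_NO_WORK_TEMP` leaves the client's own 300 second backoff in charge, and a 1 hour server delay yields 3600 seconds.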
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 81130 - Posted: 2 Feb 2017, 22:00:45 UTC

I believe this one is the behavior people are seeing elsewhere and expecting:

#define DELAY_NO_WORK_SKIP          0
    // no work, config.nowork_skip is set
    // Rely on the client's exponential backoff in this case
Rosetta Moderator: Mod.Sense
ID: 81130 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Sid Celery

Joined: 11 Feb 08
Posts: 2125
Credit: 41,228,659
RAC: 9,701
Message 81135 - Posted: 3 Feb 2017, 13:17:21 UTC - in response to Message 81129.  

I found where the relevant parameters are set in the scheduling code. I'm open to suggestions and feedback for more preferable values as long as it doesn't cause too much load on our servers.

I'd suggest 1 hour as a reasonable compromise between server load and our buffer sizes
ID: 81135 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Erich56

Joined: 11 Jan 16
Posts: 35
Credit: 1,437,503
RAC: 0
Message 81143 - Posted: 6 Feb 2017, 14:19:13 UTC

For several days, I've been back to Rosetta with one of my PCs, and have crunched 14 tasks since then.
Today, when BOINC was trying to download the next task, I got the notice

"06.02.2017 14:49:08 | rosetta@home | Rosetta Mini for Android is not available for your type of computer."

How come? I am not trying to crunch Rosetta Mini for Android on my Windows PC.
ID: 81143 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Erich56

Joined: 11 Jan 16
Posts: 35
Credit: 1,437,503
RAC: 0
Message 81144 - Posted: 6 Feb 2017, 16:01:06 UTC - in response to Message 81143.  

"06.02.2017 14:49:08 | rosetta@home | Rosetta Mini for Android is not available for your type of computer."

Just now, after a while, a new task was downloaded :-)
ID: 81144 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Erich56

Joined: 11 Jan 16
Posts: 35
Credit: 1,437,503
RAC: 0
Message 81145 - Posted: 6 Feb 2017, 16:44:08 UTC

Unfortunately, now again the BOINC messenger shows the message

"6.02.2017 17:38:42 | rosetta@home | Rosetta Mini for Android is not available for your type of Computer"

when trying to download a new task on my Windows system. Why so? What's going wrong?


ID: 81145 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 8,387
Message 81146 - Posted: 6 Feb 2017, 17:05:03 UTC - in response to Message 81145.  

"6.02.2017 17:38:42 | rosetta@home | Rosetta Mini for Android is not available for your type of Computer"

when trying to download a new task on my Windows system. Why so? What's going wrong?


A long time ago, in a galaxy far....
ID: 81146 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 81147 - Posted: 6 Feb 2017, 18:19:49 UTC - in response to Message 81145.  
Last modified: 6 Feb 2017, 18:26:04 UTC

Unfortunately, now again the BOINC messenger shows the message

"6.02.2017 17:38:42 | rosetta@home | Rosetta Mini for Android is not available for your type of Computer"


Erich,
This is a known problem. Rosetta Mini for Android problem

Read earlier in this thread. They are working to find and fix it. But it has gotten better for me the last day or two, whether that means anything.
ID: 81147 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 81148 - Posted: 6 Feb 2017, 20:24:53 UTC - in response to Message 81147.  

Unfortunately, now again the BOINC messenger shows the message

"6.02.2017 17:38:42 | rosetta@home | Rosetta Mini for Android is not available for your type of Computer"


Erich,
This is a known problem. Rosetta Mini for Android problem

Read earlier in this thread. They are working to find and fix it. But it has gotten better for me the last day or two, whether that means anything.


I think this alert only occurs when there are no non-android work units available. It's not a serious issue.
ID: 81148 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 81149 - Posted: 6 Feb 2017, 20:33:25 UTC - in response to Message 81148.  

Unfortunately, now again the BOINC messenger shows the message

"6.02.2017 17:38:42 | rosetta@home | Rosetta Mini for Android is not available for your type of Computer"


Erich,
This is a known problem. Rosetta Mini for Android problem

Read earlier in this thread. They are working to find and fix it. But it has gotten better for me the last day or two, whether that means anything.


I think this alert only occurs when there are no non-android work units available. It's not a serious issue.


We know it is not a serious issue, but EVERYONE that encounters this message immediately feels things are not running properly (except, I suppose, an Android user). That is why the request was made to improve the wording of the message.
Rosetta Moderator: Mod.Sense
ID: 81149 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Sid Celery

Joined: 11 Feb 08
Posts: 2125
Credit: 41,228,659
RAC: 9,701
Message 81151 - Posted: 7 Feb 2017, 5:17:50 UTC - in response to Message 81148.  

Unfortunately, now again the BOINC messenger shows the message

"6.02.2017 17:38:42 | rosetta@home | Rosetta Mini for Android is not available for your type of Computer"

Erich,
This is a known problem. Rosetta Mini for Android problem

Read earlier in this thread. They are working to find and fix it. But it has gotten better for me the last day or two, whether that means anything.

I think this alert only occurs when there are no non-android work units available. It's not a serious issue.

But the 24hr backoff that results directly from it <is> a serious issue for <users> if not for the Rosetta project itself. Our buffers run out and we either run nothing or many tasks get downloaded from backup projects if we have one set.

I can't even believe you said that tbh.

It came up on 3 of my devices today and 1 of my team-member's, all coming up with the 24hr backoff message. 2 of those 4 are attended, 2 aren't. If the unattended ones are unlucky they'll re-poll after 24hrs and maybe find there aren't tasks again, which'll mean they get another 24hr backoff and run out of Rosetta work. And when I get to them later in the week I'll spend a few days forcing a heap of non-preferred projects' tasks to run in order to clear them down so there's space to get Rosetta tasks back into their buffer. Then when I get back here a few days later I may find the same here and do the same.

You may think it's not serious. I think it's a circus that's been driving me crazy for the last few months without a break.

So if you could see your way clear to changing that back-off to 1 hour instead of 24 hours (sounds like two minutes work to me) - because it hasn't happened yet - I'd kind of appreciate it, if that's not too much to ask.

And if you could avoid saying all this manual intervention I'm having to do week after week after week after week "isn't a serious issue" ever again in your entire lifetime, that would be kind of neat too.

No rush, obviously...
ID: 81151 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote



©2024 University of Washington
https://www.bakerlab.org