Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 16,102 |
@Timo If it's 5 days for you, you have a problem. The first failure here was 22 hours before this message. Personally, I'm just glad it's not me this time... |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Wow, that is NOT the way to squelch rumors. I offered a joke that their poor communication skills were liable to cause rumors. So I started a rumor that an NSA trainee had bollixed up the project while trying to install a fake work unit that hijacks your computer's camera to take blackmail photos of your wife or girlfriend. The punchline of the joke was that the rumor would cause the trainee to get punished--but then the post with the joke disappeared. Maybe I pro-jesteth too much? If THIS joke disappears and you don't hear from me again... Well, you better hope they didn't catch you reading it, eh? |
Mild Mannered Professor Send message Joined: 5 Apr 09 Posts: 14 Credit: 22,006,166 RAC: 0 |
Some people here are incredibly immature and demanding - I'm talking to those who are stomping their feet and threatening to leave the project just 12 hours into what I'll call the work download/upload outage. Grow up people. My mom passed away after suffering from Alzheimer's for years. No one should have to suffer from that damned disease! I'll stay with Rosetta since they do research which could prevent that in the future. |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
The NSA trainee should stop deleting these jokes and start preparing to clean toilets. Oh yes, why don't you fix the system on your way to the head? |
BarryAZ Send message Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
It is rather unfortunate that this sort of problem -- which, as others have noted, has happened before -- goes on without any project comment before, during, or after the event. This project has run reliably for years, and I suspect the unacknowledged communications problem (with the server at any rate) will be resolved within a day or two. It simply seems to me that the lack of communication with the minions demonstrates a bit of disrespect. When I encountered this particular instance of the current failure, I simply redirected processing to other projects. I've done that before. I will likely return to processing for Rosetta some time relatively soon after the unacknowledged problem is resolved without comment. During the interim, projects like POEM will get more cycles. |
krypton Volunteer moderator Project developer Project scientist Send message Joined: 16 Nov 11 Posts: 108 Credit: 2,164,309 RAC: 0 |
I apologize for the issue. I've alerted those that take care of the server side... Our guy is looking into it right now... hopefully they can fix it soon. Unfortunately, there isn't anything I can do personally =[ I usually check once every other day for reported problems and forward them to the scientists involved. |
ronhelvey Send message Joined: 1 Feb 06 Posts: 1 Credit: 144,513 RAC: 0 |
I have a completed task which will not upload, thus blocking new tasks from downloading. Do I need to abort this task? I have tried to update and force the upload several times over the last 2 days, to no avail. Options? I've had similar issues with the POGS project, but never with Rosetta until now. |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 10 |
They are aware of the problem. I also have several wu's complete and not uploading, deadline approaching, but there is nothing I or you can do about it, so chill out and let it resolve. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
shanen wrote: Wow, that is NOT the way to squelch rumors. Your post is still there. BarryAZ wrote: It is rather unfortunate that this sort of problem -- which as others have noted, has happened before, goes on without any project comment, before, during, or after the event. I'm wondering how much the issue is a lack of communication on the project side and how much it is a case of participants not bothering to read the comments (especially where I have requoted a project team member's comments in several threads). Not reading and then accusing the scientists of showing disrespect for not acknowledging the problem is rather ironic. krypton wrote: I apologize for the issue. I've alerted those that take care of the server side... Our guy is looking into it right now... hopefully they can fix it soon. Thanks for the continued communication, krypton. I think most of us welcome the messages from you and the other project team members. It just seems to be a vocal group who post first and think second. |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
WHERE should we be looking for the status information? This part of the website is NOT doing a good job of disseminating anything but annoyance. There's a REALLY natural place for the project managers to communicate with the volunteers. If you click on the big "Home Page" button for the project, it takes you to the top page of the project's website. Right there you will see a lovely box that is labeled "News". There is a trivial announcement from a month ago, but what it SHOULD say is "We are aware of the problems." Even better if it says something about what the problems are or when the system will be repaired. On the same page there is a link for server status. That is obviously worthless and should be repaired. According to the Server Status Page that it takes you to, everything is perfect. I stand on my rumor that an NSA intern has hijacked the system! He is simply being overwhelmed by all of the video the volunteers' cameras are sending him! |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Murasaki, the scientific team is not to blame for this. Never has been, never will be. The problem lies with the IT people of the project. It always seems there are not enough eyes keeping track of things. I only know of KEL, who can perform wonders when alerted to the problem, but other than that I know of nobody else who really monitors the IT side. Since krypton monitors only every other day, I suppose this would have been the first time he read about it. Krypton, can you ask KEL or someone if you guys could set up an email address to report problems to? Then whoever is in charge of IT monitoring can have that email address bounce messages to their personal account so they know to go back to the tech email and deal with the problems. I think that might be a better solution than someone randomly selecting a day to come read the boards and look at the problem. Or someone needs to look and read this thread every day. I know of one project that is as big as RAH or bigger and their tech guy is all over any problems the same day. RAH has never been able to do this and I do not understand why. I know this issue predates your involvement, but it would be nice to see someone actually make some changes or propose them to Dr. B. Shanen, what you're saying has been suggested many a time. It used to happen, but a few years ago tech updates stopped and never restarted. Problems are acknowledged after they are fixed, never before or during the fix. The server page only shows top-end servers, never the ones deep down that cause problems like this one. I too looked there to see if anything was offline, but it was all green, which made me look here and see that the issue was just starting to be discussed. Just another day at RAH technical services. The science here is great! But the IT is not as good when it comes to responding to problems. |
BarryAZ Send message Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
Thanks, Krypton, for your message. I'm guessing this particular problem is pretty much the same as has happened a couple of times in the past. For whatever reason, the IP changes that the UW IT folks make are not the sort of thing I've seen other projects encounter in the 10+ years I've done BOINC processing. The back-end IT folks, for their own no doubt legitimate reasons, need to reconfigure the network, resulting in IP address changes that need to get propagated. In the past (this has happened twice previously that I can recall over the past 8 or more years), sometimes the IT folks send out a message in advance to the project admins and they are able to post something on the home page in advance. Sometimes that message doesn't happen (I suspect that might be the case here). I'd note it is the project admins (along with whoever has the designated role regarding the home page and any message board postings) that get to deal with us, the unruly users. I'd note that those of us who use the message boards are typically the most irascible of the user community. The scientists have the task of working with the data and likely would not know of these problems unless they extended for a long enough period to be noticed as a lack of data to work with. In any event, sometimes with advance notice, some information regarding how to manually change the BOINC script for Rosetta to force a revised routing is available. Otherwise we simply get to wait the 24 to 72 hours for the changes to propagate within the internet. So I meant no disrespect to the scientists -- nor to the admins at this juncture, as I suspect that they too were either not in the information loop or perhaps not in their offices earlier this week when the problem first surfaced. In any event, I appreciate your message and your efforts. |
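If the cause really is an IP address change that has not yet propagated (that is BarryAZ's guess above, not something the project has confirmed), one way to see which address your own machine currently gets for the project host is a small lookup like the sketch below. The hostname `boinc.bakerlab.org` comes from the project URL in the client log posted in this thread; the "expected" address is purely hypothetical (a documentation-range placeholder), since nobody here has posted the real new address.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def dns_matches(hostname, expected_ip, resolver=resolve_ipv4):
    """True if expected_ip is among the addresses the resolver returns.

    A failed lookup is treated as "no match". The resolver argument is
    injectable so the check can be exercised without network access.
    """
    try:
        return expected_ip in resolver(hostname)
    except socket.gaierror:
        return False


if __name__ == "__main__":
    host = "boinc.bakerlab.org"   # from the project URL in the client log
    expected = "192.0.2.10"       # HYPOTHETICAL placeholder, not the real address
    try:
        print(host, "currently resolves to", resolve_ipv4(host))
    except socket.gaierror as err:
        print("lookup failed:", err)
    print("matches expected address:", dns_matches(host, expected))
```

A match only tells you that your resolver has picked up a given address; it says nothing about whether the upload server behind it is actually accepting connections.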
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Well, I'll just say that my suggestion is to put the higher priority on distributing fresh tasks. It doesn't really bother me if I have a lot of completed work piling up here as long as it is eventually accepted. Right now the state of the system seems to be "useless". I could work on another project, but I dropped (at least) three others because of annoyance with how they were managed or pointlessness. This one seems to be a reasonable balance of good science and low-key but reliable-enough functioning. However, I'm still not convinced the NSA isn't involved. It would just be another example of looking where the light is better. Lots of cheap data is so attractive to some authoritarians. |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. Just saw on the news that they had a burst water main at the uni. It caused a lot of flooding, which might be the cause of this problem. Not sure though! |
Daedalus Send message Joined: 1 Aug 08 Posts: 39 Credit: 10,107,661 RAC: 168 |
Same problem here. On two computers on two different ISPs. I noticed that while it seems unable to upload, my account acknowledges my computers have communicated with the project. |
Timo Send message Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
These things take time to fix. In the meantime it might be a good idea to post something about this outage to the 'NEWS' section of the homepage. |
Sir Cracked of the Mind Send message Joined: 5 Apr 07 Posts: 2 Credit: 682,661 RAC: 0 |
I've been sitting on 30 units for over 24 hrs (project backoff) with no information or anything. Isn't that why there is a 'notices' window in BOINC? There comes a point when you just think you're throwing computers at a brick wall and start wondering if you can't find a better use for the time. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1233 Credit: 14,324,975 RAC: 3,637 |
The messages I'm getting indicate that the problem is that the Rosetta@Home server is no longer properly connected to the internet. However, the Ralph@Home server is still connected, and therefore getting more traffic than usual. |
amgthis Send message Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0 |
It seems like the server 'status' page for Rosetta@home rarely shows when anything is down. <sigh>
30-Jul-2014 06:42:38 [rosetta@home] Started upload of rtrpv1_full_length_rosettacm_cartrelax_truncated_asymm_IGNORE_THE_REST_176238_47563_0_0
30-Jul-2014 06:42:38 [rosetta@home] Started upload of rb_07_21_48263_94856_ab_stage0_t000___robetta_IGNORE_THE_REST_08_06_179551_12_0_0
30-Jul-2014 06:43:46 [---] Project communication failed: attempting access to reference site
30-Jul-2014 06:43:46 [rosetta@home] Temporarily failed upload of rtrpv1_full_length_rosettacm_cartrelax_truncated_asymm_IGNORE_THE_REST_176238_47563_0_0: connect() failed
30-Jul-2014 06:43:46 [rosetta@home] Backing off 2 hr 30 min 58 sec on upload of rtrpv1_full_length_rosettacm_cartrelax_truncated_asymm_IGNORE_THE_REST_176238_47563_0_0
30-Jul-2014 06:43:46 [rosetta@home] Temporarily failed upload of rb_07_21_48263_94856_ab_stage0_t000___robetta_IGNORE_THE_REST_08_06_179551_12_0_0: connect() failed
30-Jul-2014 06:43:46 [rosetta@home] Backing off 4 hr 11 min 26 sec on upload of rb_07_21_48263_94856_ab_stage0_t000___robetta_IGNORE_THE_REST_08_06_179551_12_0_0
30-Jul-2014 06:43:53 [---] Internet access OK - project servers may be temporarily down.
30-Jul-2014 06:44:58 [rosetta@home] Started upload of rtrpv1_full_length_rosettacm_cartrelax_truncated_asymm_IGNORE_THE_REST_176238_47563_0_0
30-Jul-2014 06:44:58 [rosetta@home] Started upload of rb_07_21_48263_94856_ab_stage0_t000___robetta_IGNORE_THE_REST_08_06_179551_12_0_0
30-Jul-2014 06:46:07 [---] Project communication failed: attempting access to reference site
30-Jul-2014 06:46:07 [rosetta@home] Temporarily failed upload of rtrpv1_full_length_rosettacm_cartrelax_truncated_asymm_IGNORE_THE_REST_176238_47563_0_0: connect() failed
30-Jul-2014 06:46:07 [rosetta@home] Backing off 3 hr 5 min 54 sec on upload of rtrpv1_full_length_rosettacm_cartrelax_truncated_asymm_IGNORE_THE_REST_176238_47563_0_0
30-Jul-2014 06:46:07 [rosetta@home] Temporarily failed upload of rb_07_21_48263_94856_ab_stage0_t000___robetta_IGNORE_THE_REST_08_06_179551_12_0_0: connect() failed
30-Jul-2014 06:46:07 [rosetta@home] Backing off 5 hr 18 min 18 sec on upload of rb_07_21_48263_94856_ab_stage0_t000___robetta_IGNORE_THE_REST_08_06_179551_12_0_0
30-Jul-2014 06:46:08 [---] Internet access OK - project servers may be temporarily down.
30-Jul-2014 06:49:55 [---] Received signal 15
30-Jul-2014 06:49:56 [---] Exit requested by user
30-Jul-2014 06:51:10 [---] Starting BOINC client version 7.0.27 for x86_64-pc-linux-gnu
30-Jul-2014 06:51:10 [---] log flags: file_xfer, sched_ops, task
30-Jul-2014 06:51:10 [---] Libraries: libcurl/7.26.0 OpenSSL/1.0.1e zlib/1.2.7 libidn/1.25 libssh2/1.4.2 librtmp/2.3
30-Jul-2014 06:51:10 [---] Data directory: /var/lib/boinc-client
30-Jul-2014 06:51:10 [---] Processor: 4 GenuineIntel Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz [Family 6 Model 42 Stepping 7]
30-Jul-2014 06:51:10 [---] Processor: 6.00 MB cache
30-Jul-2014 06:51:10 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
30-Jul-2014 06:51:10 [---] OS: Linux: 3.2.0-4-amd64
30-Jul-2014 06:51:10 [---] Memory: 7.79 GB physical, 15.62 GB virtual
30-Jul-2014 06:51:10 [---] Disk: 18.15 GB total, 12.60 GB free
30-Jul-2014 06:51:10 [---] Local time is UTC -7 hours
30-Jul-2014 06:51:10 [---] No usable GPUs found
30-Jul-2014 06:51:10 [---] Config: GUI RPC allowed from:
30-Jul-2014 06:51:10 [---] Config: 192.168.242.174
30-Jul-2014 06:51:10 [---] A new version of BOINC is available. <a href=http://boinc.berkeley.edu/download.php>Download it.</a>
30-Jul-2014 06:51:10 [rosetta@home] URL https://boinc.bakerlab.org/rosetta/; Computer ID 1675855; resource share 100
30-Jul-2014 06:51:10 [rosetta@home] General prefs: from rosetta@home (last modified 19-Dec-2010 18:19:25)
30-Jul-2014 06:51:10 [rosetta@home] Computer location: home
30-Jul-2014 06:51:10 [---] General prefs: using separate prefs for home
30-Jul-2014 06:51:10 [---] Reading preferences override file
30-Jul-2014 06:51:10 [---] Preferences:
30-Jul-2014 06:51:10 [---] max memory usage when active: 7176.28MB
30-Jul-2014 06:51:10 [---] max memory usage when idle: 7176.28MB
30-Jul-2014 06:51:10 [---] max disk usage: 14.84GB
30-Jul-2014 06:51:10 [---] (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
30-Jul-2014 06:51:10 [---] Not using a proxy
Initialization completed
30-Jul-2014 06:51:56 [rosetta@home] Started upload of tj_7_11_2helix_highRadius_X16_BBB_14_GB_1_o_fb_fragments_abinitio_SAVE_ALL_OUT_174752_292_0_0
30-Jul-2014 06:51:56 [rosetta@home] Started upload of ab_Tx767_t000__cstwt_1.0_IGNORE_THE_REST_03_09_179579_2110_0_0
30-Jul-2014 06:51:56 [rosetta@home] Restarting task frxtrimer_b5_04744_r3_A_frxtrimer_b5_04744_r3_B_patchdock_split_06_140721_SAVE_ALL_OUT__179597_176_0 using minirosetta version 352 in slot 3
30-Jul-2014 06:52:21 [rosetta@home] Restarting task 5H2LD_3_A_5H2LD_3_B_patchdock_split_05_140722_SAVE_ALL_OUT__179674_600_0 using minirosetta version 352 in slot 0
30-Jul-2014 06:52:21 [rosetta@home] Restarting task rb_07_20_48108_94838_ab_stage0_h001___robetta_IGNORE_THE_REST_09_09_179487_21_0 using minirosetta version 352 in slot 2
30-Jul-2014 06:52:41 [rosetta@home] Restarting task benchmark_0026_master_9699c665b4702afa86c605c374d1e7c8266f4b0e_T5_0.00_7.10_0.00_contact_opt_iteration_2_b417a9e3170e49b3874bdd0ac2ed91a3_fold_SAVE_ALL_OUT_179672_1875_0 using minirosetta version 352 in slot 1
30-Jul-2014 06:56:48 [---] Project communication failed: attempting access to reference site
30-Jul-2014 06:56:48 [rosetta@home] Temporarily failed upload of tj_7_11_2helix_highRadius_X16_BBB_14_GB_1_o_fb_fragments_abinitio_SAVE_ALL_OUT_174752_292_0_0: connect() failed
30-Jul-2014 06:56:48 [rosetta@home] Backing off 3 hr 26 min 31 sec on upload of tj_7_11_2helix_highRadius_X16_BBB_14_GB_1_o_fb_fragments_abinitio_SAVE_ALL_OUT_174752_292_0_0
30-Jul-2014 06:56:48 [rosetta@home] Started upload of hc_centroids_2bf5_4_0.25_06-01-14_SAVE_ALL_OUT_168127_3439_0_0
30-Jul-2014 06:57:01 [---] Internet access OK - project servers may be temporarily down.
30-Jul-2014 06:58:03 [---] Project communication failed: attempting access to reference site
30-Jul-2014 06:58:03 [rosetta@home] Temporarily failed upload of hc_centroids_2bf5_4_0.25_06-01-14_SAVE_ALL_OUT_168127_3439_0_0: connect() failed
30-Jul-2014 06:58:03 [rosetta@home] Backing off 54 min 35 sec on upload of hc_centroids_2bf5_4_0.25_06-01-14_SAVE_ALL_OUT_168127_3439_0_0
30-Jul-2014 06:58:03 [rosetta@home] Started upload of tj_7_11_2helix_highRadius_X16_BAB_14_BBGB_1_h_fb_fragments_abinitio_SAVE_ALL_OUT_174713_373_0_0
30-Jul-2014 06:58:10 [---] Internet access OK - project servers may be temporarily down.
30-Jul-2014 06:58:12 [---] Project communication failed: attempting access to reference site
30-Jul-2014 06:58:12 [rosetta@home] Temporarily failed upload of ab_Tx767_t000__cstwt_1.0_IGNORE_THE_REST_03_09_179579_2110_0_0: transient HTTP error
30-Jul-2014 06:58:12 [rosetta@home] Backing off 1 hr 45 min 48 sec on upload of ab_Tx767_t000__cstwt_1.0_IGNORE_THE_REST_03_09_179579_2110_0_0
30-Jul-2014 06:58:14 [---] Internet access OK - project servers may be temporarily down.
30-Jul-2014 06:59:12 [---] Project communication failed: attempting access to reference site
30-Jul-2014 06:59:12 [rosetta@home] Temporarily failed upload of tj_7_11_2helix_highRadius_X16_BAB_14_BBGB_1_h_fb_fragments_abinitio_SAVE_ALL_OUT_174713_373_0_0: connect() failed
30-Jul-2014 06:59:12 [rosetta@home] Backing off 7 min 56 sec on upload of tj_7_11_2helix_highRadius_X16_BAB_14_BBGB_1_h_fb_fragments_abinitio_SAVE_ALL_OUT_174713_373_0_0
30-Jul-2014 06:59:14 [---] Internet access OK - project servers may be temporarily down. |
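For anyone who wants to see at a glance how long each stuck upload is being deferred, a log like the one above can be summarized with a few lines of Python. This is only a sketch that assumes the "Backing off ... on upload of ..." wording shown in that log, with delays written in whole `hr`, `min`, and `sec` units; other client versions may word it differently. The function names are made up for illustration and are not part of BOINC.

```python
import re
from collections import defaultdict

# Matches client log lines such as:
#   30-Jul-2014 06:58:03 [rosetta@home] Backing off 54 min 35 sec on upload of <file>
BACKOFF_RE = re.compile(
    r"\[(?P<project>[^\]]+)\] Backing off (?P<delay>.+?) on upload of (?P<name>\S+)"
)

# Assumption: only these three units appear, as in the log above.
UNIT_SECONDS = {"hr": 3600, "min": 60, "sec": 1}


def delay_to_seconds(text):
    """Convert a delay such as '2 hr 30 min 58 sec' to a number of seconds."""
    tokens = text.split()
    # Tokens alternate between a number and its unit.
    return sum(int(n) * UNIT_SECONDS[u] for n, u in zip(tokens[::2], tokens[1::2]))


def summarize_backoffs(log_lines):
    """Return {upload file name: [backoff in seconds, ...]} in log order."""
    backoffs = defaultdict(list)
    for line in log_lines:
        match = BACKOFF_RE.search(line)
        if match:
            backoffs[match.group("name")].append(delay_to_seconds(match.group("delay")))
    return dict(backoffs)


if __name__ == "__main__":
    sample = [
        "30-Jul-2014 06:58:03 [rosetta@home] Backing off 54 min 35 sec on upload of "
        "hc_centroids_2bf5_4_0.25_06-01-14_SAVE_ALL_OUT_168127_3439_0_0",
        "30-Jul-2014 06:58:10 [---] Internet access OK - project servers may be temporarily down.",
    ]
    for name, delays in summarize_backoffs(sample).items():
        print(name, delays)
```

The backoff values in the log jump around rather than growing steadily, so treat the output as a record of what the client chose each time, not as a prediction of when the next retry will succeed.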
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Hi. Wrong University. That was UCLA in Los Angeles California. This project is located at the University of Washington in Seattle. |
©2024 University of Washington
https://www.bakerlab.org