Message boards : Number crunching : Problems with web site
Author | Message |
---|---|
hedera Joined: 15 Jul 06 Posts: 76 Credit: 5,254,801 RAC: 745 |
I just realized my CPU usage was essentially nil, and when I checked BOINC, the message log shows it's been requesting tasks from Rosetta all day (since 8:15 AM local) and never gotten any tasks to work on. I'm running BOINC Manager 6.2.19 on Windows XP SP2, with 4 GB of RAM and a 3.2 GHz Pentium 4 CPU. I've never seen this problem before. Every request for tasks gets this: 01/03/2009 10:23:32 PM|rosetta@home|Sending scheduler request: Requested by user. Requesting 60480 seconds of work, reporting 0 completed tasks 01/03/2009 10:23:37 PM|rosetta@home|Scheduler request succeeded: got 0 new tasks Is there something wrong with requesting 60480 seconds of work? Do I need to upgrade BOINC Manager again? Have you no tasks available? Help? --hedera Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
There is no work available for the time being due to a system problem. Perhaps by Monday morning sometime Pacific time the team will correct this problem. Hang in there. See the "server problems" and "not getting any work" threads to follow what's going on. You can also tell from the home page stats whether there is any work, and you can look at the server status page via the link on the lower left-hand side of the home page. I just realized my CPU usage was essentially nil, and when I checked BOINC, the message log shows it's been requesting tasks from Rosetta all day (since 8:15 AM local) and never gotten any tasks to work on. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,270,985 RAC: 1,405 |
I just realized my CPU usage was essentially nil, and when I checked BOINC, the message log shows it's been requesting tasks from Rosetta all day (since 8:15 AM local) and never gotten any tasks to work on. The systems status shows that one of the two work generator programs is down, and the other one can't keep up with the demand for more workunits; there were only 22 workunits available the last I looked, not necessarily including any suitable for your machine. For cases like this, you might want to add another project rather reliable at supplying workunits, but give it only a small fraction of your available CPU time: http://boinc.fzk.de/poem/ This one, at least, offers rather short workunits compared to many BOINC projects. Also, it is working on protein folding, as Rosetta@home is. Doesn't claim to be helping any specific diseases, though. Giving it only a small share of your available CPU time means that even if you keep it enabled, it normally won't use much of your machine, but it will still fill in any times when you can't get workunits from Rosetta@home. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,154,825 RAC: 4,074 |
For cases like this, you might want to add another project rather reliable at supplying workunits, but give it only a small fraction of your available CPU time: Thanks, I was looking for another project with short turnaround times AND was doing something for science. I always thought Poem had to do with poetry, silly me!!! I couldn't right poetry if my life depended on it, so always ignored the site. Thanks again!!! |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,270,985 RAC: 1,405 |
For cases like this, you might want to add another project rather reliable at supplying workunits, but give it only a small fraction of your available CPU time: You're welcome. The SIMAP site also has short workunits, but is only active perhaps one week every month. Active today, though. http://boinc.bio.wzw.tum.de/boincsimap/ The malariacontrol.net site also has short workunits, but is becoming less active and often doesn't have any workunits available. http://www.malariacontrol.net/ Neither is as closely related to what Rosetta@home is doing, though. You might also want to watch Cels@home. It is currently inactive for some changes, including a move to another server, and it had fewer workunits than requested even when it was active. Typical workunits were about 6 CPU hours on my machine. http://cels-at-home-dev.dyndns.org/cels/ This website is related to Cels@home, but it's not very clear if that's where they plan to move: http://ficp.engr.utexas.edu/cels/ Predictor@home has been essentially inactive for several months; I don't know how long the workunits are. World Community Grid tends to provide long workunits, but is about to go inactive in order to move to another server site. |
Sid Celery Joined: 11 Feb 08 Posts: 2121 Credit: 41,179,074 RAC: 11,480 |
I couldn't right poetry if my life depended on it... Surely you mean you couldn't write poetry if your life depended on it. That sounds right. Ok, sorry. I'm waiting for work and have nothing better to say. You may beat me with a stick... |
upstatelabs Joined: 22 Jun 06 Posts: 10 Credit: 516,767 RAC: 0 |
Seems as there aren't any new WUs available.... Any idea when it'll be back to normal? |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Seems as there aren't any new WUs available.... No one but the team does; it's 9:20 am back in Seattle, so they should be looking into the problem by now. |
upstatelabs Joined: 22 Jun 06 Posts: 10 Credit: 516,767 RAC: 0 |
No one but the team does; it's 9:20 am back in Seattle, so they should be looking into the problem by now. Things seem to be fixed... I have WUs again :) |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
I have not been able to tell if they are issuing again, but WCG is on the air and taking work back and the web site is up again ... I am not sure if they are issuing work again or not ... I can't tell from my buffers ... and I just lowered their priority so I am probably not asking for work yet ... But most of their projects are Bio related ... |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,270,985 RAC: 1,405 |
I have not been able to tell if they are issuing again, but WCG is on the air and taking work back and the web site is up again ... They are issuing workunits again, although perhaps not as many. They're still checking for any unexpected effects of their new environment, such as running their server software on new servers. |
Kurre Joined: 12 Apr 06 Posts: 9 Credit: 69,240 RAC: 0 |
My comp can't report work. This is the last error indication I could find in my logs. I have about 5 workunits that my comp tried to report but couldn't. They just disappear from my comp and must be floating out there somewhere in the big cloud. Is it my installation or is this a general issue? Running BOINC 6.4.5. 2009-01-14 15:32:23||Internet access OK - project servers may be temporarily down. 2009-01-14 15:32:23|rosetta@home|Finished upload of abinitio_norelax_homfrag_129_B_2ccvA_SAVE_ALL_OUT_4626_12633_0_0 2009-01-14 15:32:25|rosetta@home|Scheduler request failed: Transferred a partial file 2009-01-14 15:33:25|rosetta@home|Sending scheduler request: To fetch work. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
If the tasks were removed from the list in your BOINC Manager, then another scheduler request went through successfully. BOINC will automatically retry until it goes through. Rosetta Moderator: Mod.Sense |
Kurre Joined: 12 Apr 06 Posts: 9 Credit: 69,240 RAC: 0 |
If the tasks were removed from the list in your BOINC Manager, then another scheduler request went through successfully. BOINC will automatically retry until it goes through. The thing is that they disappear from my local BOINC client, but they are still marked as in progress and new on the website. This is an example: 220365003 200760705 12 Jan 2009 19:28:50 UTC 22 Jan 2009 19:28:50 UTC In Progress Unknown New |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The only way I know of for tasks to show on the website, but not appear on your local BOINC Manager display is when your machine never received the work in the first place. The server thinks that it assigned it to you, but your machine never saw it. Some people called these "ghost WUs". Many of the task names are very similar. The simplest way to tell them apart is by the numbers at the very end of the name. Are you certain these are the tasks that you completed? Here is a link to your host. Rosetta Moderator: Mod.Sense |
Kurre Joined: 12 Apr 06 Posts: 9 Credit: 69,240 RAC: 0 |
The only way I know of for tasks to show on the website, but not appear on your local BOINC Manager display is when your machine never received the work in the first place. The server thinks that it assigned it to you, but your machine never saw it. Some people called these "ghost WUs". Seems like I have 2 different problems. Today I had the same problem that I had a few years ago: the server was down, but my client didn't care about that and just flushed the result. 13-Jan-2009 17:28:02 [rosetta@home] Finished upload of t075_1_NMRREF_1_t075_1_S_00002_0000200IGNORE_THE_REST_070000_6211_20_0_0 13-Jan-2009 20:50:14 [rosetta@home] Finished upload of abrelax_nofilter_-1n0u_-SAVE_ALL_OUT_6206_14591_0_0 13-Jan-2009 22:07:34 [rosetta@home] Finished upload of MaR214A_t071_1_RDC_NMR_NESG_SAVE_ALL_OUT_6215_13455_0_0 14-Jan-2009 15:32:23 [rosetta@home] Finished upload of abinitio_norelax_homfrag_129_B_2ccvA_SAVE_ALL_OUT_4626_12633_0_0 14-Jan-2009 17:21:43 [rosetta@home] Finished upload of abinitio_norelax_homfrag_129_B_1ten__SAVE_ALL_OUT_4626_10396_0_0 14-Jan-2009 19:25:12 [rosetta@home] Finished upload of t076_1_NMRREF_1_t076_1_idid_model_05_coreIGNORE_THE_REST_idl_6217_7848_0_0 Then in another log I found these notes that indicate problems at reboot or uncontrolled shutdowns of the client. I had a feeling before today that the error happened around or after reboots. I had to reboot the comp a few times after patches and upgrades following a 3-week vacation. cant find C:ProgramBOINC\RebootPending.txt Let's hope that the double \ is just an error in the log string ;-) So no, I can't be sure yet that my client got all those jobs. I have to dig a bit deeper into the log files before I can say that, but I don't have the time for that right now. Could there be some problems in the code that handles restart of the workunits? Probably a common piece of code shared by all your projects. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Mike posted yesterday some of the issues they have been working on for the next release. I don't know the details of exactly what he means, but as I read his comment about "Bug fix in checkpointing machinery, states were not being correctly restored", I would say it sounds possible that this is exactly what you are talking about. If so, then yes, some problems were uncovered, and fixes are being tested and should be available in the next release. Rosetta Moderator: Mod.Sense |
Kurre Joined: 12 Apr 06 Posts: 9 Credit: 69,240 RAC: 0 |
Mike posted yesterday some of the issues they have been working on for the next release. Ah OK, the one talking about instability in handling text files might fit my earlier problem, because my machine gets an error no. 2 from XP. That is "can't find file", and it's text files that don't seem to be handled OK when a restart is done. And it seems to affect any WUs, so it's probably fixed now. And the error I had today is probably a harder one to isolate because it might not be easily reproducible. Your server or my connection has to be in a special state so the BOINC client thinks it's OK until it's too late (files already deleted from my client). It's not easy to get a two face commit to work properly, or whatever it's called today. Some wasted CPU time, but you can't get them all. |
MM Sihombing Joined: 22 May 06 Posts: 15 Credit: 1,424,082 RAC: 0 |
|
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
And the error I had today is probably a harder one to isolate because it might not be easily reproducible. Your server or my connection has to be in a special state so the BOINC client thinks it's OK until it's too late (files already deleted from my client). It's not easy to get a two face commit to work properly, or whatever it's called today. It is called Two-Phase Commit and actually it is not hard to make work at all ... it is just that the BOINC developers probably did not think a proper two-phase commit was really important. By their lights, it isn't ... the actual science is, and has been loaded, so the data that they care about has been moved to the server. The trivia of proper accounting of credit and things like that are not that important to them ... As to the last statement ... well ... in relational databases, the two-phase commit protocol is core to all activities ... including MySQL and SQL Server which are used by projects for BOINC ... |
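
For readers unfamiliar with the term, here is a minimal sketch of the two-phase commit idea discussed in the last posts. It is only an illustration of the general protocol, not BOINC's actual code: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names invented for this example. The point is that nobody finalizes (for instance, a client deleting its result files) until every party has voted "yes" in the first phase.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """Hypothetical party in the transaction (e.g. a client or a server)."""
    name: str
    can_commit: bool = True   # assumption: stands in for "is this side healthy?"
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote. A "yes" vote is a promise to be able to commit later.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        # Phase 2 (success): make the change permanent.
        self.state = "committed"

    def rollback(self) -> None:
        # Phase 2 (failure): undo, so nothing is lost.
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Coordinator: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]   # phase 1: collect all votes
    if all(votes):
        for p in participants:                    # phase 2: everyone commits
            p.commit()
        return True
    for p in participants:                        # phase 2: everyone rolls back
        p.rollback()
    return False


if __name__ == "__main__":
    client = Participant("client")
    server = Participant("server", can_commit=False)  # server in a bad state
    ok = two_phase_commit([client, server])
    print(ok, client.state, server.state)
```

In this sketch, a server that cannot commit causes the client to roll back rather than discard its files, which is the behaviour the posts above say was missing. A real implementation would also need durable logging and timeout handling, which are left out here.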
©2024 University of Washington
https://www.bakerlab.org