Message boards : Number crunching : Work Unit Compression discussion
Author | Message |
---|---|
Scribe Send message Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
It had to happen: Nite Owl, who has crunched DC medical projects since at least 2001 with his farm of 38 machines, has now quit DC altogether. Mainly 'cos he wants to do ONLY Rosetta and his ISP has imposed a bandwidth limit on him which stops him. He is in the boondocks and on a satellite download. Please try and get the WU compression done as soon as other work permits and maybe we can get him back. Please do not litter this thread with "Boinc is multi project" etc. as he is not interested; we have already tried him on this and we do know about it - see my stats! If you can keep us up to date about progress in here I can make sure he knows....thanks. |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
At the request of a user this thread was started as a place for people to discuss issues related to movement of large Work Units, and possible changes to the compression or work unit size to solve the problem for users with low-bandwidth, time-limited, rate-structure-limited, restricted or rationed internet connections. This should provide a place for the project team to comment on the issue as they work on the problem. Moderator9 ROSETTA@home FAQ Moderator Contact |
Scribe Send message Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
Thanks Mod No. 9 |
FluffyChicken Send message Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
To save everyone a lot of retyping I'll post this link here to the previous discussion and ideas on bandwidth usage. Feedback, .. bandwidth usage :-( Team mauisun.org |
carl.h Send message Joined: 28 Dec 05 Posts: 555 Credit: 183,449 RAC: 0 |
Copied and pasted from our forum My current viewpoint After reading through (struggling) Fluffy Chickens thread on Rosetta. Not all Czech`s bounce but I`d like to try with Barbar ;-) Make no mistake This IS the TEDDIES TEAM. |
carl.h Send message Joined: 28 Dec 05 Posts: 555 Credit: 183,449 RAC: 0 |
I may be wrong with my assumption previously as the WU`s appear to have gone back to a 2 hour 20 finish. Not all Czech`s bounce but I`d like to try with Barbar ;-) Make no mistake This IS the TEDDIES TEAM. |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
"I may be wrong with my assumption previously as the WU`s appear to have gone back to a 2 hour 20 finish." Carl.h, Dr. Baker posted a news item in the Home page news area just today (27/1/06). While it does not speak directly to the bandwidth issue, you can gather from what it does say that the problem should be reduced. The Max Time failures should diminish as well. Moderator9 ROSETTA@home FAQ Moderator Contact |
R/B Send message Joined: 8 Dec 05 Posts: 195 Credit: 28,095 RAC: 0 |
This would seem to explain why, of the 7 WUs I d/l on one of my machines today, 6 have a shortened 1 week deadline and 1 has a deadline of 4 weeks, labeled 'INCREASE_CYCLES_10_dtj1_287_583'. Founder of BOINC GROUP - Objectivists - Philosophically minded rational data crunchers. |
truckpuller Send message Joined: 5 Nov 05 Posts: 40 Credit: 229,134 RAC: 0 |
Here of late I have had to drop 2 computers from running Rosetta because of bandwidth also (being on dial-up @ 32Kbps): there is just not enough time to download enough jobs to keep all my computers running. I have left my internet connection on all night, only to get up 8 hours later and find jobs still downloading. I do know that there are a lot of people out there running dial-up also. Visit us at Christianboards.org |
Jeff Gilchrist Send message Joined: 7 Oct 05 Posts: 33 Credit: 2,398,990 RAC: 0 |
I'm assuming it's the Rosetta core that handles the .gz input files and not the BOINC client, so switching from GZIP to BZIP2 (http://www.bzip.org/) seems like it would save a lot of space for transferring data. A bzip2 library is available as free open source and works very similarly to the gzip library that Rosetta would currently be using. Taking a few files my client is currently working on: aa1ogw_09_05.400_v1_3.gz is 3657522 bytes; bb1bm8_09_05.200_v1_3.gz is 2425482 bytes; bb1iibA09_05.200_v1_3.gz is 2492662 bytes. The same files compressed with bzip2 are: aa1ogw_09_05.400_v1_3.bz2 is 2224708 bytes; bb1bm8_09_05.200_v1_3.bz2 is 1481306 bytes; bb1iibA09_05.200_v1_3.bz2 is 1488119 bytes. So you can see it would make a big difference in bandwidth to switch the data files over from gzip to bzip2. Jeff. |
FluffyChicken Send message Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
I was assuming (from the previous discussions) that it is the BOINC client that does the gzipping and Rosetta would need to implement their own (although I could well be wrong :-D). From memory they were going to look into BZIP2 (due to the similarity with GZIP) and also good old ZIP (as Climateprediction use that). 7ZIP was also talked about (open source as well), and RAR, not open source but nice chaps and the decompressor is free to use AFAIK. Around 50% (or 2x) compression for all options was seen. The layout of the files has also been stated as a way to help compression further. Team mauisun.org |
Johnathon Send message Joined: 5 Nov 05 Posts: 120 Credit: 138,226 RAC: 0 |
I'm having to think about shutting down Rosetta now... I just can't afford to keep on spending my money on call costs on dialup, even on off peak rates. |
Carlos_Pfitzner Send message Joined: 22 Dec 05 Posts: 71 Credit: 138,867 RAC: 0 |
"I'm having to think about shutting down Rosetta now... I just can't afford to keep on spending my money on call costs on dialup, even on off peak rates." In Brazil, "Telemar" launched a plan, "Internet sem limites" (unlimited internet), at a fixed rate of R$ 30.00/month. This plan arrived at a good time. -:) *Last month my telephone bill surpassed R$ 400.00 --(: I was about to leave too. However, speed is still limited to dialup speeds of 14-40 kbps because of hundreds of miles of deteriorated telephone cables. *You must explicitly sign up for this plan or you continue being billed per pulse, R$ 0.14 for each 4 minutes outside 00:00-06:00. *Valid for any ISP whose dialup number begins 1500-nnnn. At my connection speed, IF the WU size is not reduced, even connected 24 hours/day I will not be able to keep my PC busy -:( I believe the best is bzip2 because it compresses more than gzip, it is free, and it runs on all OSes. 7zip compresses even more, but consumes too much CPU -:( Another thing that can be done is suppressing from WUs some files not essential for "crunching", at the user's option, e.g. xxxx.pdb -> used by the screen saver to show the "native fold". Surely there may be other files that can be suppressed too. Click signature for global team stats |
Dimitris Hatzopoulos Send message Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
Carlos, as David Baker posted yesterday: "David Kim has a very nice fix for all of the work unit time related problems. The new app will have a default target run time of 8 hours, and this rather than -nstruct will determine how many structures are generated per work unit. You will be able to change this target run time to fit your individual preferences--dial up users may wish to make this somewhat longer to reduce traffic still more." So you just need to wait a few more days, then set R to work for e.g. 24 or 48 or 72 CPU hours (depending on your resource share, so you can meet the 1 week deadline) on the SAME WU, so you won't have to download a new WU every 3 hours. We'll need to experiment with this new feature, but I think people with big farms and traffic quotas (like NightOwl) might want to increase it quickly to a high number, e.g. 72hrs, so it'll crunch on the same WU for 3 full days (and still have plenty of time to meet the 7 day deadline). That should drop traffic to 30MBytes/month/P4. This was indeed the best possible solution. Compression will be a welcome, but minor bonus. The use of bzip would be nice, but insignificant compared to the new feature imho. Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
Carlos_Pfitzner Send message Joined: 22 Dec 05 Posts: 71 Credit: 138,867 RAC: 0 |
"This was indeed the best possible solution. Compression will be a welcome, but minor bonus. The use of bzip would be nice, but insignificant compared to the new feature imho." Some posts back in this thread, Jeff Gilchrist showed that bzip2 vs gzip gives an effective reduction of about 50% in the size of some WUs he picked for this test ... There remains only the psychological interpretation of "insignificant". For me, a 50% size reduction is a very significant cut in size/network traffic; maybe in your interpretation 50% means insignificant? PS: the use of bzip2 does not preclude the new feature. Why not implement both at the same time? Click signature for global team stats |
BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
If we're limited to 24 hours (and hopefully we'll be able to ask for 3-6 days of jobs for WUs) we'll be able to download just 1 WU instead of eight 3 hour WUs, or forty-eight 30 minute WUs (the ones that pounded the server into submission earlier this week). 1/8th to 1/48th of the download bandwidth is a much more significant reduction than a mere 1/2 of the download bandwidth. (And the value decreases even further for each additional day we're allowed to add to a WU we download.) However, as Carlos implied, reducing the bandwidth usage to half by switching to a much better compression algorithm is yet another quick and easy way of reducing the time we spend communicating with the server. As the project heads have mentioned an interest in compressing the WU downloads using one of the open source applications we've suggested, we'll see it incorporated into the client - hopefully in the near future. |
Johnathon Send message Joined: 5 Nov 05 Posts: 120 Credit: 138,226 RAC: 0 |
|
Dimitris Hatzopoulos Send message Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
Carlos, I 100% AGREE with you on bzip2 vs gzip. Also, the bzip2 lib is afaik a plug-in replacement for gzip under Linux and Win, so implementing it should be real easy. What I meant is that bzip-vs-gzip will at best reduce bandwidth to 1/2 of current bandwidth, whereas the new crunch-wu-for-X-hrs feature can reduce bandwidth to 1/30th or less (if you have your PC on 24/7 and set it to crunch R WUs for 4-CPU-days, i.e. 345600sec, and still meet even the smallest 7-calendar day R@H deadline). If both "crunch-for-x-hrs" and bzip2 are implemented, it'd reduce bandwidth requirements to 1/50th or 1/60th of current levels. Obviously, the less bandwidth the better (I can sympathise with dialup guys; many years ago during the BBS + early internet days, I used to download international e-mail at a speed of 2400bps, i.e. 230 bytes per second, paying $$$/min to the monopoly telecom, some of it uncompressed - MNP5 or V.32bis LAP/M weren't supported by everyone back then). Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
truckpuller Send message Joined: 5 Nov 05 Posts: 40 Credit: 229,134 RAC: 0 |
I have noticed larger uploads and downloads. I have 1 download waiting that is 6.5MB and I'm on dial-up @ 32Kbps, and have had uploads of 250Kb-750Kb, so it is very hard for me to keep 3 computers running at all times. I have left my internet connection on all night, only to get up 8 hours later and find it still downloading jobs. I know I'm not the only one here on dial-up. Visit us at Christianboards.org |
R/B Send message Joined: 8 Dec 05 Posts: 195 Credit: 28,095 RAC: 0 |
If I were to adjust my preferences down to a 4 or 6 hour target to completion per work unit, would I be able to keep my 4 week deadline? One of my machines is not operating very often and one work unit just missed the deadline. This, however, was a 1 week deadline work unit. Thanks. Founder of BOINC GROUP - Objectivists - Philosophically minded rational data crunchers. |
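
The figures quoted in this thread can be checked with a little arithmetic. The Python sketch below is not project code. It uses Jeff Gilchrist's gzip and bzip2 file sizes, the 72-hour target run time, and the 6.5MB download on a 32Kbps line, all taken from the posts above. Everything else is an illustrative assumption: roughly 3 MB per work unit download, one CPU crunching 24/7, a 30-day month, decimal units, and no protocol overhead. The function names are made up for this example.

```python
# Sizes in bytes as posted by Jeff Gilchrist: (gzip, bzip2).
GZIP_VS_BZIP2 = {
    "aa1ogw_09_05.400_v1_3": (3657522, 2224708),
    "bb1bm8_09_05.200_v1_3": (2425482, 1481306),
    "bb1iibA09_05.200_v1_3": (2492662, 1488119),
}


def percent_saved(gz_bytes, bz2_bytes):
    """Percentage by which the bzip2 file is smaller than the gzip file."""
    return 100.0 * (1 - bz2_bytes / gz_bytes)


def downloads_per_month(target_hours, days=30):
    """Work units fetched per month by one CPU crunching 24/7 (assumption)."""
    return days * 24 / target_hours


def monthly_traffic_mb(target_hours, wu_mb, days=30):
    """Download volume per CPU per month; wu_mb is an assumed WU size."""
    return downloads_per_month(target_hours, days) * wu_mb


def download_minutes(size_mb, link_kbps):
    """Best-case transfer time: 1 MB = 10**6 bytes, 1 kbps = 1000 bit/s,
    no protocol overhead (all assumptions)."""
    return size_mb * 8_000_000 / (link_kbps * 1000) / 60


for name, (gz, bz2) in GZIP_VS_BZIP2.items():
    print(f"{name}: bzip2 is {percent_saved(gz, bz2):.1f}% smaller than gzip")

for hours in (3, 8, 24, 72):
    print(f"target {hours:>2} h: {downloads_per_month(hours):.0f} downloads/month, "
          f"about {monthly_traffic_mb(hours, 3.0):.0f} MB at an assumed 3 MB per WU")

print(f"6.5 MB at 32 kbps: about {download_minutes(6.5, 32):.0f} minutes")
```

On those three files bzip2 comes out roughly 39-40% smaller than gzip, somewhat short of the "about 50%" mentioned elsewhere in the thread. Ten downloads a month at a 72-hour target matches the 30MBytes/month/P4 estimate only if a work unit is about 3 MB, which is in line with the file sizes Jeff posted.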
©2024 University of Washington
https://www.bakerlab.org